Introduction

Ni-based alloys used at high temperatures are required to have both good mechanical properties and oxidation resistance. These alloys obtain their oxidation resistance by forming adherent protective chromia and/or alumina scales1,2. Currently, the evaluation of alloys’ high-temperature oxidation still heavily relies on experimental investigations, which is costly and time-consuming. Hence, developing the capability to predict the high-temperature oxidation kinetics of multi-component alloys is highly desirable and of great interest in many applications involving extreme environments. Although computational approaches using conventional analytical and physics-based simulations have made significant advancements in investigating high-temperature oxidation, they are usually only applicable for a given length/time scale3,4,5,6 or specific oxidation-related phenomena7,8,9,10,11.

On the other hand, improvements in the machine learning (ML) approach have propelled the use of data analytics to assist the discovery of materials and the prediction of properties12,13,14,15,16,17,18,19. It possesses many advantages in handling complex multi-component alloys and offers the potential to extract insights from complex experiments or synthetic datasets generated from physics-based simulations. Thus, modern data analytics approaches can be considered an alternate and/or complementary tool to accelerate the design and development of materials with reduced cost and risk12,13,14,20,21,22.

Recently, data analytics approaches have been successfully applied to predict the mechanical properties of multi-component high-temperature alloys18,22,23,24,25,26,27. While high-temperature oxidation also has scientific and practical importance, to the best of the authors’ knowledge, very limited effort has been made to predict the oxidation kinetics of complex multi-component alloys by ML. A preliminary attempt has been made by Bhattacharya et al.28 to build ML models to predict the high-temperature oxidation kinetics, described by the parabolic rate constant (kp), of Ti alloys between 550 and 750 °C. From a statistical perspective, a good agreement between the predicted and experimental kp was achieved. However, their dataset was collected from multiple literature sources; thus, the data may exhibit mixed oxidation mechanisms due to the difference in experimental protocols (i.e., atmospheres and isothermal or cyclic exposures)26. Moreover, their work focused only on building predictive models without analyzing the relationship between input features and kp.

Herein, we present a data analytics framework consisting of correlation analysis of a consistently measured and well-curated oxidation dataset followed by ML to predict the cyclic oxidation kinetics of NiCr-based alloys, as illustrated by the workflow shown in Fig. 1. The oxidation kinetics of studied alloys was numerically represented by kp from two representative oxidation models. Correlation analysis was performed using two algorithms (Pearson’s correlation coefficient (PCC)29 and maximal information coefficient (MIC)30 to quantitatively evaluate the strength of correlation between input features (i.e., alloying compositions/oxidation temperature) and kp, and hence rank the input features. The performance of five widely used ML models, i.e., linear regression (LR)31, Bayesian ridge regression (BR)32,33, k-nearest neighbor regression (NN)34, random forest regression (RF)35, and support vector machines (SVM) regression36, in accurately predicting the kp as a function of the number of top-ranking input features is evaluated and discussed in this paper.

Fig. 1: Workflow of the present study.
figure 1

Raw experimental data from cyclic oxidation experiments were fitted with select oxidation models to represent their oxidation kinetics, resulting in a dataset. Next, correlation analysis was performed to evaluate the strength of correlation between input features and kp. Then, the performance of various ML models was evaluated.

Results and discussion

Numerical representation of cyclic oxidation data

Extensive experimental investigations of the cyclic oxidation kinetics of 25 model NiCr alloys and four commercial Ni-based alloys (i.e., Nimonic 80 A, Nimonic 90, René 41, and Haynes 282) have been performed in both dry air and wet air (air + 10% water vapor) atmospheres at Oak Ridge National Laboratory (ORNL). Figure 2 shows examples of the cyclic oxidation behavior of select NiCr alloys37,38 in the dataset. The alloy composition ranges, experimental conditions, and details of the dataset are summarized in Table 1. This consistently measured oxidation and well-curated dataset, collected over years of experiments at ORNL, enabled exploring the potential of ML to predict the oxidation kinetics of multi-component high-temperature alloys. However, raw cyclic oxidation data, which is given as a 2 × N matrix (cycle vs. mass change) per alloy, cannot be directly used in ML because singular numeric values are needed as the target for the regression-type ML training. Hence, it is necessary to find a proper numerical representation for the oxidation kinetics of each alloy so that the correlation between the input features and the target, as well as the capability of ML in modeling the high-temperature oxidation kinetics, can be evaluated.

Fig. 2: Snapshot of ORNL cyclic oxidation dataset.
figure 2

Mass gain of select alloys as a function of oxidation time in 1 h cycles.

Table 1 List of alloying elements with composition range in wt% and experimental conditions, and details of the datasets.

High-temperature oxidation of multi-component alloys is a complex dynamic process, which requires concurrent consideration of both thermodynamics and diffusion kinetics2. As shown in Fig. 2, the addition of certain minor alloying elements can significantly alter the mass change behavior of some alloys during oxidation. Thus, identifying a proper numerical representation for such complex phenomena is the key to building reliable ML models to predict the high-temperature oxidation kinetics of NiCr-based alloys. Mathematically, any mass change curve can be fitted with an arbitrary polynomial. However, parameters obtained from such regression without any physical meaning will not allow gaining insights into the oxidation mechanisms and compare alloy oxidation performance. Thus, we have focused on identifying widely accepted oxidation models with a robust formalism to determine physically meaningful parameters to represent the oxidation kinetics of the studied alloys.

As illustrated in Fig. 3, two oxidation models, i.e., a simple parabolic law (∆m = \(\sqrt {k_{\mathrm{p}}t}\), where ∆m is the mass gain and t is the oxidation time; s-kp hereafter)39,40, and a statistical cyclic oxidation model (p-kp hereafter)41, were adopted in this work. Both models have been widely used to interpret the oxidation kinetics of numerous alloys42,43,44,45,46,47,48. In this study, the s-kp model evaluated the mass change data up to 100 h to exclusively represent growth rates with the minimized influence of mass loss. The p-kp model41, consisting of two parameters, i.e., p (discrete oxide spallation probability) and kp, evaluated the complete curve, thus, enabling us to assess the capability of the ML approach to predict oxidation kinetics involving both oxide growth and loss processes. It is anticipated that the derived kp will be a good numerical representation of our raw experimental cyclic oxidation data and can serve as proper target property for analyzing the oxidation kinetics by the data analytics approach. Since the oxidation mechanisms of NiCr-based alloys in dry and wet air are different49,50, the complete experimental data after being fitted by each oxidation model were divided into two datasets based on the oxidation atmosphere. Finally, four datasets, s-kp, and p-kp in dry and wet air were generated from the existing experimental data for further analysis in this study (see Table 1).

Fig. 3: Schematic diagram of the fitted mass change curve.
figure 3

The mass gain curve of Haynes 282 in wet air at 900 °C in a 1 h cycle was fitted with the s-kp and p-kp models, respectively.

Correlation analyses

We have used MIC and PCC methods to quantitatively analyze the correlation between all input features (alloy compositions and oxidation temperature from Table 1) and kp, respectively. While PCC can evaluate the strength of the positive and negative, but only linear relationships between two variables. The MIC method can identify the strength of both linear and nonlinear relationships, but only the magnitude without a sign. Both approaches are expected to provide insights into the correlation between input features and kp from differing statistical aspects, which may inspire alloy design experts to generate alloy hypotheses23,26. Moreover, correlation analysis can facilitate the training of high-fidelity ML models using highly ranked features. For the PCC analysis, a positive correlation between a feature and kp here implies that the increase of the feature will increase kp, i.e., a higher oxidation rate, and vice versa. Figure 4 shows the PCC analysis results for the p-kp of dry and wet air datasets as examples. The remaining correlation analysis results are presented in Supplementary Figs. 1 and 2. It is worth mentioning that although as the base element, Ni content is the balance of remaining alloying elements, we still use Ni content as an input feature since ML algorithms are only solving a mathematical problem and do not know such a correlation. The oxidation temperature (T) was identified as the feature having the most substantial impact on kp in all cases. It is encouraging that correlation analysis replicated the community’s existing knowledge: oxidation temperature is typically the factor that most strongly affects oxidation rate. PCC also correctly identified that temperature has the most positive correlation with kp, consistent with the fact that the oxidation rate increases with increasing temperature.

Fig. 4: Correlation analysis of the training dataset.
figure 4

Quantified correlation scores of input features (elemental compositions and oxidation temperature) and kp determined for the p-kp model using Pearson correlation coefficient at difference atmosphere: (a) dry and (b) wet air.

It can also be observed in Fig. 4 and Supplementary Fig. 1 that chromium (Cr) as the major alloying element in the studied alloys was identified to have the most negative correlation with kp, which is consistent with the fact that the Cr addition is beneficial to the formation of an external chromia solid-state diffusion barrier that slows the oxidation reaction as it increases in thickness51,52. Most of the alloys in this dataset were designed to form protective chromia scales. Aluminum (Al) is expected to further decelerate the growth rate of the chromia scale for NiCr-based alloys53; thus, the negative coefficient of Al in wet air (Fig. 4b) is reasonable. Fe is known to form fast-growing oxides54,55. Its second strongest positive correlations with kp in both dry and wet air (Fig. 4) are in accordance with this understanding. Similar correlations were also observed for Cr and Fe in the dry and wet air (s-kp) datasets, as shown in Supplementary Fig. 1. Other than identifying linear correlation as PCC does, MIC can also identify non-linear correlations. As presented in Supplementary Fig. 2, features like T, Cr, Fe, Mo, Mn, and Ni in dry air and T, Al, Fe, and Cr in wet air that have a significant impact on the oxidation kinetics of NiCr-based alloys were generally identified as high-ranking features by the MIC analysis.

Correlations that seem to contradict known oxidation mechanisms were also observed. For example, Ni exhibits a negative correlation with kp, meaning that Ni addition will decelerate the oxidation rate. However, correlation analysis results also heavily depend on the nature of the dataset, which means that the rankings from correlation analysis here do not necessarily all make intuitive sense and mean causation23. In addition, high-temperature oxidation kinetics of multi-component alloys is controlled by chemical compositions and temperature and by factors including, but not limited to, microstructure and its evolution during oxidation, as well as oxide scale spallation/evaporation2. Although our dataset was collected over a number of years from differing studies at ORNL and could be considered a big dataset from a material science/alloy oxidation perspective, it is still small from a data science perspective. Thus, the dataset does not consistently cover the ideal composition ranges and test conditions to fully facilitate data analytics for the classes of alloys included. Certainly, this limitation could most likely be said of any existing materials dataset, which includes complex alloy compositions, structures, and behavior. However, the correctly identified correlation in Fig. 4, Supplementary Figs. 1 and 2 still clearly demonstrated the value of correlation analysis in applying data analytics to materials science. The intention of performing correlation analysis here is not only to gain insights into the impact of input features on kp but also to have a numerical basis for the selection of input features to train ML models for understanding their influence on the performance of ML models.

Machine learning

Based on the ranking of features from correlation analyses, five widely used ML models, i.e., LR, BR, NN, RF, and SVM, have been trained using different numbers of top-ranking features to evaluate their performances. For models trained with the dry and wet air p-kp datasets, Fig. 5 presents their average performance, represented by the Nash and Sutcliffe coefficient of efficiency (NSE)56, and corresponding standard deviation as a function of the numbers of top-ranking features based on the rankings of |PCC| (the absolute values of PCC in Fig. 4). The remaining results are presented in Supplementary Figs. 3 and 4. Since NSE alone could not fully qualify the performance of these models, another metric, i.e., coefficient of determination (R2 COD), and a band showing the goodness-of-fit were adopted to facilitate the assessment of ML models, as presented in Fig. 6.

Fig. 5: Performance of trained ML models.
figure 5

NSE of five trained ML models (BR: Bayesian ridge regression, LR: linear regression, NN: nearest neighbor, RF: random forest, and SVM: support vector machines regression) as a function of the number of top-ranking features based on the rankings of |PCC| (the absolute values of PCC in Fig. 4) in (a) dry and (b) wet air. Each model was trained ten times for a given set of features to determine the averaged accuracy and its standard deviation (error bar).

Fig. 6: ML parity plots.
figure 6

Experimental and SVM-predicted p-kp of three groups of alloys (I: commercial alloys, II: NiCr binary alloys, and III: NiCr alloys with additions) at different oxidation temperatures are compared for (a) dry and (b) wet air. The region between the dashed lines indicates one order of magnitude fidelity range as an acceptable deviation.

NSE here was mainly used to compare the performance of these models. As shown in Fig. 5 and Supplementary Fig. 3, the performance of ML models trained with dry air datasets shows a similar trend to those with wet air datasets. In most cases, increasing the top-ranking features from 2 to ~4 or ~6 increased the NSE of these models markedly. Intriguingly, considering more features did not considerably improve the performance of the obtained models. The NSE of the models trained with the wet air datasets is relatively lower than their dry air datasets counterparts. This trend can be attributed to the increased variation of measured oxidation kinetics of alloys studied in wet air exposure. For the dry air dataset, the maximum NSE of all models is between 0.6 and 0.75. Among the considered models, SVM has the highest NSE (>0.7) in all cases (Fig. 5 and Supplementary Figs. 3 and 4). It reaches the maximum NSE when using the top 3 to 5 features. Afterward, it gradually decreases with an increasing number of features, revealing that in this study including features with a ranking lower than the 5th is not beneficial to the performance of ML models. For the wet air dataset, the maximum NSE of all models is only between 0.45 and 0.60. In general, the top four to six features in this study were required to achieve an NSE of 0.5 or higher. SVM still exhibits the highest accuracy among considered models. As discussed in the previous section, most of these features are experimentally identified features that significantly impact the oxidation of NiCr-based alloys.

In all cases, the performance of these ML models evaluated with NSE is generally in the order of SVM>BR>RF≈NN>LR (in dry air) and SVM>BR≈LR>NN>RF (in wet air). Results show that these models are sensitive to the number of features considered in training but to different extents. SVM performs similarly to the linear-based models BR and LR, particularly for those trained with the wet air datasets. A linear kernel function (assuming the input features and kp is in a linear relationship) was adopted for the present SVM models after hyperparameter tuning. Considering more than the top six to eight features significantly degraded the performance of LR for the dry air dataset, whereas this was not the case for the wet air dataset. This could be caused by the smaller data volume of the dry air dataset than the wet air dataset. BR uses probability distributors to formulate LR, rather than point estimates used by LR and identifies the posterior distribution for the model parameters. Thus, it is believed that BR is more tolerant of overfitting32,33. This is why, in this case, the NSE of the BR model does not notably decrease with an increasing number of features. RF in this study generally requires the inclusion of the top four features to obtain the NSE of >0.65 (for dry air datasets) and >0.5 (for wet air datasets); after that, its performance was almost independent of the number of considered features.

The observed trend can be understood as follows: as an ensemble learning method, RF assigns different importance to each feature during model training; thus, less critical features would have less or even no contribution to its performance57. Therefore, the performance of RF is not sensitive to the number of features considered in training after it reaches the maximum accuracy. NN hinges on the assumption that similar things are close to each other and simply outputs the average value of data points in k-nearest neighbors58. The performance of NN usually degrades as the dimensionality of data increases significantly59, because its distance measure, i.e., the Euclidean distance, becomes less representative in a higher-dimensional space. However, as shown in Fig. 5, Supplementary Figs. 3 and 4, when the number of features increases, accompanying the increase of the dimensionality of data, the accuracy of NN does not decrease significantly. This finding indicates that the dimensionality of data in this study does not induce a significant adverse effect on the performance of NN. Since SVM possesses the best performance among all considered ML models, its performance does not significantly degrade with increasing numbers of features.

SVM models trained with all 11 features in our datasets are believed to be the most promising ML models for predicting the kp of NiCr-based alloys since they exhibit relatively good performance and concurrently consider essential features from a high-temperature oxidation perspective. Comparisons between the experimental and predicted kp by the SVM models trained with all 11 features are presented in Fig. 6 (for p-kp) and Supplementary Fig. 5 (for s-kp). While the accuracy of trained ML models evaluated with NSE is around 0.7, it is encouraging that most of the predicted values are within the acceptable deviation range. In addition, the other metric R2 COD for both dry and wet air (p-kp) is close to 0.9 or higher, which strongly indicates that our ML models can efficiently capture the trend of cyclic oxidation response of NiCr-based alloys as a function of alloy compositions and oxidation temperature.

In practice, the same alloy tested under the same condition may exhibit significant test-to-test variations, particularly in wet air testing. For example, the kp from the p-kp model of Nimonic 80 A at 950 °C in wet air from six tests varied significantly (Fig. 7). Thus, dashed lines, as the acceptable deviation, representing one order of magnitude difference between the predicted and experimental kp are superimposed in Fig. 6 and Supplementary Fig. 5. The predicted kp correctly captures the experimental data trend, and most of the data points lie between the dashed lines. This indicates good agreement between experimental and predicted kp was achieved from a high-temperature oxidation perspective for the multi-component commercial alloys (group I). This conclusion is further supported by the high R2 COD of ≥0.9 in both cases. Data in Fig. 6b are slightly more scatted than those in Fig. 6a, attributing to the increased spallation probability and thus more considerable test-to-test variation in wet air exposure.

Fig. 7: Oxidation of Nimonic 80 A at 950 °C in wet air in 1 h cycle.
figure 7

The kp determined by the p-kp model largely varies from six tests under the same condition.

Although a satisfactory agreement between the experimental and predicted kp was achieved, more work is required in the future. Firstly, the performance of ML models depends not only on what features have been considered but also on the nature and repartition of the dataset used for training; thus, further detailed analysis of this aspect is required. Secondly, kp alone could not fully capture the complete oxidation kinetics of alloys. Other properties, such as spallation probability and oxidation lifetime, should also be concurrently considered as target properties to be modeled/predicted. Thirdly, besides the alloy composition and temperature considered in this study as input features, the high-temperature oxidation kinetics are also affected by factors like microstructure, oxide scale spallation, alloy surface depletion of elements that are selectively oxidizing, and oxide evaporation in wet air atmosphere. Features that can well represent these additional factors and ML frameworks that can incorporate these factors as much as possible will be highly conducive to establishing high-fidelity surrogate models.

Additionally, material datasets commonly suffer from the uneven data distribution. While the present dataset can be regarded as one of the largest and the most comprehensive cyclic oxidation dataset for high-temperature NiCr-based alloys, many gaps are identified. A highly desirable alloy dataset to apply data analytics would have an even distribution of data points of all elements without major gaps in a high-dimensional space. The movement toward such an ideal alloy dataset can be obtained by exercising a design of experiments (DOE)60 campaign to fill key gaps in the existing dataset(s). However, the dataset used in the current study is biased to some extent and has a number of gaps, as shown in Fig. 8. This is attributed to the fact that the objective of past empirical alloy research has been mainly focused on identifying alloys with superior properties over existing ones. Consequently, the alloy composition variation strategy has been heavily geared toward improving properties, more often ending up in biased and narrow boundary conditions from a data analytics perspective. It is evident that alloy data with poor properties can serve as equally useful learning input for ML models; however, an effort to strategically collect such data to fill the gaps has been overlooked, not only by us but likely throughout the materials community. Hence, a strategy that can identify the most important gaps in an existing dataset for alloy data analytics and efficiently augment a small amount of experimental data needs to be developed to leverage legacy data, particularly for complex materials.

Fig. 8: Data distribution of the current dataset.
figure 8

The concentration range of each element, i.e., the difference between the maximum and minimum content, is evenly divided into ten segments. The number in each segment represents how many data points are located in each cell. For example, a cell with 0 means that there is no data in this range. The degree of data aggregation in each segment is represented by color.

In summary, we demonstrated a workflow of applying a data analytics approach to predict the cyclic oxidation kinetics of NiCr-based alloys using a highly consistent and well-curated experimental dataset collected over many years and numerous different studies at ORNL. The oxidation data in dry and wet air as a function of alloy compositions and oxidation temperatures (800–950 °C) were used to train ML models. Identifying a proper numerical representation was the critical procedure for successfully predicting the high-temperature oxidation kinetics of NiCr-based alloys. Oxidation kinetics of alloys was represented by the kp from properly selected oxidation models, i.e., a simple parabolic growth law for data up to 100 h (s-kp) and a statistical cyclic oxidation model for the complete curve (p-kp, up to 2000 h), respectively.

Correlation analysis and training of ML models (i.e., BR, LR, NN, RF, and SVM regression) were performed to quantitatively evaluate the inter-relationship between input features and kp. The influence of top-ranking features on the performance of considered ML models was also identified. The trained ML models were able to predict kp from both s-kp and p-kp models satisfactorily. The typical test-to-test variation in experimental high-temperature oxidation data was believed to be the primary source for the discrepancy between the ML-predicted and experimental kp. The current results demonstrated the potential of the ML approach to predict the oxidation kinetics of multi-component NiCr alloys with wide composition and temperature ranges. We anticipate that this data analytics method can be applied to predict the oxidation kinetics of other classes of high-temperature alloys.

Methods

Experimental procedure

All specimens with the dimensions of ~10 × 20 × 1.5 mm3 were ground to a 600-grit finish and cleaned ultrasonically in acetone and methanol prior to cyclic oxidation experiments. A 1-h-cycle consisting of a 60 min exposure at temperature and 10 min cooling in automated cyclic rigs61 was adopted in this work. The specimens were exposed at temperatures between 800 and 950 °C in flowing dry and wet air (air + 10% water vapor), respectively, with a gas flow rate of 500 cm3 min−1 for up to 2000 h.

Correlation analysis and machine learning

The correlation between the input features and kp was evaluated by PCC62 and MIC30, respectively. PCC considers the strength of the linear relationship between two variables, while MIC can identify the strength of both linear and nonlinear relationships. The correlation coefficient of PCC is between −1 and 1, where 1 indicates a total positive linear correlation, −1 represents a complete negative/reciprocal linear correlation, and 0 indicates no correlation. The correlation coefficient of MIC ranges between 0 and 1. Coefficient values close to 1 or −1 indicate the strongest correlation between the variables and target property.

Five representative ML models were used: LR31, BR32,33, NN34, RF35, and SVM regression36. A brief introduction to these models is provided below. For details, please refer to the corresponding literature. LR models the relationship between input and output variables by fitting a linear equation. Parameters of the LR model are fitted to minimize the residual sum of squares between input and output data. BR is another LR model. It formulates an LR using probability distributors rather than point estimates that LR does. The output is assumed to be drawn from a probability distribution instead of being estimated as a single value. k-NN outputs the average of the values of given data points in the k-nearest neighbors. Since the function is approximated only locally, it only uses a subset of the relevant dataset. RF is an ensemble learning method that constructs multiple decision trees during training and outputs the mean prediction of the individual trees. SVM can be used for both classification and regression problems. For a classification problem, SVM constructs a set of hyperplanes in high-dimensional space to distinctly classifies the data points. For a regression problem, SVM is generalized by introducing an ε-insensitive region (ε-tube) around the function. This tube reformulates the regression problem to find a function exhibiting the least deviation from the obtained targets across all training data while balancing model complexity and prediction error.

These ML models were trained with different numbers of top-ranking features based on the ranking from MIC or |PCC| (the absolute value of PCC) to explore the influence of these features on the performance of ML models. The hyperparameters for each model were tuned up to 10,000 iterations using the randomized optimization approach to search for the optimal parameters. Each model was trained ten times for a given set of features to determine the averaged accuracy and its standard deviation. All features have the same weight in ML training regardless of their rankings determined by correlation analysis. Both correlation analysis and training of ML models were performed with the Python-based open-source frontend, Advanced data SCiEnce toolkit for Non-Data Scientists (ASCENDS)58,63, which is available via GitHub (https://github.com/ornlpmcp/ASCENDS). The performance/accuracy of the ML models was quantified by NSE56 and R2(COD), calculated without an intercept as a supplementary metric. The NSE is calculated as follows:

$${\mathrm{NSE}} = 1 - \frac{{\mathop {\sum }\nolimits_{i = 1}^N \left( {O_i - P_i} \right)^2}}{{\mathop {\sum }\nolimits_{i = 1}^N \left( {O_i - \bar O} \right)^2}}$$
(1)

where \(O_i\) and \(\bar O\) are the observed values and their mean, respectively. Pi is the model predicted value. N is the number of data in the dataset. The k-fold cross-validation approach64 with k = 5 was used in ML training. This approach randomly divided the input data into k groups. One group (i.e., unseen data) was withheld during training, and the remainder k-1 groups were used to train the ML model. Then the unseen dataset was used to evaluate the accuracy of models.