Abstract
The paper considers a new two-level ensemble regression method and its application to prediction problems. At the first stage, the target variable is predicted by the regression trees of an optimal lower-level ensemble. At the second stage, the aggregated solution is computed by a regression random forest that takes the predictions of the ensemble trees as input features. The method differs from common ensemble methods in the technique used to build the trees that are added to the ensemble. This technique is based on minimizing a special functional that is the difference of two components. The first component characterizes how well the dependence of the target variable on the input features is approximated. The second component is aimed at increasing the variance of the predictions computed by the algorithms of the ensemble. The developed method thus combines approaches used in the random forest method and in gradient boosting. The paper presents the results of applying the developed method to predicting the melting points of halides of various compositions, as well as to predicting the crystal lattice parameters of compounds with the composition \(A_{2}BB^{\prime}O_{6}\).
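The two-level scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the exact functional or the tree-building procedure, so the sketch assumes a simple stand-in in which each new tree is chosen from a few randomized candidates by minimizing "mean squared error minus `mu` times the mean squared divergence from the trees already in the ensemble". The names `functional`, `build_divergent_trees`, `fit_two_level`, `predict_two_level` and the parameters `mu` and `n_candidates` are hypothetical, and scikit-learn is used only for convenience.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor


def functional(y, candidate_pred, ensemble_preds, mu):
    """Assumed functional: approximation error minus mu times divergence."""
    error = np.mean((y - candidate_pred) ** 2)
    if not ensemble_preds:
        return error  # no ensemble yet, so only the error term applies
    divergence = np.mean(
        [np.mean((candidate_pred - p) ** 2) for p in ensemble_preds]
    )
    return error - mu * divergence


def build_divergent_trees(X, y, n_trees=10, n_candidates=5, mu=0.5, seed=0):
    """First level: greedily add the candidate tree with the lowest functional."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees, preds = [], []
    for _ in range(n_trees):
        best = None
        for _ in range(n_candidates):
            rows = rng.integers(0, n_samples, size=n_samples)  # bootstrap sample
            tree = DecisionTreeRegressor(
                max_depth=3,
                max_features="sqrt",  # random feature subset at each split
                random_state=int(rng.integers(1 << 31)),
            )
            tree.fit(X[rows], y[rows])
            pred = tree.predict(X)
            score = functional(y, pred, preds, mu)
            if best is None or score < best[0]:
                best = (score, tree, pred)
        trees.append(best[1])
        preds.append(best[2])
    return trees


def first_level_features(trees, X):
    """Stack the predictions of the first-level trees as columns."""
    return np.column_stack([tree.predict(X) for tree in trees])


def fit_two_level(X, y, **kwargs):
    """Second level: a random forest trained on the first-level predictions."""
    trees = build_divergent_trees(X, y, **kwargs)
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(first_level_features(trees, X), y)
    return trees, forest


def predict_two_level(trees, forest, X):
    return forest.predict(first_level_features(trees, X))


# Usage on synthetic data (illustrative only, not data from the paper).
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(200, 6))
y_demo = X_demo[:, 0] ** 2 + X_demo[:, 1] + 0.1 * demo_rng.normal(size=200)
trees, forest = fit_two_level(X_demo, y_demo, n_trees=10)
y_hat = predict_two_level(trees, forest, X_demo)
```

In this sketch the second-level forest is fitted on in-sample predictions of the first-level trees, which is the simplest choice but may overfit; out-of-fold predictions would be a common alternative. Larger values of the assumed weight `mu` favor trees that disagree more with the current ensemble at the cost of individual accuracy.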
Funding
The study was carried out as a part of the state assignment (project nos. 075-01176-23-00 and 0063-2016-0003) with the partial financial support of the Russian Foundation for Basic Research, project nos. 20-01-00609 and 21-51-53019.
Cite this article
Senko, O.V., Dokukin, A.A., Kiselyova, N.N. et al. New Two-Level Ensemble Method and Its Application to Chemical Compounds Properties Prediction. Lobachevskii J Math 44, 188–197 (2023). https://doi.org/10.1134/S1995080223010341
DOI: https://doi.org/10.1134/S1995080223010341