Abstract
The framework of this paper is classification and regression trees, also known as tree-based methods, binary segmentation, tree partitioning, or decision trees. Trees can be fruitfully used either to explore and understand the dependence relationship between the response variable and a set of predictors, or to assign the response class or value to new objects for which only the predictor measurements are known. Since the introduction of the two-stage splitting procedure in 1992, the research unit in Naples has made several contributions to this field; one of the main issues is combining tree partitioning with statistical models. This paper will provide a new idea for knowledge extraction using trees and models. It will deal with the trade-off between the interpretability of the tree structure (i.e., exploratory trees) and the accuracy of the decision tree model (i.e., decision tree-based rules). Prospective and retrospective views of using models and trees will be discussed. In particular, we will introduce a tree-based methodology that grows an optimal tree structure with posterior prediction modelling, to be used as a decision rule for new objects. The general methodology will be presented and a special case will be described in detail. Finally, an application to a real-world data set will be shown.
Copyright information
© 2008 Physica-Verlag Heidelberg
About this paper
Cite this paper
Siciliano, R., Aria, M., D’Ambrosio, A. (2008). Posterior Prediction Modelling of Optimal Trees. In: Brito, P. (ed.) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_27
DOI: https://doi.org/10.1007/978-3-7908-2084-3_27
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2083-6
Online ISBN: 978-3-7908-2084-3
eBook Packages: Mathematics and Statistics (R0)