Posterior Prediction Modelling of Optimal Trees

  • Conference paper
COMPSTAT 2008

Abstract

The framework of this paper is classification and regression trees, also known as tree-based methods, binary segmentation, tree partitioning, or decision trees. Trees can be fruitfully used either to explore and understand the dependence relationship between the response variable and a set of predictors, or to assign the response class or value to new objects for which only the predictor measurements are known. Since the introduction of the two-stage splitting procedure in 1992, the research unit in Naples has made several contributions to this field; one of the main issues is combining tree partitioning with statistical models. This paper will provide a new idea of knowledge extraction using trees and models. It will deal with the trade-off between the interpretability of the tree structure (i.e., exploratory trees) and the accuracy of the decision tree model (i.e., decision tree-based rules). Prospective and retrospective views of using models and trees will be discussed. In particular, we will introduce a tree-based methodology that grows an optimal tree structure together with posterior prediction modelling, to be used as a decision rule for new objects. The general methodology will be presented and a special case will be described in detail. Finally, an application to a real-world data set will be shown.

Author information

Correspondence to Roberta Siciliano.

Copyright information

© 2008 Physica-Verlag Heidelberg

About this paper

Cite this paper

Siciliano, R., Aria, M., D’Ambrosio, A. (2008). Posterior Prediction Modelling of Optimal Trees. In: Brito, P. (eds) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_27
