Optimal constraint-based decision tree induction from itemset lattices

Nijssen, Siegfried; Fromont, Elisa

doi:10.1007/s10618-010-0174-x

Optimal constraint-based decision tree induction from itemset lattices

Published: 06 April 2010

Volume 21, pages 9–51, (2010)
Cite this article

Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Siegfried Nijssen¹ &
Elisa Fromont²

456 Accesses
32 Citations
Explore all metrics

Abstract

In this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deeply into the pattern mining process, and allows us to improve the state-of-the-art of optimal decision tree induction.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Bocca JB, Jarke M, Zaniolo C (eds) VLDB’94, proceedings of 20th international conference on very large data bases. Morgan Kaufmann, pp 487–499
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI (1996) Fast discovery of association rules. In: Fayyad UM, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (eds) Advances in knowledge discovery and data mining. AAAI/MIT Press, pp 307–328
Angelopoulos N, Cussens J (2005) Exploiting informative priors for Bayesian clas-sification and regression trees. In: Kaelbling LP, Saffiotti A (eds) IJCAI-05, proceedings of the 19th international joint conference on artificial intelligence. Professional Book Center, pp 641–646
Asuncion A, Newman D (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/~mlearn/MLRepository.html
Baixeries J, Szathmary L, Valtchev P, Godin R (2009) Yet a faster algorithm for building the Hasse diagram of a concept lattice. In: Ferre S, Rudolph S (eds) ICFCA’09, proceedings of the 7th international conference on formal concept analysis. Springer, pp 162–177
Bayardo R, Goethals B, Zaki MJ (eds) (2004) FIMI ’04, proceedings ofthe IEEE ICDM workshop on frequent itemset mining implementations. CEUR-WS.org
Blanchard G, Schafer C, Rozenholc Y, Müller KR (2007) Optimal dyadic decision trees. Mach Learn 66(2–3): 209–241
Article Google Scholar
Bonchi F, Lucchese C (2007) Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl Eng 60(2): 377–399
Article Google Scholar
Boulicaut J-F, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Zighed DA, Komorowski HJ, Zytkow JM (eds) PKDD’00, proceedings ofthe 4th European conference on principles of data mining and knowledge discovery. Springer, pp 75–85
Boulicaut J-F, Bykowski A, Rigotti C (2003) Free-sets: a condensed representation of boolean data for the approximation of frequency queries. Data Mining Knowl Discov 7(1): 5–22
Article MathSciNet Google Scholar
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth Publishing Company, Belmont, California, USA
MATH Google Scholar
Bringmann B, Nijssen S, Zimmermann A (2009) Pattern-based classification: a unifying perspective. In: Knobbe A, Furnkranz J (eds) LeGo ’09, proceedings of the ECML PKDD 2009 workshop ‘From Local Patterns to Global Models’
Bucila C, Gehrke J, Kifer D, White WM (2003) DualMiner: a dual-pruning algorithm for itemsets with constraints. Data Mining Knowl Discov 7(3): 241–272
Article MathSciNet Google Scholar
Buntine W (1992) Learning classification trees. Stat Comput 2: 63–73
Article Google Scholar
Chipman HA, George EI, McCulloch RE (1998) Bayesian CART model search. J Am Stat Assoc 93(443): 935–947
Article Google Scholar
De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Parthasarathy S, Liu B (eds) SDM’07, proceedings of the seventh SIAM international conference on data mining. SIAM Press, pp 1–12
Dong G, Zhang X, Wong L, Li J (1999) CAEP: classification by aggregating emerging patterns. In: Arikawa S, Furukawa K (eds) DS’99, proceedings of the second international conference on discovery science. Springer, pp 30–42
Esmeir S, Markovitch S (2007a) Anytime induction of cost-sensitive trees. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) NIPS’07, proceedings of the 21st conference on neural information processing systems. MIT Press, pp 425–132
Esmeir S, Markovitch S (2007b) Anytime learning of decision trees. J Machine Learn Res 8: 891–933
Google Scholar
Friedman A, Schuster A, Wolff R (2006) k-anonymous decision tree induction. In: Furnkranz J, Scheffer T, Spiliopoulou M (eds) PKDD’06, proceedings ofthe 10th European conference on principles and practice of knowledge discovery in databases. Springer, pp 151–162
Ganter B, Wille R (1999) Formal concept analysis: mathematical foundations. Springer, Berlin
MATH Google Scholar
Garey MR (1972) Optimal binary identification procedures. SIAM J Appl Math 23(2): 173–186
Article MATH MathSciNet Google Scholar
Garofalakis MN, Hyun D, Rastogi R, Shim K (2003) Building decision trees with constraints. Data Min Knowl Discov 7(2): 187–214
Article MathSciNet Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Chen W, Naughton J, Bernstein PA (eds) SIGMOD’00, proceedings ofthe 2000 ACM SIGMOD international conference on management of Data. ACM Press, pp 1–12
Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1): 15–17
Article MATH MathSciNet Google Scholar
Imielinski T, Mannila H (1996) A database perspective on knowledge discovery. Commun ACM 39: 58–64
Article Google Scholar
Knobbe A, Ho E (2006) Maximally informative k-itemsets and their efficient discovery. In: Eliassi-Rad T, Ungar LH, Craven M, Gunopulos D (eds) KDD’06, proceeding ofthe 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, pp 237–244
Knobbe A, Cremilleux B, Furnkranz J, Scholz M (2008) From local patterns to global models: the LeGo approach to data mining. In: Furnkranz J, Knobbe A (eds) LeGo’08, proceedings ofthe ECML PKDD 2008 workshop ‘From Local Pat-terns to Global Models’, pp 1–16
Lew A (1978) Optimal conversion of extended-entry decision tables with general cost criteria. Commun ACM 21(4): 269–279
Article MATH MathSciNet Google Scholar
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Cercone N, Lin TY, Wu X (eds) ICDM’01, proceedings of the 2001 IEEE international conference on data mining. IEEE Computer Society, pp 369–376
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G (eds) KDD’98, proceedings ofthe 4th international conference on knowledge discovery and data mining. AAAI Press, pp 80–86
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) l-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1): 3
Article Google Scholar
Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: Fayyad UM, Uthurusamy R (eds) KDD’94, proceedings ofthe AAAI workshop on knowledge discovery in databases. AAAI Press, pp 181–192
Meisel WS, Michalopoulos D (1973) A partitioning algorithm with application in pattern classification and the optimization of decision tree. IEEE Trans Comput C-22:93–103
Google Scholar
Moore A, Lee MS (1998) Cached sufficient statistics for efficient machine learning with large datasets. J Artif Intell Res 8: 67–91
MATH MathSciNet Google Scholar
Murphy PM, Pazzani MJ (1997) Exploring the decision forest: an empirical in-vestigation of Occam’s razor in decision tree induction. In: Greiner R, Petsche T, Jose S (eds) Computational learning theory and natural learning systems vol IV: making learning systems practical. MIT Press, pp 171–187
Nadeau C, Bengio Y (2003) Inference for the generalization error. Machine Learn 52(3): 239–281
Article MATH Google Scholar
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Efficient mining of association rules using closed itemset lattices. Inf Syst 24(1): 25–46
Article Google Scholar
Payne HJ, Meisel WS (1977) An algorithm for constructing optimal binary decision trees. IEEE Trans Comput 26(9): 905–916
Article MATH MathSciNet Google Scholar
Pei J, Han J, Lakshmanan LVS (2001) Mining frequent item sets with convertible constraints. In: ICDE’01, proceedings of the IEEE international conference on data engineering. IEEE Computer Society, pp 433–142
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Francisco
Google Scholar
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6): 1010–1027
Article Google Scholar
Schumacher H, Sevcik KC (1976) The synthetic approach to decision table conversion. Commun ACM 19(6): 343–351
Article MATH Google Scholar
Sweeney L (2002) k-anonymity: a model for protecting privacy. Int J Uncertainty, Fuzziness Knowledge-Based Syst 10(5): 557–570
Article MATH MathSciNet Google Scholar
Turney P (1995) Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2: 369–409
Google Scholar
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Bayardo R, Goethals B, Zaki MJ (eds) FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations. CEUR-WS.org
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
MATH Google Scholar
Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Grossman R, Bayardo RJ, Bennett KP (eds) KDD’05, proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining. ACM Press, pp 314–323
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997a) New algorithms for fast discovery of association rules. In: Heckerman D, Mannila H, Pregibon D (eds) KDD’97, proceedings of the 3rd international conference on knowledge discovery and data mining. AAAI Press, pp 283–286
Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997b) Parallel algorithms for discovery of association rules. Data Mining Knowl Discov 1(4): 343–373
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Siegfried Nijssen
Laboratoire Hubert Curien, CNRS UMR 5516, Université de Saint-Étienne Jean Monnet, Université de Lyon, 42023, Saint-Étienne, France
Elisa Fromont

Authors

Siegfried Nijssen
View author publications
You can also search for this author in PubMed Google Scholar
Elisa Fromont
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Siegfried Nijssen.

Additional information

Responsible editor: Johannes Fürnkranz and Arno Knobbe.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nijssen, S., Fromont, E. Optimal constraint-based decision tree induction from itemset lattices. Data Min Knowl Disc 21, 9–51 (2010). https://doi.org/10.1007/s10618-010-0174-x

Download citation

Received: 15 April 2009
Accepted: 06 March 2010
Published: 06 April 2010
Issue Date: July 2010
DOI: https://doi.org/10.1007/s10618-010-0174-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Optimal constraint-based decision tree induction from itemset lattices

Abstract

Access this article

Similar content being viewed by others

Optimal classification trees

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Mathematical optimization in classification and regression trees

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Optimal constraint-based decision tree induction from itemset lattices

Abstract

Access this article

Similar content being viewed by others

Optimal classification trees

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Mathematical optimization in classification and regression trees

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation