Data Mining with Decision Trees: Theory and Applications

Andreas Holzinger (Medical University Graz)

Online Information Review

ISSN: 1468-4527

Article publication date: 8 June 2015


Citation

Andreas Holzinger (2015), "Data Mining with Decision Trees: Theory and Applications", Online Information Review, Vol. 39 No. 3, pp. 437-438. https://doi.org/10.1108/OIR-04-2015-0121

Publisher: Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited


With the growing importance of exploring large and complex data sets in knowledge discovery and data mining, the application of decision trees has become a powerful and popular approach. Whilst the first edition of this work focused on using trees for classification tasks, this second edition describes how decision trees can be used for regression, clustering and survival analysis – very important topics for the discovery of useful patterns in complex data sets. The first edition is already a classic on the desks of scientists, and in this new edition all chapters have been revised and new topics added, including cost-sensitive active learning, learning with uncertain and imbalanced data, privacy-preserving decision tree learning, lessons learned from comparative studies, learning decision trees for Big Data, and an entire chapter on recommender systems. Particularly noteworthy is the practical walk-through guide to existing open source data mining software, which on its own constitutes a considerable additional benefit.
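To give readers a flavour of the contrast between classification and regression trees drawn here, the following is a minimal illustrative sketch of my own (not taken from the book, which itself works with tools such as Weka and R) using the open source scikit-learn library:

# Illustrative sketch only, not from the book: growing a classification
# tree and a regression tree with scikit-learn (chosen here merely as a
# convenient open source example).
from sklearn.datasets import load_iris, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree (the focus of the first edition)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression tree (one of the task types added in the second edition)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0)
reg.fit(X_train, y_train)
print("regression R^2:", reg.score(X_test, y_test))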

The book starts with an easy-to-read introduction to decision trees, and moves on to address the issue of how to train decision trees. Chapter 3 introduces a generic algorithm for top-down induction of decision trees, and Chapter 4 contains evaluation methods. Splitting criteria and pruning trees are discussed in Chapters 5 and 6, followed by popular decision tree induction algorithms (ID3, C4.5, CART, CHAID, QUEST, etc.). Chapter 8 deals with matters beyond classification tasks, i.e. regression trees, survival trees, clustering trees and hidden Markov model trees. Chapter 9 deals extensively with decision forests, including, for example, Naïve Bayes, entropy weighting and random forests. Chapter 10 is on Weka and R, whilst Chapter 11 is on advanced decision trees, including oblivious decision trees, lazy trees, option trees, and so on. Chapter 12 addresses cost-sensitive active and proactive learning, and Chapter 13 concentrates on feature selection. In Chapter 14 the authors describe fuzzy decision trees, and in Chapter 15 they focus on hybridisation of decision trees with other techniques, for example CPOM and evolutionary algorithms. Finally, Chapter 16 deals with the use of decision trees for recommending items and preference elicitation.
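As a brief illustration of the decision forests covered in Chapter 9, the following sketch (again my own, using scikit-learn rather than the Weka and R tooling covered in Chapter 10) compares a single tree with a random forest ensemble:

# Illustrative sketch only, not from the book: a single decision tree
# versus a random forest (a decision forest in the book's terminology).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Averaging many randomised trees typically reduces the variance
# of a single tree's predictions.
print("single tree:  ", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())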

Overall, this book is a must-read for anybody working in the area of knowledge discovery/data mining, and it has the potential to become a standard on our desks.
