ABSTRACT
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output with an acceptable level of predictive performance. Since generating optimal model trees is an NP-complete problem, traditional model tree induction algorithms rely on a greedy heuristic, which may not converge to the globally optimal solution. We propose using the evolutionary algorithm (EA) paradigm as an alternative heuristic for generating model trees, with the aim of improving convergence toward globally optimal solutions. We test the predictive performance of this approach on public UCI datasets and compare the results with those of traditional greedy regression and model tree induction algorithms.
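The idea of replacing a greedy split search with an evolutionary one can be illustrated with a deliberately small sketch. It is not the algorithm proposed in the paper: it assumes a single numeric input, a tree with exactly one split and a linear model in each leaf, RMSE as the fitness, and a simple scheme of binary tournament selection, Gaussian mutation, and elitism. All function names and parameter values (`evolve_split`, `min_leaf`, the mutation scale) are hypothetical choices made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # constant input: fall back to the mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def sse(xs, ys):
    """Sum of squared errors of the least-squares line fitted to (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))


def rmse_of_split(thr, xs, ys, min_leaf=5):
    """Fitness: RMSE of a one-split model tree with a linear model per leaf."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    if len(left) < min_leaf or len(right) < min_leaf:
        return math.inf  # reject splits that leave a leaf nearly empty
    total = sse(*zip(*left)) + sse(*zip(*right))
    return math.sqrt(total / len(xs))


def evolve_split(xs, ys, pop_size=20, generations=30, seed=0):
    """Evolve a split threshold; returns (threshold, rmse) of the best one."""
    rng = random.Random(seed)
    spread = max(xs) - min(xs)
    pop = [rng.choice(xs) for _ in range(pop_size)]  # thresholds at data points
    for _ in range(generations):
        scores = [rmse_of_split(t, xs, ys) for t in pop]
        elite = pop[scores.index(min(scores))]
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            # Binary tournament: the better of two random individuals breeds.
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            parent = pop[i] if scores[i] <= scores[j] else pop[j]
            # Gaussian mutation of the threshold.
            children.append(parent + rng.gauss(0.0, 0.05 * spread))
        pop = children
    scores = [rmse_of_split(t, xs, ys) for t in pop]
    best = scores.index(min(scores))
    return pop[best], scores[best]
```

Calling `evolve_split(xs, ys)` on two equal-length lists of floats returns the best threshold found and its RMSE. A full model tree induction method would evolve whole tree structures with several attributes and would typically add crossover; this sketch only shows the population-based search loop that stands in for the greedy choice of a split.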