ABSTRACT
Programs evolved by genetic programming often fail to generalize to unseen data. Reliable synthesis of programs that generalize to unseen data is therefore an important open problem. We present evidence that smaller programs evolved using the PushGP system tend to generalize better over a range of program synthesis problems. As in many genetic programming systems, programs evolved by PushGP usually contain pieces that can be removed without changing the behavior of the program. We describe methods for automatically simplifying evolved programs to make them smaller and potentially improve their generalization. We present five simplification methods and analyze their strengths and weaknesses on a suite of general program synthesis benchmark problems. All of our methods use a straightforward hill-climbing procedure to remove pieces of a program while ensuring that the resulting program gives the same errors on the training data as the original program. We show that automatic simplification, previously used both for post-run analysis and as a genetic operator, can significantly improve the generalization rates of evolved programs.
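The hill-climbing procedure described above can be illustrated with a minimal sketch. It is not one of the five methods evaluated in the paper, and it does not use Push: the toy stack interpreter `run`, the helper `errors`, and the parameters `steps`, `max_remove`, and `seed` are all hypothetical names chosen for illustration. The sketch only shows the shared idea, namely to remove a few randomly chosen pieces and keep the smaller program when its training errors are identical to those of the original.

```python
import random


def run(program, x):
    """Evaluate a toy linear stack program on input x (hypothetical, not Push)."""
    stack = []
    for tok in program:
        if tok == "x":
            stack.append(x)
        elif isinstance(tok, int):
            stack.append(tok)
        elif tok in ("+", "*") and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
        # An operator without enough arguments is treated as a no-op.
    return stack[-1] if stack else 0


def errors(program, cases):
    """Return the per-case absolute errors on the training cases."""
    return [abs(run(program, x) - y) for x, y in cases]


def simplify(program, cases, steps=1000, max_remove=3, seed=0):
    """Hill-climb toward a smaller program with an unchanged error vector."""
    rng = random.Random(seed)
    target = errors(program, cases)  # errors of the original program
    current = list(program)
    for _ in range(steps):
        if not current:
            break
        # Pick a small random set of positions to remove.
        k = rng.randint(1, min(max_remove, len(current)))
        drop = set(rng.sample(range(len(current)), k))
        candidate = [t for i, t in enumerate(current) if i not in drop]
        # Accept only if every training error is exactly the same.
        if errors(candidate, cases) == target:
            current = candidate
    return current


# Illustrative data: a bloated program that computes x * x.
program = ["x", "x", "*", 0, "+", 1, 1, "*", "*"]
cases = [(x, x * x) for x in range(-3, 4)]
simplified = simplify(program, cases)
```

Because a candidate is accepted only when its error vector matches the original exactly, the simplified program is never worse on the training data. Whether it generalizes better is the empirical question the paper investigates.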
Index Terms
- Improving generalization of evolved programs through automatic simplification