DOI: 10.1145/3071178.3071330

Improving generalization of evolved programs through automatic simplification

Published: 01 July 2017

ABSTRACT

Programs evolved by genetic programming often fail to generalize to unseen data. Reliably synthesizing programs that do generalize is therefore an important open problem. We present evidence that smaller programs evolved using the PushGP system tend to generalize better over a range of program synthesis problems. As in many genetic programming systems, programs evolved by PushGP usually contain pieces that can be removed without changing the program's behavior. We describe methods for automatically simplifying evolved programs to make them smaller and potentially improve their generalization. We present five simplification methods and analyze their strengths and weaknesses on a suite of general program synthesis benchmark problems. All of our methods use a straightforward hill-climbing procedure to remove pieces of a program while ensuring that the resulting program produces the same errors on the training data as the original program. We show that automatic simplification, previously used both for post-run analysis and as a genetic operator, can significantly improve the generalization rates of evolved programs.
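The hill-climbing idea in the abstract can be illustrated with a minimal Python sketch. It is not one of the paper's five methods, whose details the abstract does not give. The toy stack language (`x`, `1`, `add`, `dup`, `pop`), the training set, and the names `run`, `errors`, and `simplify` are all assumptions made for illustration and are not Push or PushGP. Each step removes a few randomly chosen instructions and keeps the shorter program only if its error vector on the training cases is unchanged.

```python
import random
from typing import Callable, List, Optional, Sequence


def run(program: Sequence[str], x: int) -> int:
    """Interpret a toy stack program (hypothetical instruction set, not Push)."""
    stack: List[int] = []
    for ins in program:
        if ins == "x":
            stack.append(x)
        elif ins == "1":
            stack.append(1)
        elif ins == "add" and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif ins == "dup" and stack:
            stack.append(stack[-1])
        elif ins == "pop" and stack:
            stack.pop()
        # An instruction without enough arguments acts as a no-op.
    return stack[-1] if stack else 0


# Assumed training cases for the target behavior f(x) = x + 1.
TRAINING = [(x, x + 1) for x in range(-3, 4)]


def errors(program: Sequence[str]) -> List[int]:
    """Error vector: absolute error on each training case."""
    return [abs(run(program, x) - y) for x, y in TRAINING]


def simplify(
    program: Sequence[str],
    error_fn: Callable[[Sequence[str]], List[int]],
    steps: int = 2000,
    max_remove: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hill-climb toward a smaller program with an identical error vector."""
    rng = rng or random.Random()
    current = list(program)
    target = error_fn(current)
    for _ in range(steps):
        if not current:
            break
        # Pick between 1 and max_remove positions to delete at random.
        k = rng.randint(1, min(max_remove, len(current)))
        drop = set(rng.sample(range(len(current)), k))
        candidate = [ins for i, ins in enumerate(current) if i not in drop]
        # Accept the removal only if training errors are exactly preserved.
        if error_fn(candidate) == target:
            current = candidate
    return current


bloated = ["x", "dup", "pop", "1", "add", "1", "pop"]
simplified = simplify(bloated, errors, rng=random.Random(0))
print(len(bloated), "->", len(simplified), simplified)
```

Removing several instructions per step matters in this toy case: deleting `dup` or the following `pop` alone changes the output, while deleting both together preserves it.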

Published in

GECCO '17: Proceedings of the Genetic and Evolutionary Computation Conference
July 2017
1427 pages
ISBN: 9781450349208
DOI: 10.1145/3071178

Copyright © 2017 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

GECCO '17 Paper Acceptance Rate: 178 of 462 submissions, 39%
Overall Acceptance Rate: 1,669 of 4,410 submissions, 38%
