Abstract
Following Tesauro's work on TD-Gammon, we used a 4,000 parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
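The hillclimbing procedure the abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `play_match` is a hypothetical callback standing in for a series of backgammon games between the two networks, and the toy match function below (which judges strength by distance to a hidden target vector) is purely for demonstration.

```python
import random

def mutate(weights, sigma=0.05):
    """Return a challenger: the champion's weights plus Gaussian noise.

    sigma is an assumed mutation scale, not a value from the paper.
    """
    return [w + random.gauss(0.0, sigma) for w in weights]

def hillclimb(play_match, n_weights, generations=2000, seed=0):
    """Relative-fitness hillclimbing as described in the abstract.

    play_match(champion, challenger) -> True if the challenger wins;
    in the paper this would be decided by actual games of backgammon.
    """
    random.seed(seed)
    champion = [0.0] * n_weights          # start from all-zero weights
    for _ in range(generations):
        challenger = mutate(champion)
        if play_match(champion, challenger):
            champion = challenger         # challenger becomes the new champion
    return champion

# Toy stand-in for match play: here the "stronger" network is simply the
# one whose weights lie closer to a hidden target vector. This replaces
# game play only so the loop can be run end to end.
TARGET = [1.0, -2.0, 0.5]

def toy_match(champ, chall):
    dist = lambda w: sum((a - b) ** 2 for a, b in zip(w, TARGET))
    return dist(chall) < dist(champ)

best = hillclimb(toy_match, n_weights=3)
```

Under the toy fitness function the zero-initialized champion drifts toward the target, mirroring how, in the paper, the champion network improves against successive mutated challengers without any gradient-based learning.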
References
Angeline, P.J. (1994). An alternate interpretation of the iterated prisoner's dilemma and the evolution of nonmutual cooperation. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference (pp. 353–358). MIT Press.
Angeline, P.J., & Pollack, J.B. (1994). Competitive environments evolve better solutions for complex tasks. In S. Forrest (Ed.), Genetic Algorithms: Proceedings of the Fifth International Conference.
Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.
Boyan, J.A. (1992). Modular neural networks for learning context-dependent game strategies. Master's thesis, Computer Speech and Language Processing, Cambridge University.
Cliff, D., & Miller, G. (1995). Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations. Third European Conference on Artificial Life (pp. 200–218).
Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1024–1030).
Epstein, S.L. (1994). Toward an ideal trainer. Machine Learning, 15, 251–277.
Fogel, D.B. (1993). Using evolutionary programming to create neural networks that are capable of playing tic-tac-toe. International Conference on Neural Networks 1993 (pp. 875–880). IEEE Press.
Hillis, W.D. (1992). Co-evolving parasites improve simulated evolution as an optimization procedure. In C. Langton, C. Taylor, J.D. Farmer, & S. Rasmussen (Eds.), Artificial Life II (pp. 313–324). Addison-Wesley.
Juille, H., & Pollack, J. (1995). Massively parallel genetic programming. In P. Angeline, & K. Kinnear (Eds.), Advances in Genetic Programming II. Cambridge: MIT Press.
Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). Morgan Kaufmann.
Littman, M.L. (1996). Algorithms for sequential decision making. Ph.D. dissertation, Providence: Brown University Computer Science Department.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.
Michie, D. (1961). Trial and error. Science Survey, part 2 (pp. 129–145). Penguin.
Mitchell, M., Hraber, P.T., & Crutchfield, J.P. (1993). Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems, 7, 89–130.
Packard, N. (1988). Adaptation towards the edge of chaos. In J.A.S. Kelso, A.J. Mandell, & M.F. Shlesinger (Eds.), Dynamic Patterns in Complex Systems (pp. 293–301). World Scientific.
Reynolds, C. (1994). Competition, coevolution, and the game of tag. Proceedings 4th Artificial Life Conference. MIT Press.
Rosin, C.D., & Belew, R.K. (1995). Methods for competitive co-evolution: Finding opponents worth beating. Proceedings of the 6th International Conference on Genetic Algorithms (pp. 373–380). Morgan Kaufmann.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.
Schraudolph, N.N., Dayan, P., & Sejnowski, T.J. (1994). Temporal difference learning of position evaluation in the game of Go. Advances in Neural Information Processing Systems (Vol. 6, pp. 817–824). Morgan Kaufmann.
Sims, K. (1994). Evolving 3D morphology and behavior by competition. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference. MIT Press.
Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Tesauro, G. (1989). Connectionist learning of expert preferences by comparison training. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Denver (Vol. 1, pp. 99–106). San Mateo: Morgan Kaufmann.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–277.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.
Walker, S., Lister, R., & Downs, T. (1994). Temporal difference, non-determinism, and noise: A case study on the 'Othello' board game. International Conference on Artificial Neural Networks 1994 (pp. 1428–1431). Sorrento, Italy.
Zhang, W., & Dietterich, T. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).
Cite this article
Pollack, J.B., Blair, A.D. Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning 32, 225–240 (1998). https://doi.org/10.1023/A:1007417214905