Abstract
Following Tesauro's work on TD-Gammon, we used a 4,000 parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
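The hillclimbing procedure the abstract describes can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: `play_match` is a hypothetical callback standing in for a series of backgammon games between the two networks, and the toy match function below (which judges strength by distance to a hidden target vector) is purely for demonstration.

```python
import random

def mutate(weights, sigma=0.05):
    """Return a challenger: the champion's weights plus Gaussian noise.

    sigma is an assumed mutation scale, not a value from the paper.
    """
    return [w + random.gauss(0.0, sigma) for w in weights]

def hillclimb(play_match, n_weights, generations=2000, seed=0):
    """Relative-fitness hillclimbing as described in the abstract.

    play_match(champion, challenger) -> True if the challenger wins;
    in the paper this would be decided by actual games of backgammon.
    """
    random.seed(seed)
    champion = [0.0] * n_weights          # start from all-zero weights
    for _ in range(generations):
        challenger = mutate(champion)
        if play_match(champion, challenger):
            champion = challenger         # challenger becomes the new champion
    return champion

# Toy stand-in for match play: here the "stronger" network is simply the
# one whose weights lie closer to a hidden target vector. This replaces
# game play only so the loop can be run end to end.
TARGET = [1.0, -2.0, 0.5]

def toy_match(champ, chall):
    dist = lambda w: sum((a - b) ** 2 for a, b in zip(w, TARGET))
    return dist(chall) < dist(champ)

best = hillclimb(toy_match, n_weights=3)
```

Under the toy fitness function the zero-initialized champion drifts toward the target, mirroring how, in the paper, the champion network improves against successive mutated challengers without any gradient-based learning.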
References
Angeline, P.J. (1994). An alternate interpretation of the iterated prisoner's dilemma and the evolution of nonmutual cooperation. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference (pp. 353–358). MIT Press.
Angeline, P.J., & Pollack, J.B. (1994). Competitive environments evolve better solutions for complex tasks. In S. Forrest (Ed.), Genetic Algorithms: Proceedings of the Fifth International Conference.
Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.
Boyan, J.A. (1992). Modular neural networks for learning context-dependent game strategies. Master's thesis, Computer Speech and Language Processing, Cambridge University.
Cliff, D., & Miller, G. (1995). Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations. Third European Conference on Artificial Life (pp. 200–218).
Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1024–1030).
Epstein, S.L. (1994). Toward an ideal trainer. Machine Learning, 15, 251–277.
Fogel, D.B. (1993). Using evolutionary programming to create neural networks that are capable of playing tic-tac-toe. International Conference on Neural Networks 1993 (pp. 875–880). IEEE Press.
Hillis, W.D. (1992). Co-evolving parasites improve simulated evolution as an optimization procedure. In C. Langton, C. Taylor, J.D. Farmer, & S. Rasmussen (Eds.), Artificial Life II (pp. 313–324). Addison-Wesley.
Juille, H., & Pollack, J. (1995). Massively parallel genetic programming. In P. Angeline, & K. Kinnear (Eds.), Advances in Genetic Programming II. Cambridge: MIT Press.
Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). Morgan Kaufmann.
Littman, M.L. (1996). Algorithms for sequential decision making. Ph.D. dissertation, Providence: Brown University Computer Science Department.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.
Michie, D. (1961). Trial and error. Science Survey, part 2 (pp. 129–145). Penguin.
Mitchell, M., Hraber, P.T., & Crutchfield, J.P. (1993). Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems, 7, 89–130.
Packard, N. (1988). Adaptation towards the edge of chaos. In J.A.S. Kelso, A.J. Mandell, & M.F. Shlesinger (Eds.), Dynamic Patterns in Complex Systems (pp. 293–301). World Scientific.
Reynolds, C. (1994). Competition, coevolution, and the game of tag. Proceedings 4th Artificial Life Conference. MIT Press.
Rosin, C.D., & Belew, R.K. (1995). Methods for competitive co-evolution: Finding opponents worth beating. Proceedings of the 6th International Conference on Genetic Algorithms (pp. 373–380). Morgan Kaufmann.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Samuel, A.L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.
Schraudolph, N.N., Dayan, P., & Sejnowski, T.J. (1994). Temporal difference learning of position evaluation in the game of Go. Advances in Neural Information Processing Systems (Vol. 6, pp. 817–824). Morgan Kaufmann.
Sims, K. (1994). Evolving 3D morphology and behavior by competition. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference. MIT Press.
Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Tesauro, G. (1989). Connectionist learning of expert preferences by comparison training. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Denver (Vol. 1, pp. 99–106). San Mateo: Morgan Kaufmann.
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–277.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.
Walker, S., Lister, R., & Downs, T. (1994). Temporal difference, non-determinism, and noise: A case study on the 'Othello' board game. International Conference on Artificial Neural Networks 1994 (pp. 1428–1431). Sorrento, Italy.
Zhang, W., & Dietterich, T. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).
Cite this article
Pollack, J.B., Blair, A.D. Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning 32, 225–240 (1998). https://doi.org/10.1023/A:1007417214905