
Temporal Difference Learning and Simulated Annealing for Optimal Control: A Case Study

  • Conference paper
Agent and Multi-Agent Systems: Technologies and Applications (KES-AMSTA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4953)

Abstract

The trade-off between exploration and exploitation has an important impact on the performance of temporal difference learning. Several action selection strategies exist, but it is unclear which performs better, and their impact may depend on the application domain and on human factors. This paper presents a modified Sarsa(λ) control algorithm that samples actions in conjunction with a simulated annealing technique. A game of soccer, which has a large, dynamic and continuous state space, is used as the simulation environment. The empirical results demonstrate that the simulated annealing approach significantly improves the quality of convergence.
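The abstract does not spell out how actions are sampled, so the following is only a minimal sketch of one common way to combine action selection with simulated annealing: a Metropolis-style acceptance test between a random candidate action and the greedy action, with a geometrically cooled temperature. The function names `sa_select_action` and `cool`, the cooling rate, and the temperature floor are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def sa_select_action(q_values, temperature, rng=random):
    """Pick an action index using a Metropolis-style acceptance test.

    q_values: list of estimated action values for the current state.
    temperature: positive value controlling exploration (assumed schedule).
    """
    # Greedy action: index of the largest estimated value.
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    if temperature <= 0.0:
        return greedy  # no exploration once the system is "frozen"

    # Propose a uniformly random candidate action.
    candidate = rng.randrange(len(q_values))
    delta = q_values[candidate] - q_values[greedy]  # always <= 0

    # Accept the candidate with probability exp(delta / T); a worse
    # candidate is accepted less often as the temperature falls.
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def cool(temperature, rate=0.95, floor=1e-3):
    """Geometric cooling step (rate and floor are hypothetical choices)."""
    return max(floor, temperature * rate)
```

Under these assumptions, a high temperature makes the choice close to uniform random, while a temperature near zero makes it effectively greedy. A learner would call `sa_select_action` wherever Sarsa(λ) chooses its next action and call `cool` once per episode or step.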

Editor information

Ngoc Thanh Nguyen, Geun Sik Jo, Robert J. Howlett, Lakhmi C. Jain

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Leng, J., Sathyaraj, B.M., Jain, L. (2008). Temporal Difference Learning and Simulated Annealing for Optimal Control: A Case Study. In: Nguyen, N.T., Jo, G.S., Howlett, R.J., Jain, L.C. (eds) Agent and Multi-Agent Systems: Technologies and Applications. KES-AMSTA 2008. Lecture Notes in Computer Science, vol 4953. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78582-8_50

  • DOI: https://doi.org/10.1007/978-3-540-78582-8_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78581-1

  • Online ISBN: 978-3-540-78582-8

  • eBook Packages: Computer Science, Computer Science (R0)
