
An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning

Published: 05 July 2008
DOI: 10.1145/1390156.1390251

ABSTRACT

We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
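
To make these claims concrete, here is a small numerical sketch (not from the paper; the random MDP, seed, and variable names are illustrative, and it assumes the standard unweighted least-squares projection onto the span of the features). It checks that the linear fixed-point (LSTD-style) weights coincide with the exact solution of the projected linear model, and that the Bellman error decomposes into a reward-model error plus gamma times the feature-prediction error:

import numpy as np

# Random MDP and features (all illustrative).
rng = np.random.default_rng(0)
n, k, gamma = 6, 3, 0.9
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
R = rng.random(n)                        # reward vector
Phi = rng.random((n, k))                 # feature matrix, columns = features

# Linear fixed-point weights: Phi w = Proj(R + gamma * P Phi w).
w_fp = np.linalg.solve(Phi.T @ (Phi - gamma * P @ Phi), Phi.T @ R)

# Approximate linear model in feature space: least-squares projections of
# the reward and of the expected next features onto span(Phi).
G = np.linalg.inv(Phi.T @ Phi) @ Phi.T   # projection coefficients
P_feat = G @ P @ Phi                     # k x k approximate feature dynamics
r_feat = G @ R                           # approximate feature reward
w_model = np.linalg.solve(np.eye(k) - gamma * P_feat, r_feat)
print(np.allclose(w_fp, w_model))        # True: the two solutions coincide

# Bellman error decomposition:
# BE = (R - Proj R) + gamma * (P Phi - Proj P Phi) w.
Proj = Phi @ G                           # orthogonal projector onto span(Phi)
V = Phi @ w_fp
bellman_error = R + gamma * P @ V - V
delta_R = R - Proj @ R                   # reward-model error
delta_Phi = P @ Phi - Proj @ P @ Phi     # per-feature transition-model error
print(np.allclose(bellman_error, delta_R + gamma * delta_Phi @ w_fp))  # True

The same algebra goes through with a state-weighted projection, provided the same weighting defines both the fixed point and the projected model; the unweighted case above is just the simplest instance.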


Published in

ICML '08: Proceedings of the 25th International Conference on Machine Learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
Copyright © 2008 ACM


Publisher

Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
