DOI: 10.1145/1390156.1390199
Research article

Reinforcement learning in the presence of rare events

Published: 05 July 2008

ABSTRACT

We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these events are sampled according to their natural probability of occurring, convergence of conventional reinforcement learning algorithms is likely to be slow, and the learning algorithms may exhibit high variance. In this work, we assume access to a simulator in which the rare event probabilities can be artificially altered; importance sampling can then be used to learn from this simulation data. We introduce algorithms for policy evaluation, using both tabular and function approximation representations of the value function. We prove that in both cases the reinforcement learning algorithms converge. In the tabular case, we also analyze the bias and variance of our approach compared to TD-learning. We empirically evaluate the performance of the algorithm on random Markov Decision Processes, as well as on a large network planning task.
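
The abstract describes importance-sampled temporal-difference learning: transitions are generated in a simulator whose rare-event probability is inflated, and each update is reweighted by the likelihood ratio between the natural and simulated event distributions. The following is a minimal tabular TD(0) sketch of that general idea only, not the authors' exact algorithm; the environment step function, the single binary rare event, and all parameter names are illustrative assumptions.

    import numpy as np

    def is_td0_policy_evaluation(step, n_states, p_rare_true, p_rare_sim,
                                 episodes=1000, alpha=0.1, gamma=0.95, seed=0):
        # step(state, rare_event, rng) -> (next_state, reward, done)  [hypothetical]
        # p_rare_true: natural probability of the rare event
        # p_rare_sim:  inflated probability used when simulating
        rng = np.random.default_rng(seed)
        V = np.zeros(n_states)
        for _ in range(episodes):
            s, done = 0, False
            while not done:
                # Sample the rare event under the simulator's inflated probability.
                rare = rng.random() < p_rare_sim
                # Importance weight: likelihood ratio of the natural vs. simulated
                # event distribution for the sampled outcome.
                w = (p_rare_true / p_rare_sim) if rare else \
                    (1.0 - p_rare_true) / (1.0 - p_rare_sim)
                s_next, r, done = step(s, rare, rng)
                target = r + (0.0 if done else gamma * V[s_next])
                V[s] += alpha * w * (target - V[s])   # importance-weighted TD(0) update
                s = s_next
        return V

In this sketch, sampling the rare event more often than its natural frequency reduces the variance caused by rarely observed but significant transitions, while the weight w keeps the value estimates consistent with the original environment.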

Published in

ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008, 1310 pages
ISBN: 9781605582054
DOI: 10.1145/1390156
Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%
