Pure Exploration in Multi-armed Bandits Problems

  • Conference paper
Algorithmic Learning Theory (ALT 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5809)

Abstract

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms. The strategies are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case of cumulative regret, where exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations where the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. The main result is that the required exploration–exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.
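For readers unfamiliar with the distinction, the sketch below (not from the paper; the arm means, horizon, and uniform round-robin allocation are illustrative assumptions) contrasts the two regret notions on Bernoulli arms: a forecaster explores for n rounds, accumulating cumulative regret as the sum of per-round gaps, and then recommends the empirically best arm, whose gap to the best mean is the simple regret.

```python
import random

def uniform_exploration(means, n_rounds, seed=0):
    """Round-robin exploration over Bernoulli arms, followed by a
    recommendation of the empirically best arm.

    Returns (simple_regret, cumulative_regret). The arm means, horizon,
    and allocation rule are illustrative choices, not taken from the paper.
    """
    assert n_rounds >= len(means), "each arm needs at least one pull"
    rng = random.Random(seed)
    k = len(means)
    best_mean = max(means)
    counts = [0] * k
    sums = [0.0] * k
    cumulative_regret = 0.0

    for t in range(n_rounds):
        arm = t % k                                   # uniform allocation
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cumulative_regret += best_mean - means[arm]   # expected per-round gap

    # Recommendation after the exploration phase: highest empirical mean.
    recommended = max(range(k), key=lambda i: sums[i] / counts[i])
    simple_regret = best_mean - means[recommended]    # gap of the recommended arm
    return simple_regret, cumulative_regret

if __name__ == "__main__":
    sr, cr = uniform_exploration(means=[0.5, 0.45, 0.4], n_rounds=3000)
    print(f"simple regret: {sr:.3f}   cumulative regret: {cr:.1f}")
```

In this sketch, uniform allocation trades a linearly growing cumulative regret for a simple regret that vanishes quickly as n grows; the paper's lower bound formalizes why strategies designed to keep the cumulative regret small cannot make the simple regret decrease as fast.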


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bubeck, S., Munos, R., Stoltz, G. (2009). Pure Exploration in Multi-armed Bandits Problems. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2009. Lecture Notes in Computer Science (LNAI), vol 5809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04414-4_7

  • DOI: https://doi.org/10.1007/978-3-642-04414-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04413-7

  • Online ISBN: 978-3-642-04414-4

  • eBook Packages: Computer Science, Computer Science (R0)
