Pure Exploration in Multi-armed Bandits Problems

  • Conference paper
Algorithmic Learning Theory (ALT 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5809)

Abstract

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms. The strategies are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case of cumulative regret, where exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations where the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. The main result is that the required exploration–exploitation trade-offs are qualitatively different, in view of a general lower bound on the simple regret in terms of the cumulative regret.
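For readers unfamiliar with the distinction, the sketch below (not from the paper; the arm means, horizon, and uniform round-robin allocation are illustrative assumptions) contrasts the two regret notions on Bernoulli arms: a forecaster explores for n rounds, accumulating cumulative regret as the sum of per-round gaps, and then recommends the empirically best arm, whose gap to the best mean is the simple regret.

```python
import random

def uniform_exploration(means, n_rounds, seed=0):
    """Round-robin exploration over Bernoulli arms, followed by a
    recommendation of the empirically best arm.

    Returns (simple_regret, cumulative_regret). The arm means, horizon,
    and allocation rule are illustrative choices, not taken from the paper.
    """
    assert n_rounds >= len(means), "each arm needs at least one pull"
    rng = random.Random(seed)
    k = len(means)
    best_mean = max(means)
    counts = [0] * k
    sums = [0.0] * k
    cumulative_regret = 0.0

    for t in range(n_rounds):
        arm = t % k                                   # uniform allocation
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        cumulative_regret += best_mean - means[arm]   # expected per-round gap

    # Recommendation after the exploration phase: highest empirical mean.
    recommended = max(range(k), key=lambda i: sums[i] / counts[i])
    simple_regret = best_mean - means[recommended]    # gap of the recommended arm
    return simple_regret, cumulative_regret

if __name__ == "__main__":
    sr, cr = uniform_exploration(means=[0.5, 0.45, 0.4], n_rounds=3000)
    print(f"simple regret: {sr:.3f}   cumulative regret: {cr:.1f}")
```

In this sketch, uniform allocation trades a linearly growing cumulative regret for a simple regret that vanishes quickly as n grows; the paper's lower bound formalizes why strategies designed to keep the cumulative regret small cannot make the simple regret decrease as fast.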


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bubeck, S., Munos, R., Stoltz, G. (2009). Pure Exploration in Multi-armed Bandits Problems. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2009. Lecture Notes in Computer Science (LNAI), vol 5809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04414-4_7

  • DOI: https://doi.org/10.1007/978-3-642-04414-4_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04413-7

  • Online ISBN: 978-3-642-04414-4

  • eBook Packages: Computer Science, Computer Science (R0)
