Abstract
In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is well understood, bandit problems with large strategy sets are still a topic of active investigation, motivated by practical applications such as online auctions and web advertising. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions.
In this work, we study a general setting for the multi-armed bandit problem in which the strategies form a metric space and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a complete solution for the multi-armed bandit problem in this setting: for every metric space, we define an isometry invariant that bounds from below the regret of any Lipschitz MAB algorithm for this metric space, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions. We also address the full-feedback (“best expert”) version of the problem, in which the payoffs of all arms are revealed after every round.
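The Lipschitz condition makes even naive discretization workable: a uniform grid of arms approximates the best strategy to within the grid spacing, and a standard finite-armed algorithm such as UCB1 can then be run on the grid. Below is a minimal sketch of this baseline (not the paper's algorithm); the payoff function `mu`, the grid size `K`, the horizon `T`, and the bounded-noise model are all illustrative assumptions.

```python
import math
import random

def ucb_on_grid(mu, K=20, T=5000, seed=0):
    """UCB1 over a uniform K-point discretization of [0, 1].

    mu: a Lipschitz mean-payoff function on [0, 1]; each pull of arm x
    returns mu(x) plus bounded noise. Returns the average reward collected.
    """
    rng = random.Random(seed)
    arms = [k / (K - 1) for k in range(K)]   # fixed grid of strategies
    counts = [0] * K                         # number of pulls per arm
    means = [0.0] * K                        # empirical mean reward per arm
    total = 0.0
    for t in range(1, T + 1):
        if t <= K:
            i = t - 1                        # play each arm once to initialize
        else:
            # UCB1 index: empirical mean plus confidence radius
            i = max(range(K),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = mu(arms[i]) + rng.uniform(-0.1, 0.1)  # noisy payoff
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]    # incremental mean update
        total += r
    return total / T

# Example: a 1-Lipschitz payoff function peaked at x = 0.3.
avg = ucb_on_grid(lambda x: 1.0 - abs(x - 0.3))
```

With roughly K ≈ T^(1/3) grid points, this style of fixed discretization is known to achieve regret of order T^(2/3) (up to logarithmic factors) for Lipschitz bandits on an interval; the algorithms studied in this line of work improve on it by refining the discretization adaptively in promising regions.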
References
- Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. 2011. Improved algorithms for linear stochastic bandits. In Proceedings of the 25th Advances in Neural Information Processing Systems (NIPS’11). 2312--2320.
- Jacob Abernethy, Elad Hazan, and Alexander Rakhlin. 2008. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Conference on Learning Theory (COLT’08). 263--274.
- Ittai Abraham and Dahlia Malkhi. 2005. Name independent routing for growth bounded networks. In Proceedings of the 17th ACM Symposium on Parallel Algorithms and Architectures (SPAA’05). 49--55.
- Rajeev Agrawal. 1995. The continuum-armed bandit problem. SIAM J. Control Optimiz. 33, 6 (1995), 1926--1951.
- Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, and Assaf Zeevi. 2016. A near-optimal exploration-exploitation approach for assortment selection. In Proceedings of the 17th ACM Conference on Economics and Computation (ACM EC’16). 599--600.
- Shipra Agrawal and Nikhil R. Devanur. 2014. Bandits with concave rewards and convex knapsacks. In Proceedings of the 15th ACM Conference on Economics and Computation (ACM EC’14).
- Kareem Amin, Michael Kearns, and Umar Syed. 2011. Bandits, query learning, and the haystack dimension. In Proceedings of the 24th Conference on Learning Theory (COLT’11).
- Jean-Yves Audibert and Sébastien Bubeck. 2010. Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 (2010), 2785--2836.
- Jean-Yves Audibert, Rémi Munos, and Csaba Szepesvári. 2009. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoret. Comput. Sci. 410 (2009), 1876--1902.
- Peter Auer. 2002. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3 (2002), 397--422.
- Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47, 2--3 (2002), 235--256.
- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32, 1 (2002), 48--77.
- Peter Auer and Ronald Ortner. 2010. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica 61 (2010), 55--65.
- Peter Auer, Ronald Ortner, and Csaba Szepesvári. 2007. Improved rates for the stochastic continuum-armed bandit problem. In Proceedings of the 20th Conference on Learning Theory (COLT’07). 454--468.
- Baruch Awerbuch and Robert Kleinberg. 2008. Online linear optimization and adaptive routing. J. Comput. Syst. Sci. 74, 1 (Feb. 2008), 97--114.
- Mohammad Gheshlaghi Azar, Alessandro Lazaric, and Emma Brunskill. 2014. Online stochastic optimization under correlated bandit feedback. In Proceedings of the 31st International Conference on Machine Learning (ICML’14). 1557--1565.
- Moshe Babaioff, Shaddin Dughmi, Robert D. Kleinberg, and Aleksandrs Slivkins. 2015. Dynamic pricing with limited supply. ACM Trans. Econ. Comput. 3, 1 (2015), 4.
- Moshe Babaioff, Robert Kleinberg, and Aleksandrs Slivkins. 2015. Truthful mechanisms with implicit payment computation. J. ACM 62, 2 (2015), 10.
- Moshe Babaioff, Yogeshwer Sharma, and Aleksandrs Slivkins. 2014. Characterizing truthful multi-armed bandit mechanisms. SIAM J. Comput. 43, 1 (2014), 194--230.
- Ashwinkumar Badanidiyuru, Robert Kleinberg, and Aleksandrs Slivkins. 2018. Bandits with knapsacks. J. ACM 65, 3 (2018).
- Dirk Bergemann and Juuso Välimäki. 2006. Bandit problems. In The New Palgrave Dictionary of Economics, 2nd ed., Steven Durlauf and Larry Blume (Eds.). Macmillan Press.
- Donald Berry and Bert Fristedt. 1985. Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall.
- Donald A. Berry, Robert W. Chen, Alan Zame, David C. Heath, and Larry A. Shepp. 1997. Bandit problems with infinitely many arms. Ann. Stat. 25, 5 (1997), 2103--2116.
- Omar Besbes and Assaf Zeevi. 2009. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operat. Res. 57, 6 (2009), 1407--1420.
- Avrim Blum. 1997. Empirical support for winnow and weighted-majority-based algorithms: Results on a calendar scheduling domain. Mach. Learn. 26 (1997), 5--23.
- Avrim Blum, Vijay Kumar, Atri Rudra, and Felix Wu. 2003. Online learning in online auctions. In Proceedings of the 14th ACM-SIAM Symposium on Discrete Algorithms (SODA’03). 202--204.
- Sébastien Bubeck and Nicolò Cesa-Bianchi. 2012. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5, 1 (2012).
- Sébastien Bubeck and Rémi Munos. 2010. Open loop optimistic planning. In Proceedings of the 23rd Conference on Learning Theory (COLT’10). 477--489.
- Sébastien Bubeck, Rémi Munos, Gilles Stoltz, and Csaba Szepesvári. 2008. Online optimization in X-armed bandits. In Proceedings of the 21st Advances in Neural Information Processing Systems (NIPS’08). 201--208.
- Sébastien Bubeck, Rémi Munos, Gilles Stoltz, and Csaba Szepesvári. 2011. Online optimization in X-armed bandits. J. Mach. Learn. Res. 12 (2011), 1587--1627.
- Sébastien Bubeck, Gilles Stoltz, and Jia Yuan Yu. 2011. Lipschitz bandits without the Lipschitz constant. In Proceedings of the 22nd International Conference on Algorithmic Learning Theory (ALT’11). 144--158.
- Adam Bull. 2015. Adaptive-treed bandits. Bernoulli J. Stat. 21, 4 (2015), 2289--2307.
- G. Cantor. 1883. Über unendliche, lineare Punktmannichfaltigkeiten, 4. Math. Ann. 21 (1883), 51--58.
- Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, and Manfred K. Warmuth. 1997. How to use expert advice. J. ACM 44, 3 (1997), 427--485.
- Nicolò Cesa-Bianchi and Gábor Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press.
- Hubert T.-H. Chan, Anupam Gupta, Bruce M. Maggs, and Shuheng Zhou. 2005. On hierarchical routing in bounded-growth metrics. In Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms (SODA’05). 762--771.
- Richard Cole and Lee-Ad Gottlieb. 2006. Searching dynamic point sets in spaces with bounded doubling dimension. In Proceedings of the 38th ACM Symposium on Theory of Computing (STOC’06). 574--583.
- Eric Cope. 2009. Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. IEEE Trans. Auto. Control 54, 6 (2009), 1243--1253.
- Thomas M. Cover and Joy A. Thomas. 1991. Elements of Information Theory. John Wiley & Sons, New York.
- Varsha Dani, Thomas P. Hayes, and Sham Kakade. 2007. The price of bandit information for online optimization. In Proceedings of the 20th Advances in Neural Information Processing Systems (NIPS’07).
- Varsha Dani, Thomas P. Hayes, and Sham Kakade. 2008. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Conference on Learning Theory (COLT’08). 355--366.
- Thomas Desautels, Andreas Krause, and Joel Burdick. 2012. Parallelizing exploration-exploitation tradeoffs with Gaussian process bandit optimization. In Proceedings of the 29th International Conference on Machine Learning (ICML’12).
- Nikhil Devanur and Sham M. Kakade. 2009. The price of truthfulness for pay-per-click auctions. In Proceedings of the 10th ACM Conference on Electronic Commerce (EC’09). 99--106.
- Abraham Flaxman, Adam Kalai, and H. Brendan McMahan. 2005. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th ACM-SIAM Symposium on Discrete Algorithms (SODA’05). 385--394.
- Christodoulos A. Floudas. 1999. Deterministic Global Optimization: Theory, Algorithms and Applications. Kluwer Academic Publishers.
- Yoav Freund, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. 1997. Using and combining predictors that specialize. In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC’97). 334--343.
- Aurélien Garivier and Olivier Cappé. 2011. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of the 24th Conference on Learning Theory (COLT’11).
- E. N. Gilbert. 1952. A comparison of signalling alphabets. Bell Syst. Tech. J. 31 (May 1952), 504--522.
- John Gittins, Kevin Glazebrook, and Richard Weber. 2011. Multi-Armed Bandit Allocation Indices. John Wiley & Sons.
- Anupam Gupta, Mike Dinitz, and Kanat Tangwongsan. 2007. Private communication.
- Anupam Gupta, Robert Krauthgamer, and James R. Lee. 2003. Bounded geometries, fractals, and low-distortion embeddings. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS’03). 534--543.
- Elad Hazan and Satyen Kale. 2011. Better algorithms for benign bandits. J. Mach. Learn. Res. 12 (2011), 1287--1311.
- Elad Hazan and Nimrod Megiddo. 2007. Online learning with prior information. In Proceedings of the 20th Conference on Learning Theory (COLT’07). 499--513.
- J. Heinonen. 2001. Lectures on Analysis on Metric Spaces. Springer-Verlag, New York.
- Kirsten Hildrum, John Kubiatowicz, and Satish Rao. 2004. Object location in realistic networks. In Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures (SPAA’04). 25--35.
- Chien-Ju Ho, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. 2016. Adaptive contract design for crowdsourcing markets: Bandit algorithms for repeated principal-agent problems. J. Artific. Intell. Res. 55 (2016), 317--359.
- Junya Honda and Akimichi Takemura. 2010. An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of the 23rd Conference on Learning Theory (COLT’10).
- D. R. Karger and M. Ruhl. 2002. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the 34th ACM Symposium on Theory of Computing (STOC’02). 63--66.
- Jon Kleinberg, Aleksandrs Slivkins, and Tom Wexler. 2009. Triangulation and embedding using small sets of beacons. J. ACM 56, 6 (Sept. 2009).
- Robert Kleinberg. 2004. Nearly tight bounds for the continuum-armed bandit problem. In Proceedings of the 18th Advances in Neural Information Processing Systems (NIPS’04).
- Robert Kleinberg. 2005. Online Decision Problems with Large Strategy Sets. Ph.D. Dissertation. MIT.
- Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma. 2008. Regret bounds for sleeping experts and bandits. In Proceedings of the 21st Conference on Learning Theory (COLT’08). 425--436.
- Robert Kleinberg and Aleksandrs Slivkins. 2010. Sharp dichotomies for regret minimization in metric spaces. In Proceedings of the 21st ACM-SIAM Symposium on Discrete Algorithms (SODA’10).
- Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. 2008. Multi-armed bandits in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing (STOC’08). 681--690.
- Robert Kleinberg, Aleksandrs Slivkins, and Eli Upfal. 2008. Multi-Armed Bandits in Metric Spaces. Technical report. Retrieved from http://arxiv.org/abs/0809.4882.
- Robert D. Kleinberg and Frank T. Leighton. 2003. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Proceedings of the 44th IEEE Symposium on Foundations of Computer Science (FOCS’03).
- Levente Kocsis and Csaba Szepesvári. 2006. Bandit-based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning (ECML’06). 282--293.
- Andreas Krause and Cheng Soon Ong. 2011. Contextual Gaussian process bandit optimization. In Proceedings of the 25th Advances in Neural Information Processing Systems (NIPS’11). 2447--2455.
- Tze Leung Lai and Herbert Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6 (1985), 4--22.
- Tyler Lu, Dávid Pál, and Martin Pál. 2010. Showing relevant ads via Lipschitz context multi-armed bandits. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS’10).
- Stefan Magureanu, Richard Combes, and Alexandre Proutiere. 2014. Lipschitz bandits: Regret lower bound and optimal algorithms. In Proceedings of the 27th Conference on Learning Theory (COLT’14). 975--999.
- Odalric-Ambrym Maillard and Rémi Munos. 2010. Online learning in adversarial Lipschitz environments. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’10). 305--320.
- Odalric-Ambrym Maillard and Rémi Munos. 2011. Adaptive bandits: Towards the best history-dependent strategy. In Proceedings of the 24th Conference on Learning Theory (COLT’11).
- S. Mazurkiewicz and W. Sierpiński. 1920. Contribution à la topologie des ensembles dénombrables. Fund. Math. 1 (1920), 17--27.
- Manor Mendel and Sariel Har-Peled. 2005. Fast construction of nets in low dimensional metrics, and their applications. In Proceedings of the 21st ACM Symposium on Computational Geometry (SoCG’05). 150--158.
- Stanislav Minsker. 2013. Estimation of extreme values and associated level sets of a regression function via selective sampling. In Proceedings of the 26th Conference on Learning Theory (COLT’13). 105--121.
- Michael Mitzenmacher and Eli Upfal. 2005. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press.
- Rémi Munos. 2011. Optimistic optimization of a deterministic function without the knowledge of its smoothness. In Proceedings of the 25th Advances in Neural Information Processing Systems (NIPS’11). 783--791.
- Rémi Munos. 2014. From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning. Found. Trends Mach. Learn. 7, 1 (2014), 1--129.
- Rémi Munos and Pierre-Arnaud Coquelin. 2007. Bandit algorithms for tree search. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI’07).
- Sandeep Pandey, Deepak Agarwal, Deepayan Chakrabarti, and Vanja Josifovski. 2007. Bandits for taxonomies: A model-based approach. In Proceedings of the SIAM International Conference on Data Mining (SDM’07).
- Sandeep Pandey, Deepayan Chakrabarti, and Deepak Agarwal. 2007. Multi-armed bandit problems with dependent arms. In Proceedings of the 24th International Conference on Machine Learning (ICML’07).
- Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. 2008. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th International Conference on Machine Learning (ICML’08). 784--791.
- Herbert Robbins. 1952. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 (1952), 527--535.
- Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 2000. A metric for distributions with applications to image databases. Int. J. Comput. Vision 40, 2 (2000), 99--121.
- Manfred Schroeder. 1991. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W. H. Freeman and Co.
- Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
- Aleksandrs Slivkins. 2007. Distance estimation and object location via rings of neighbors. Distrib. Comput. 19, 4 (Mar. 2007), 313--333.
- Aleksandrs Slivkins. 2007. Towards fast decentralized construction of locality-aware overlay networks. In Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing (PODC’07). 89--98.
- Aleksandrs Slivkins. 2011. Multi-armed bandits on implicit metric spaces. In Proceedings of the 25th Advances in Neural Information Processing Systems (NIPS’11).
- Aleksandrs Slivkins. 2014. Contextual bandits with similarity information. J. Mach. Learn. Res. 15, 1 (2014), 2533--2568.
- Aleksandrs Slivkins, Filip Radlinski, and Sreenivas Gollapudi. 2013. Ranked bandits in metric spaces: Learning optimally diverse rankings over large document collections. J. Mach. Learn. Res. 14 (Feb. 2013), 399--436.
- Aleksandrs Slivkins and Eli Upfal. 2008. Adapting to a changing environment: The Brownian restless bandits. In Proceedings of the 21st Conference on Learning Theory (COLT’08). 343--354.
- Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. 2010. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 1015--1022.
- Michel Talagrand. 2005. The Generic Chaining: Upper and Lower Bounds of Stochastic Processes. Springer.
- Kunal Talwar. 2004. Bypassing the embedding: Algorithms for low-dimensional metrics. In Proceedings of the 36th ACM Symposium on Theory of Computing (STOC’04). 281--290.
- William R. Thompson. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 3--4 (1933), 285--294.
- Michal Valko, Alexandra Carpentier, and Rémi Munos. 2013. Stochastic simultaneous optimistic optimization. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 19--27.
- R. R. Varshamov. 1957. Estimate of the number of signals in error correcting codes. Doklady Akademii Nauk 117 (1957), 739--741.
- V. Vovk. 1998. A game of prediction with expert advice. J. Comput. Syst. Sci. 56, 2 (1998), 153--173.
- Yizao Wang, Jean-Yves Audibert, and Rémi Munos. 2008. Algorithms for infinitely many-armed bandits. In Proceedings of the 21st Advances in Neural Information Processing Systems (NIPS’08). 1729--1736.
- Zizhuo Wang, Shiming Deng, and Yinyu Ye. 2014. Close the gaps: A learning-while-doing algorithm for single-product revenue management problems. Operat. Res. 62, 2 (2014), 318--331.
Index Terms
- Bandits and Experts in Metric Spaces