
Bandits and Experts in Metric Spaces

Authors Info & Claims
Published: 31 May 2019

Abstract

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is well understood, bandit problems with large strategy sets are still a topic of active investigation, motivated by practical applications, such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions that enable the design of efficient solutions.
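To make the small-finite-arm setting concrete, here is a minimal sketch of the classical UCB1 index policy of Auer, Cesa-Bianchi, and Fischer (2002), the well-understood regime the abstract contrasts with. The function name, the Bernoulli payoff means, and the horizon are illustrative assumptions for the demo, not taken from the paper.

    import math
    import random

    def ucb1(arms, horizon):
        """Classical UCB1 for a small finite set of arms.

        `arms` is a list of callables, each returning a stochastic payoff in [0, 1].
        Returns the total payoff collected over `horizon` rounds.
        """
        pulls = [0] * len(arms)      # number of times each arm was chosen
        means = [0.0] * len(arms)    # empirical mean payoff of each arm
        total = 0.0
        for t in range(1, horizon + 1):
            if t <= len(arms):
                i = t - 1            # play every arm once to initialize
            else:
                # index = empirical mean + confidence radius sqrt(2 ln t / pulls)
                i = max(range(len(arms)),
                        key=lambda j: means[j] + math.sqrt(2.0 * math.log(t) / pulls[j]))
            reward = arms[i]()
            pulls[i] += 1
            means[i] += (reward - means[i]) / pulls[i]
            total += reward
        return total

    # Illustrative Bernoulli arms with hidden means 0.3, 0.5, 0.7 (assumed for the demo).
    rng = random.Random(0)
    arms = [lambda p=p: float(rng.random() < p) for p in (0.3, 0.5, 0.7)]
    print(ucb1(arms, horizon=10_000))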

In this work, we study a general setting for the multi-armed bandit problem, in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a solution for the multi-armed bandit problem in this setting. That is, for every metric space, we define an isometry invariant that bounds from below the performance of Lipschitz MAB algorithms for this metric space, and we present an algorithm that comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions. We also address the full-feedback (“best expert”) version of the problem, where after every round the payoffs from all arms are revealed.
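For the Lipschitz MAB setting itself, a useful point of comparison is the naive fixed-discretization baseline: place K evenly spaced arms in [0, 1] and run a finite-arm algorithm such as the UCB1 sketch above on them. The sketch below is this baseline, not the paper's algorithm (which refines its discretization adaptively near near-optimal arms); the interval [0, 1] with the metric |x − y|, the function name, the Lipschitz constant, and the choice of K are illustrative assumptions.

    import math

    def uniform_grid_arms(horizon, lipschitz_const=1.0):
        """Evenly spaced candidate arms in [0, 1] for a horizon-T Lipschitz MAB.

        If the expected payoff is L-Lipschitz under |x - y|, the best grid point is
        within L/(2K) of the optimum, costing at most L*T/(2K) payoff over T rounds,
        while UCB1 on K arms loses roughly sqrt(K*T*log T).  Balancing the two terms
        suggests K on the order of (L^2 * T / log T)^(1/3), i.e., regret about T^(2/3).
        """
        T = max(horizon, 2)
        K = max(1, round((lipschitz_const ** 2 * T / math.log(T)) ** (1.0 / 3.0)))
        return [(k + 0.5) / K for k in range(K)]

    # For horizon 10,000 this yields about 10 grid points; each point x can then be
    # wrapped as a noisy payoff oracle and fed to the UCB1 sketch above.
    print(uniform_grid_arms(horizon=10_000))

A fixed grid of this kind ignores the shape of the payoff function; the adaptive discretization studied in the paper is how its algorithm approaches the metric-dependent lower bound and improves further on benign payoff functions.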



          Published in

          Journal of the ACM, Volume 66, Issue 4
          Networking, Computational Complexity, Design and Analysis of Algorithms, Real Computation, Algorithms, Online Algorithms and Computer-aided Verification
          August 2019, 299 pages
          ISSN: 0004-5411
          EISSN: 1557-735X
          DOI: 10.1145/3338848

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 31 May 2019
          • Accepted: 1 November 2018
          • Revised: 1 May 2018
          • Received: 1 November 2015
          Published in Journal of the ACM, Volume 66, Issue 4
