DOI: 10.1145/3309697.3331515
Extended Abstract · Public Access

Learning to Control Renewal Processes with Bandit Feedback

Published: 20 June 2019

ABSTRACT

We consider a bandit problem with K task types from which the controller activates one task at a time. Each task takes a random and possibly heavy-tailed completion time, and a reward is obtained only after the task is completed. The task types are independent of each other and have distinct and unknown distributions for completion time and reward. For a given time horizon τ, the goal of the controller is to schedule tasks adaptively so as to maximize the reward collected by the time τ expires. We also allow the controller to interrupt an ongoing task and initiate a new one. Beyond the traditional exploration-exploitation dilemma, this interrupt mechanism introduces a second one: should the controller complete the current task and collect its reward, or interrupt it in favor of a possibly shorter and more rewarding alternative? We show that for all heavy-tailed and some light-tailed completion time distributions, this interrupt mechanism improves the reward by an amount that grows linearly in τ. From a learning perspective, the interrupt mechanism necessitates implicitly learning statistics beyond the mean from truncated observations. For this purpose, we propose a robust learning algorithm named UCB-BwI, based on the median-of-means estimator, for possibly heavy-tailed reward and completion time distributions. We show that, in a K-armed bandit setting with an arbitrary set of L possible interrupt times, UCB-BwI achieves O(K log τ + KL) regret. We also prove that the regret under any admissible policy is Ω(K log τ), which implies that UCB-BwI is order optimal.
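To make the two ingredients the abstract names concrete, here is a minimal Python sketch of (i) the median-of-means estimator and (ii) a UCB-style loop over (arm, interrupt-time) pairs. It is illustrative only: the toy environment (Pareto completion times, exponential rewards), the index formula, the block-count schedule, and all constants are hypothetical choices of ours, not the exact construction of UCB-BwI in the paper.

import math
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(samples, num_blocks):
    # Split the samples into blocks, average each block, and return the
    # median of the block means. Unlike the empirical mean, this estimate
    # concentrates well even for heavy-tailed data, provided a low-order
    # moment (e.g. the variance) is finite.
    x = np.asarray(samples, dtype=float)
    k = max(1, min(int(num_blocks), len(x)))
    return float(np.median([block.mean() for block in np.array_split(x, k)]))

# Hypothetical environment; none of these numbers come from the paper.
K = 3
interrupt_times = [1.0, 2.0, 4.0, 8.0]      # grid of L candidate interrupt times
pareto_shape = [1.2, 1.5, 2.0]              # smaller shape = heavier completion-time tail
mean_reward = [1.0, 0.8, 0.5]

def pull(arm, b):
    # Run one task of type `arm`, interrupting it at time b: the reward is
    # collected only if the task completes before b; otherwise b time units
    # are spent for nothing.
    completion = rng.pareto(pareto_shape[arm]) + 1.0
    if completion <= b:
        return rng.exponential(mean_reward[arm]), completion
    return 0.0, b

# UCB-style loop over (arm, interrupt-time) pairs: play the pair whose
# optimistic estimated reward *rate* (reward per unit time) is largest.
horizon = 2000.0
stats = {(a, b): ([], []) for a in range(K) for b in interrupt_times}
t, total_reward, n_pulls = 0.0, 0.0, 0

while t < horizon:
    n_pulls += 1
    best_pair, best_index = None, -math.inf
    for pair, (rews, spans) in stats.items():
        n = len(rews)
        if n == 0:
            best_pair, best_index = pair, math.inf   # try every pair once
            break
        blocks = 1 + int(math.log(1 + n_pulls))      # grow block count slowly
        r_hat = median_of_means(rews, blocks)
        s_hat = max(median_of_means(spans, blocks), 1e-9)
        bonus = math.sqrt(2.0 * math.log(1 + n_pulls) / n)
        index = r_hat / s_hat + bonus
        if index > best_index:
            best_pair, best_index = pair, index
    arm, b = best_pair
    reward, elapsed = pull(arm, b)
    stats[(arm, b)][0].append(reward)
    stats[(arm, b)][1].append(elapsed)
    total_reward += reward
    t += elapsed

print(f"collected reward {total_reward:.1f} over horizon {horizon}")

The ratio r_hat / s_hat reflects the renewal-reward view of the problem: the long-run reward rate of repeatedly playing one (arm, interrupt-time) pair is the ratio of its expected truncated reward to its expected truncated cycle length, so an optimistic estimate of this ratio plays the role that an optimistic mean estimate plays in classical UCB.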


Published in
      SIGMETRICS '19: Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems
      June 2019
      113 pages
ISBN: 9781450366786
DOI: 10.1145/3309697

      Copyright © 2019 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

SIGMETRICS '19 paper acceptance rate: 50 of 317 submissions, 16%. Overall acceptance rate: 459 of 2,691 submissions, 17%.
