skip to main content
10.1145/1772690.1772758acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

A contextual-bandit approach to personalized news article recommendation

Published:26 April 2010Publication History

ABSTRACT

Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web service is featured with dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are both fast in learning and computation.

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data gets more scarce.

References

  1. N. Abe, A. W. Biermann, and P. M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4):263--293, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Agarwal, B.-C. Chen, and P. Elango. Explore/exploit schemes for web content optimization. In Proc. of the 9th International Conf. on Data Mining, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Agarwal, B.-C. Chen, P. Elango, N. Motgi, S.-T. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. Online models for content optimization. In Advances in Neural Information Processing Systems 21, pages 17--24, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Agrawal. Sample mean based index policies with o(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4):1054--1078, 1995.Google ScholarGoogle ScholarCross RefCross Ref
  5. A. Anagnostopoulos, A. Z. Broder, E. Gabrilovich, V. Josifovski, and L. Riedel. Just-in-time contextual advertising. In Proc. of the 16th ACM Conf. on Information and Knowledge Management, pages 331--340, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397--422, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235--256, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48--77, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. A. Berry and B. Fristedt. Bandit Problems: Sequential Allocation of Experiments. Monographs on Statistics and Applied Probability. Chapman and Hall, 1985.Google ScholarGoogle Scholar
  10. P. Brusilovsky, A. Kobsa, and W. Nejdl, editors. The Adaptive Web -- Methods and Strategies of Web Personalization, volume 4321 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Burke. Hybrid systems for personalized recommendations. In B. Mobasher and S. S. Anand, editors, Intelligent Techniques for Web Personalization. Springer-Verlag, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Chu and S.-T. Park. Personalized recommendation on dynamic content using predictive bilinear models. In Proc. of the 18th International Conf. on World Wide Web, pages 691--700, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. W. Chu, S.-T. Park, T. Beaupre, N. Motgi, A. Phadke, S. Chakraborty, and J. Zachariah. A case study of behavior-driven conjoint analysis on Yahoo!: Front Page Today Module. In Proc. of the 15th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, pages 1097--1104, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. In Proc. of the 16th International World Wide Web Conf., 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society. Series B (Methodological), 41:148--177, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  16. S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Efficient bandit algorithms for online multiclass prediction. In Proc. of the 25th International Conf. on Machine Learning, pages 440--447, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4--22, 1985.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Advances in Neural Information Processing Systems 20, 2008.Google ScholarGoogle Scholar
  19. D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. Mladenic. Text-learning and related intelligent agents: A survey. IEEE Intelligent Agents, pages 44--54, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S.-T. Park, D. Pennock, O. Madani, N. Good, and D. DeCoste. Naïve filterbots for robust cold-start recommendations. In Proc. of the 12th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, pages 699--705, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. N. G. Pavlidis, D. K. Tasoulis, and D. J. Hand. Simulation studies of multi-armed bandits with covariates. In Proceedings on the 10th International Conf. on Computer Modeling and Simulation, pages 493--498, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Precup, R. S. Sutton, and S. P. Singh. Eligibility traces for off-policy policy evaluation. In Proc. of the 17th Interational Conf. on Machine Learning, pages 759--766, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527--535, 1952.Google ScholarGoogle ScholarCross RefCross Ref
  25. J. B. Schafer, J. Konstan, and J. Riedi. Recommender systems in e-commerce. In Proc. of the 1st ACM Conf. on Electronic Commerce, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285--294, 1933.Google ScholarGoogle ScholarCross RefCross Ref
  27. T. J. Walsh, I. Szita, C. Diuk, and M. L. Littman. Exploring compact reinforcement-learning representations with linear regression. In Proc. of the 25th Conf. on Uncertainty in Artificial Intelligence, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A contextual-bandit approach to personalized news article recommendation

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          WWW '10: Proceedings of the 19th international conference on World wide web
          April 2010
          1407 pages
          ISBN:9781605587998
          DOI:10.1145/1772690

          Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 26 April 2010

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate1,899of8,196submissions,23%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader

        ePub

        View this article in ePub.

        View ePub