ABSTRACT
Personalized web services strive to adapt their services (advertisements, news articles, etc.) to individual users by making use of both content and user information. Despite a few recent advances, this problem remains challenging for at least two reasons. First, web services feature dynamically changing pools of content, rendering traditional collaborative filtering methods inapplicable. Second, the scale of most web services of practical interest calls for solutions that are fast in both learning and computation.
In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
The contributions of this work are three-fold. First, we propose a new, general contextual bandit algorithm that is computationally efficient and well motivated from learning theory. Second, we argue that any bandit algorithm can be reliably evaluated offline using previously recorded random traffic. Finally, using this offline evaluation method, we successfully applied our new algorithm to a Yahoo! Front Page Today Module dataset containing over 33 million events. Results showed a 12.5% click lift compared to a standard context-free bandit algorithm, and the advantage becomes even greater when data are scarcer.
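A sketch of the offline evaluation idea on previously recorded random traffic: replay logged events, keep only those where the candidate policy would have chosen the same article that was actually shown, and average the clicks over the retained events. Because the logging policy chose articles uniformly at random, the retained subsample is unbiased. The event fields (`context`, `shown_article`, `click`) and the `policy.select`/`policy.update` interface are assumed for illustration.

```python
def replay_evaluate(policy, logged_events):
    """Estimate a policy's click-through rate from randomly logged traffic (sketch)."""
    total_reward, matched = 0.0, 0
    for context, shown_article, click in logged_events:
        chosen = policy.select(context)            # what the candidate policy would show
        if chosen == shown_article:                # keep only matching events
            matched += 1
            total_reward += click
            policy.update(context, chosen, click)  # let the policy keep learning online
    return total_reward / max(matched, 1)          # estimated per-display click rate
```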