Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

Authors:
Dalin Guo (UC San Diego, USA)
Sofia Ira Ktena (Twitter, United Kingdom)
Pranay Kumar Myana (Twitter, United Kingdom)
Ferenc Huszar (Twitter, United Kingdom)
Wenzhe Shi (Twitter, United Kingdom)
Alykhan Tejani (Twitter, United Kingdom)
Michael Kneier (Twitter, USA)
Sourav Das (Twitter, USA)

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems, September 2020, Pages 456–461
https://doi.org/10.1145/3383313.3412214

Published: 22 September 2020

ABSTRACT

Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that users have already engaged with. This behavior is particularly harmful in personalised ad recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to address this limitation by providing new information about the environment, which encompasses user preference, and can lead to higher long-term reward. In this work, we formulate a display advertising recommender as a contextual bandit and implement exploration techniques that require sampling from the posterior distribution of click-through rates in a computationally tractable manner. Traditional large-scale deep learning models do not provide uncertainty estimates by default. We approximate the uncertainty of the predictions by employing a bootstrapped model with multiple heads and dropout units. We benchmark a number of different models in an offline simulation environment using a publicly available dataset of user-ad engagements. We test our proposed deep Bayesian bandits algorithm in the offline simulation and in an online A/B setting with large-scale production traffic, where we demonstrate a positive gain from our exploration model.
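The exploration idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear logistic heads in place of a deep network, omits dropout, and uses a synthetic click simulator. The names `BootstrappedHeads`, `arm_features`, `simulate`, and all parameter values are hypothetical. Each head is updated on a random subset of the observed examples (a bootstrap mask), and each request is served by sampling one head and acting greedily under it, which approximates drawing a click-through-rate estimate from a posterior.

```python
import math
import random


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def arm_features(context, arm, n_arms):
    """Place the context in the block owned by `arm`; all other blocks are zero."""
    x = [0.0] * (n_arms * len(context))
    start = arm * len(context)
    x[start:start + len(context)] = context
    return x


class BootstrappedHeads:
    """Several logistic heads; each one sees a random subset of the updates."""

    def __init__(self, n_features, n_heads=5, lr=0.1, keep_prob=0.8, seed=0):
        self.rng = random.Random(seed)
        # Small random initial weights so the heads disagree before any data arrives.
        self.w = [[self.rng.gauss(0.0, 0.1) for _ in range(n_features)]
                  for _ in range(n_heads)]
        self.lr = lr
        self.keep_prob = keep_prob

    def predict(self, x, head):
        """Predicted click probability for feature vector x under one head."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w[head], x)))

    def select(self, context, n_arms):
        """Thompson-style choice: sample one head, then act greedily under it."""
        head = self.rng.randrange(len(self.w))
        scores = [self.predict(arm_features(context, a, n_arms), head)
                  for a in range(n_arms)]
        return max(range(n_arms), key=scores.__getitem__)

    def update(self, x, click):
        """One SGD step on log loss for every head kept by the bootstrap mask."""
        for head, w in enumerate(self.w):
            if self.rng.random() >= self.keep_prob:
                continue  # this head skips the example
            err = self.predict(x, head) - click
            for i, xi in enumerate(x):
                w[i] -= self.lr * err * xi


def simulate(n_rounds=2000, n_arms=3, dim=3, seed=0):
    """Run the bandit against a synthetic logistic click model (an assumption)."""
    rng = random.Random(seed)
    theta = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_arms)]
    model = BootstrappedHeads(n_arms * dim, seed=seed + 1)
    pulls = [0] * n_arms
    clicks = 0.0
    for _ in range(n_rounds):
        # Random user context; the last entry acts as a bias term.
        context = [rng.gauss(0.0, 1.0) for _ in range(dim - 1)] + [1.0]
        arm = model.select(context, n_arms)
        p_click = sigmoid(sum(t * c for t, c in zip(theta[arm], context)))
        click = 1.0 if rng.random() < p_click else 0.0
        # Only the displayed ad yields feedback, as in a bandit setting.
        model.update(arm_features(context, arm, n_arms), click)
        pulls[arm] += 1
        clicks += click
    return pulls, clicks / n_rounds


if __name__ == "__main__":
    pulls, ctr = simulate()
    print("pulls per ad:", pulls, "observed CTR:", round(ctr, 3))
```

In this sketch, disagreement between heads plays the role of predictive uncertainty: where the heads have seen little data for an ad, their predictions differ, so that ad is still shown occasionally rather than being starved by a greedy policy.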

Index Terms

Computing methodologies → Machine learning → Machine learning approaches

Published in

RecSys '20: Proceedings of the 14th ACM Conference on Recommender Systems
September 2020
796 pages
ISBN: 9781450375832
DOI: 10.1145/3383313

Copyright © 2020 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Algorithmic bias
Contextual bandit
Recommender Systems
Qualifiers
- short-paper
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 254 of 1,295 submissions, 20%