ABSTRACT
Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, remains largely unexplored. Optimizing each strategy separately has two limitations. First, it lacks collaboration between scenarios: each strategy maximizes its own objective while ignoring the goals of the others, leading to sub-optimal overall performance. Second, it cannot model the correlation between scenarios: independent optimization in one scenario uses only that scenario's user data and ignores the context provided by the others. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG), which has a communication component for passing messages, several private actors (agents) that take ranking actions, and a centralized critic that evaluates the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate with each other by sharing a global action-value function (the critic) and by passing messages that encode historical information across scenarios. The model is evaluated online on a large E-commerce platform. Results show that the proposed model significantly outperforms the baselines in terms of overall performance.
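To make the architecture concrete, below is a minimal sketch of the actor/communicator/critic structure the abstract describes, written in PyTorch. The module names, layer sizes, the two-scenario setup, and the choice of a GRU cell for the recurrent communication component are illustrative assumptions on our part, not the authors' implementation.

```python
# Hypothetical sketch of the MA-RDPG structure (not the authors' code):
# private actors per scenario, a shared recurrent message, and one
# centralized critic. All dimensions below are assumed for illustration.
import torch
import torch.nn as nn

OBS_DIM, MSG_DIM, ACT_DIM, N_AGENTS = 32, 16, 8, 2

class Communicator(nn.Module):
    """Recurrent component: folds history across scenarios into a shared message."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(OBS_DIM + ACT_DIM, MSG_DIM)

    def forward(self, obs, act, msg):
        # The new message summarizes the previous message plus the acting
        # agent's latest local observation and action.
        return self.rnn(torch.cat([obs, act], dim=-1), msg)

class Actor(nn.Module):
    """Private actor for one scenario: a deterministic ranking policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + MSG_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs, msg):
        # The action depends on the local observation *and* the shared
        # message, which carries context from the other scenarios.
        return self.net(torch.cat([obs, msg], dim=-1))

class CentralizedCritic(nn.Module):
    """Global action-value function shared by all actors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MSG_DIM + OBS_DIM + ACT_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, msg, obs, act):
        # Scores the overall (cross-scenario) value of the chosen action.
        return self.net(torch.cat([msg, obs, act], dim=-1))

# Usage: as the user moves between scenarios, the active agent acts on
# (observation, message), the shared critic scores it, and the message
# is updated and passed forward.
actors = [Actor() for _ in range(N_AGENTS)]
critic, comm = CentralizedCritic(), Communicator()
msg = torch.zeros(1, MSG_DIM)
for agent_id, obs in [(0, torch.randn(1, OBS_DIM)),
                      (1, torch.randn(1, OBS_DIM))]:
    act = actors[agent_id](obs, msg)   # scenario-specific action
    q = critic(msg, obs, act)          # shared value estimate
    msg = comm(obs, act, msg)          # pass history to the next scenario
```

The design point the sketch illustrates is that the critic is shared, so all actors are trained toward one overall objective, while each actor remains private to its scenario; the message is the only channel through which one scenario's history influences another scenario's action.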