ABSTRACT
Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, remains largely unexplored. Optimizing each strategy separately has two limitations. First, it lacks collaboration between scenarios: each strategy maximizes its own objective while ignoring the goals of the others, leading to sub-optimal overall performance. Second, it cannot model the correlation between scenarios: independent optimization in one scenario uses only that scenario's user data and ignores the context provided by the others. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG), which has a communication component for passing messages, several private actors (agents) that take ranking actions, and a centralized critic that evaluates the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate with each other by sharing a global action-value function (the critic) and by passing messages that encode historical information across scenarios. The model is evaluated online on a large E-commerce platform. Results show that the proposed model significantly outperforms the baselines in terms of overall performance.
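To make the architecture concrete, below is a minimal sketch of the actor/communicator/critic structure the abstract describes, written in PyTorch. The module names, layer sizes, the two-scenario setup, and the choice of a GRU cell for the recurrent communication component are illustrative assumptions on our part, not the authors' implementation.

```python
# Hypothetical sketch of the MA-RDPG structure (not the authors' code):
# private actors per scenario, a shared recurrent message, and one
# centralized critic. All dimensions below are assumed for illustration.
import torch
import torch.nn as nn

OBS_DIM, MSG_DIM, ACT_DIM, N_AGENTS = 32, 16, 8, 2

class Communicator(nn.Module):
    """Recurrent component: folds history across scenarios into a shared message."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRUCell(OBS_DIM + ACT_DIM, MSG_DIM)

    def forward(self, obs, act, msg):
        # The new message summarizes the previous message plus the acting
        # agent's latest local observation and action.
        return self.rnn(torch.cat([obs, act], dim=-1), msg)

class Actor(nn.Module):
    """Private actor for one scenario: a deterministic ranking policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + MSG_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs, msg):
        # The action depends on the local observation *and* the shared
        # message, which carries context from the other scenarios.
        return self.net(torch.cat([obs, msg], dim=-1))

class CentralizedCritic(nn.Module):
    """Global action-value function shared by all actors."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MSG_DIM + OBS_DIM + ACT_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, msg, obs, act):
        # Scores the overall (cross-scenario) value of the chosen action.
        return self.net(torch.cat([msg, obs, act], dim=-1))

# Usage: as the user moves between scenarios, the active agent acts on
# (observation, message), the shared critic scores it, and the message
# is updated and passed forward.
actors = [Actor() for _ in range(N_AGENTS)]
critic, comm = CentralizedCritic(), Communicator()
msg = torch.zeros(1, MSG_DIM)
for agent_id, obs in [(0, torch.randn(1, OBS_DIM)),
                      (1, torch.randn(1, OBS_DIM))]:
    act = actors[agent_id](obs, msg)   # scenario-specific action
    q = critic(msg, obs, act)          # shared value estimate
    msg = comm(obs, act, msg)          # pass history to the next scenario
```

The design point the sketch illustrates is that the critic is shared, so all actors are trained toward one overall objective, while each actor remains private to its scenario; the message is the only channel through which one scenario's history influences another scenario's action.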