DOI: 10.1145/3178876.3186165

Learning to Collaborate: Multi-Scenario Ranking via Multi-Agent Reinforcement Learning

Published: 10 April 2018

ABSTRACT

Ranking is a fundamental and widely studied problem in scenarios such as search, advertising, and recommendation. However, joint optimization for multi-scenario ranking, which aims to improve the overall performance of several ranking strategies in different scenarios, remains largely unexplored. Optimizing each strategy separately has two limitations. The first is the lack of collaboration between scenarios: each strategy maximizes its own objective and ignores the goals of the other strategies, leading to sub-optimal overall performance. The second is the inability to model correlations between scenarios: independent optimization in one scenario uses only its own user data and ignores the context available in other scenarios. In this paper, we formulate multi-scenario ranking as a fully cooperative, partially observable, multi-agent sequential decision problem. We propose a novel model named Multi-Agent Recurrent Deterministic Policy Gradient (MA-RDPG), which has a communication component for passing messages, several private actors (agents) that take ranking actions, and a centralized critic that evaluates the overall performance of the co-working actors. Each scenario is treated as an agent (actor). Agents collaborate by sharing a global action-value function (the critic) and by passing messages that encode historical information across scenarios. The model is evaluated in online settings on a large e-commerce platform. Results show that the proposed model yields significant improvements over the baselines in terms of overall performance.
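To make the architecture described above concrete, the following is a minimal sketch of the three components in PyTorch: a shared communication module that encodes history into a message, a private actor per scenario, and a centralized critic over the joint action. The module names, network sizes, and the choice of an LSTM cell for the communication component are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of the MA-RDPG structure from the abstract.
# All names and dimensions are assumptions for the purpose of this example.
import torch
import torch.nn as nn


class CommunicationModule(nn.Module):
    """Encodes the history of observations and actions into a shared message."""
    def __init__(self, obs_dim, act_dim, msg_dim):
        super().__init__()
        self.rnn = nn.LSTMCell(obs_dim + act_dim, msg_dim)

    def forward(self, obs, prev_action, state):
        # state = (h, c); h is the message broadcast to all agents
        return self.rnn(torch.cat([obs, prev_action], dim=-1), state)


class Actor(nn.Module):
    """Private actor of one scenario: maps (local observation, shared message)
    to a deterministic ranking action, e.g. a vector of feature weights."""
    def __init__(self, obs_dim, msg_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + msg_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs, msg):
        return self.net(torch.cat([obs, msg], dim=-1))


class CentralizedCritic(nn.Module):
    """Global action-value function Q(message, joint action) shared by all agents."""
    def __init__(self, msg_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(msg_dim + joint_act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, msg, joint_action):
        return self.net(torch.cat([msg, joint_action], dim=-1))
```

In this sketch, each actor conditions on the shared message as well as its local observation, and the single critic evaluating the joint action is what couples the otherwise independently optimized scenarios.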


Published in

WWW '18: Proceedings of the 2018 World Wide Web Conference
April 2018, 2000 pages
ISBN: 9781450356398

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Publication History

Published: 10 April 2018


Acceptance Rates

WWW '18 paper acceptance rate: 170 of 1,155 submissions (15%). Overall acceptance rate: 1,899 of 8,196 submissions (23%).
