DOI: 10.1145/3269206.3271784
Research article · Open Access

The LambdaLoss Framework for Ranking Metric Optimization

Published: 17 October 2018

ABSTRACT

How to optimize ranking metrics such as Normalized Discounted Cumulative Gain (NDCG) is an important but challenging problem, because ranking metrics are either flat or discontinuous everywhere, which makes them hard to optimize directly. Among existing approaches, LambdaRank is a novel algorithm that incorporates ranking metrics into its learning procedure. Though empirically effective, it still lacks theoretical justification. For example, the underlying loss that LambdaRank optimizes for has remained unknown until now. Due to this, there is no principled way to advance the LambdaRank algorithm further. In this paper, we present LambdaLoss, a probabilistic framework for ranking metric optimization. We show that LambdaRank is a special configuration with a well-defined loss in the LambdaLoss framework, and thus provide theoretical justification for it. More importantly, the LambdaLoss framework allows us to define metric-driven loss functions that have a clear connection to different ranking metrics. We show a few cases in this paper and evaluate them on three publicly available data sets. Experimental results show that our metric-driven loss functions can significantly improve state-of-the-art learning-to-rank algorithms.
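To make the abstract concrete, the sketch below illustrates (a) the NDCG metric it refers to and (b) the widely cited LambdaRank heuristic of weighting a pairwise logistic loss by the |NDCG change| from swapping two documents. This is an assumption-laden illustration, not the paper's LambdaLoss derivation; all function names and details are hypothetical.

```python
# Illustrative sketch only: standard NDCG and a LambdaRank-style pairwise loss
# weighted by |delta-NDCG|. Not the paper's LambdaLoss framework.
import numpy as np

def dcg(labels, ranks):
    """DCG with (2^label - 1) gains and log2(1 + rank) discounts."""
    return np.sum((2.0 ** labels - 1.0) / np.log2(1.0 + ranks))

def ndcg(labels, scores):
    """NDCG of model `scores` against graded relevance `labels` for one query."""
    n = len(scores)
    ranks = np.empty(n)
    ranks[np.argsort(-scores)] = np.arange(1, n + 1)  # rank 1 = highest score
    ideal = dcg(np.sort(labels)[::-1], np.arange(1, n + 1))
    return dcg(labels, ranks) / ideal if ideal > 0 else 0.0

def lambdarank_style_loss(labels, scores):
    """Pairwise logistic loss where each pair is weighted by the |NDCG change|
    obtained by swapping the two documents in the current ranking."""
    n = len(scores)
    ranks = np.empty(n)
    ranks[np.argsort(-scores)] = np.arange(1, n + 1)
    ideal = dcg(np.sort(labels)[::-1], np.arange(1, n + 1))
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where document i is more relevant than j
            gain_diff = (2.0 ** labels[i]) - (2.0 ** labels[j])
            disc_diff = 1.0 / np.log2(1.0 + ranks[i]) - 1.0 / np.log2(1.0 + ranks[j])
            delta_ndcg = abs(gain_diff * disc_diff) / ideal
            # RankNet-style logistic loss on the score difference, scaled by |delta-NDCG|
            loss += delta_ndcg * np.log1p(np.exp(-(scores[i] - scores[j])))
    return loss

# Example: three documents with graded relevance and current model scores.
labels = np.array([3.0, 1.0, 0.0])
scores = np.array([0.2, 1.5, 0.3])
print(ndcg(labels, scores), lambdarank_style_loss(labels, scores))
```

In practice, LambdaRank-style learners work directly with the gradients ("lambdas") of such a weighted pairwise objective; the paper's contribution is to place this heuristic inside a probabilistic framework with a well-defined loss, which then admits new metric-driven losses.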


• Published in

  CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
  October 2018
  2362 pages
  ISBN: 9781450360142
  DOI: 10.1145/3269206

      Copyright © 2018 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 October 2018


      Qualifiers

      • research-article

      Acceptance Rates

CIKM '18 paper acceptance rate: 147 of 826 submissions, 18%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
