DOI: 10.1145/3477495.3531791

Curriculum Learning for Dense Retrieval Distillation

Published: 07 July 2022

ABSTRACT

Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum-learning-based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
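
The abstract describes the curriculum mechanism only in prose, so the following is a minimal sketch of the coarse-to-fine idea in PyTorch. Everything in it is an illustrative assumption rather than the authors' exact formulation: the names (make_preference_pairs, pairwise_margin_loss, group_size), the consecutive-group partitioning of the teacher's ranking, the hinge-style pairwise loss, and the toy score vector standing in for a dual-encoder student.

```python
# Sketch of curriculum-based distillation from a teacher ranking (assumed
# formulation, not the paper's exact loss). Early curriculum stages treat
# blocks of the teacher's ranking as ties (coarse-grained preferences);
# later stages shrink the blocks until the full pairwise order is required.
import torch
import torch.nn.functional as F

def make_preference_pairs(teacher_ranking, group_size):
    """Partition the teacher's ranked list (best first) into consecutive
    groups of `group_size` documents, and emit (preferred, non-preferred)
    index pairs only across groups; documents inside a group are ties."""
    groups = [teacher_ranking[i:i + group_size]
              for i in range(0, len(teacher_ranking), group_size)]
    pairs = []
    for gi, high_group in enumerate(groups):
        for low_group in groups[gi + 1:]:
            pairs.extend((h, l) for h in high_group for l in low_group)
    return pairs

def pairwise_margin_loss(student_scores, pairs, margin=1.0):
    """Hinge loss pushing the student to score the preferred document at
    least `margin` above the non-preferred one, for every required pair."""
    pos = torch.stack([student_scores[h] for h, _ in pairs])
    neg = torch.stack([student_scores[l] for _, l in pairs])
    return F.relu(margin - (pos - neg)).mean()

# Toy setup: 8 documents indexed 0..7, already in the teacher's order.
# In a real system `student_scores` would come from the dense (student)
# query/document encoders; here it is a plain learnable vector.
teacher_ranking = list(range(8))
student_scores = torch.randn(8, requires_grad=True)

# The curriculum: shrinking group_size makes the ordering demand stricter.
for group_size in (4, 2, 1):
    pairs = make_preference_pairs(teacher_ranking, group_size)
    loss = pairwise_margin_loss(student_scores, pairs)
    loss.backward()                 # in practice: one optimizer step per batch
    student_scores.grad.zero_()
```

Shrinking group_size from 4 to 1 is the curriculum itself: the first stage only asks the student to separate coarse blocks of the teacher's ranking, while the final stage demands the complete pairwise ordering, matching the coarse-to-fine progression the abstract describes.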


Supplemental Material

cl-drd.mp4 (MP4, 8.3 MB)


Published in

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022, 3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


        Qualifiers

        • short-paper

        Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
