ABSTRACT
Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning-based optimization framework, called CL-DRD, that controls the difficulty level of the training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by gradually increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
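To make the coarse-to-fine curriculum concrete, below is a minimal PyTorch sketch of the core idea under stated assumptions: at each stage, the teacher's ranked list is partitioned into preference groups, and the student incurs a pairwise hinge loss whenever a document from a lower-ranked group outscores one from a higher-ranked group; the curriculum then refines the partition from a few large groups to many small ones. The function name `cl_drd_pairwise_loss`, the hinge formulation, the margin, and the group schedule are illustrative assumptions, not the paper's exact objective.

```python
import torch

def cl_drd_pairwise_loss(student_scores, group_sizes, margin=1.0):
    """Hedged sketch of curriculum-controlled distillation pairs.

    student_scores: 1-D tensor of student scores for candidate documents,
        already sorted by the teacher's ranking (index 0 = teacher's top doc).
    group_sizes: ints partitioning the ranking into preference groups; every
        document in an earlier group should outscore every document in a later
        group. Early curriculum stages use a few large groups (coarse
        preferences); later stages use many small groups, approaching a full
        pairwise ordering of the teacher's list.
    """
    # Derive (start, end) index boundaries of each group from its size.
    boundaries, start = [], 0
    for size in group_sizes:
        boundaries.append((start, start + size))
        start += size

    loss = student_scores.new_zeros(())
    num_pairs = 0
    # Hinge loss over all cross-group pairs: earlier group > later group.
    for gi, (si, ei) in enumerate(boundaries):
        for sj, ej in boundaries[gi + 1:]:
            pos = student_scores[si:ei].unsqueeze(1)  # preferred docs
            neg = student_scores[sj:ej].unsqueeze(0)  # less-preferred docs
            loss = loss + torch.relu(margin - (pos - neg)).sum()
            num_pairs += pos.numel() * neg.numel()
    return loss / max(num_pairs, 1)

# Hypothetical three-stage schedule over a 10-document teacher ranking;
# the paper's actual schedule and group sizes may differ.
curriculum = [[5, 5], [3, 3, 4], [1, 1, 2, 3, 3]]
scores = torch.randn(10, requires_grad=True)  # stand-in student scores
for stage, groups in enumerate(curriculum):
    print(stage, cl_drd_pairwise_loss(scores, groups).item())
```

Under this sketch, early stages with a few large groups only ask the student to separate the teacher's broadly preferred documents from the rest, while later stages with near-singleton groups demand the teacher's fine-grained ordering, matching the coarse-to-fine progression described above.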