ABSTRACT
Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning-based optimization framework, called CL-DRD, that controls the difficulty level of the training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by gradually increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
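To make the coarse-to-fine curriculum concrete, below is a minimal PyTorch sketch of the core idea under stated assumptions: at each stage, the teacher's ranked list is partitioned into preference groups, and the student incurs a pairwise hinge loss whenever a document from a lower-ranked group outscores one from a higher-ranked group; the curriculum then refines the partition from a few large groups to many small ones. The function name `cl_drd_pairwise_loss`, the hinge formulation, the margin, and the group schedule are illustrative assumptions, not the paper's exact objective.

```python
import torch

def cl_drd_pairwise_loss(student_scores, group_sizes, margin=1.0):
    """Hedged sketch of curriculum-controlled distillation pairs.

    student_scores: 1-D tensor of student scores for candidate documents,
        already sorted by the teacher's ranking (index 0 = teacher's top doc).
    group_sizes: ints partitioning the ranking into preference groups; every
        document in an earlier group should outscore every document in a later
        group. Early curriculum stages use a few large groups (coarse
        preferences); later stages use many small groups, approaching a full
        pairwise ordering of the teacher's list.
    """
    # Derive (start, end) index boundaries of each group from its size.
    boundaries, start = [], 0
    for size in group_sizes:
        boundaries.append((start, start + size))
        start += size

    loss = student_scores.new_zeros(())
    num_pairs = 0
    # Hinge loss over all cross-group pairs: earlier group > later group.
    for gi, (si, ei) in enumerate(boundaries):
        for sj, ej in boundaries[gi + 1:]:
            pos = student_scores[si:ei].unsqueeze(1)  # preferred docs
            neg = student_scores[sj:ej].unsqueeze(0)  # less-preferred docs
            loss = loss + torch.relu(margin - (pos - neg)).sum()
            num_pairs += pos.numel() * neg.numel()
    return loss / max(num_pairs, 1)

# Hypothetical three-stage schedule over a 10-document teacher ranking;
# the paper's actual schedule and group sizes may differ.
curriculum = [[5, 5], [3, 3, 4], [1, 1, 2, 3, 3]]
scores = torch.randn(10, requires_grad=True)  # stand-in student scores
for stage, groups in enumerate(curriculum):
    print(stage, cl_drd_pairwise_loss(scores, groups).item())
```

Under this sketch, early stages with a few large groups only ask the student to separate the teacher's broadly preferred documents from the rest, while later stages with near-singleton groups demand the teacher's fine-grained ordering, matching the coarse-to-fine progression described above.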