ABSTRACT
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
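The mechanics described above — lexicon-grounded importance vectors, pre-computed passage representations, expansion onto related terms, and pruning — can be illustrated with a minimal sketch. The term names, weights, and function signatures here are invented for illustration; in EPIC the importance weights come from a contextualized language model, not hand-coded values.

```python
def score(query_rep, doc_rep):
    """Relevance as the dot product of two lexicon-grounded
    importance vectors (sparse dicts of term -> weight)."""
    return sum(w * doc_rep.get(term, 0.0) for term, w in query_rep.items())

def prune(doc_rep, k):
    """Keep only the k highest-weight terms of a passage
    representation (the latency-reducing pruning step)."""
    top = sorted(doc_rep.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)

# A passage representation, pre-computed at index time. Expansion has
# propagated weight onto a related term ("automobile") that need not
# appear verbatim in the passage. All weights are illustrative.
doc_rep = {"car": 0.9, "engine": 0.6, "automobile": 0.4,
           "repair": 0.3, "the": 0.05}

# Query-time importance weights for the query terms.
query_rep = {"car": 1.0, "fix": 0.7}

full_score = score(query_rep, doc_rep)
pruned_score = score(query_rep, prune(doc_rep, 4))
```

Because both representations live in the vocabulary space, each term's contribution to the score can be read off directly, which is what makes the representations interpretable; and because `doc_rep` is fixed at index time, only the query side needs a model forward pass at query time.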