ABSTRACT
The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
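The mechanics described above — lexicon-grounded importance vectors, pre-computed passage representations, expansion onto related terms, and pruning — can be illustrated with a minimal sketch. The term names, weights, and function signatures here are invented for illustration; in EPIC the importance weights come from a contextualized language model, not hand-coded values.

```python
def score(query_rep, doc_rep):
    """Relevance as the dot product of two lexicon-grounded
    importance vectors (sparse dicts of term -> weight)."""
    return sum(w * doc_rep.get(term, 0.0) for term, w in query_rep.items())

def prune(doc_rep, k):
    """Keep only the k highest-weight terms of a passage
    representation (the latency-reducing pruning step)."""
    top = sorted(doc_rep.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)

# A passage representation, pre-computed at index time. Expansion has
# propagated weight onto a related term ("automobile") that need not
# appear verbatim in the passage. All weights are illustrative.
doc_rep = {"car": 0.9, "engine": 0.6, "automobile": 0.4,
           "repair": 0.3, "the": 0.05}

# Query-time importance weights for the query terms.
query_rep = {"car": 1.0, "fix": 0.7}

full_score = score(query_rep, doc_rep)
pruned_score = score(query_rep, prune(doc_rep, 4))
```

Because both representations live in the vocabulary space, each term's contribution to the score can be read off directly, which is what makes the representations interpretable; and because `doc_rep` is fixed at index time, only the query side needs a model forward pass at query time.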