research-article

Embedding-based Query Language Models

Authors:
Hamed Zamani, University of Massachusetts Amherst, Amherst, MA, USA
W. Bruce Croft, University of Massachusetts Amherst, Amherst, MA, USA

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, September 2016, Pages 147–156. https://doi.org/10.1145/2970398.2970405

Published: 12 September 2016

ABSTRACT

Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
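
As a rough illustration of the idea in the abstract (weighting terms outside the query by a sigmoid-transformed cosine similarity to the query terms), the following is a minimal sketch and not the authors' models. The averaging over query terms, the sigmoid parameters `a` and `c`, the function names, and the toy vectors are all assumptions made for this example; the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sigmoid_similarity(u, v, a=10.0, c=0.8):
    """Cosine similarity passed through a sigmoid.

    `a` (steepness) and `c` (midpoint) are illustrative values, not taken
    from the paper. The transform pushes high cosine values toward 1 and
    the rest toward 0, which spreads the scores apart.
    """
    return 1.0 / (1.0 + math.exp(-a * (cosine(u, v) - c)))


def expansion_weights(query_terms, embeddings, a=10.0, c=0.8):
    """Assumed scoring rule: average transformed similarity to the query terms.

    Returns a normalized weight distribution over vocabulary terms that do
    not occur in the query.
    """
    known_query = [q for q in query_terms if q in embeddings]
    scores = {}
    for term, vec in embeddings.items():
        if term in query_terms or not known_query:
            continue
        sims = [sigmoid_similarity(vec, embeddings[q], a, c) for q in known_query]
        scores[term] = sum(sims) / len(sims)
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total > 0 else {}


# Hypothetical 3-dimensional embeddings, for demonstration only.
toy_embeddings = {
    "car": [1.0, 0.1, 0.0],
    "automobile": [0.9, 0.2, 0.0],
    "banana": [0.0, 0.1, 1.0],
}
weights = expansion_weights(["car"], toy_embeddings)
```

In this toy setting, nearly all of the expansion weight goes to "automobile" and almost none to "banana". With raw cosine values the gap between related and unrelated terms is often much smaller in real embedding spaces, which is the motivation the abstract gives for the transform.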

Index Terms

Embedding-based Query Language Models
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing
      1. Query reformulation
      2. Query representation
    2. Retrieval models and ranking
      1. Language models

Published in
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN: 9781450344975
DOI: 10.1145/2970398
General Chairs: Ben Carterette (University of Delaware, USA), Hui Fang (University of Delaware, USA)
Program Chairs: Mounia Lalmas (Yahoo! Labs, UK), Jian-Yun Nie (University of Montreal, Canada)
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
language models
pseudo-relevance feedback
query expansion
word embedding
Acceptance Rates
ICTIR '16 paper acceptance rate: 41 of 79 submissions (52%). Overall acceptance rate: 209 of 482 submissions (43%).