DOI: 10.1145/2970398.2970405
Research article

Embedding-based Query Language Models

Published: 12 September 2016

ABSTRACT

Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
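The core idea in the abstract (pass cosine similarities through a sigmoid, then use them to weight expansion terms that are absent from the query) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy vectors, the sigmoid parameters `a` and `c`, the averaging over query terms, and the interpolation weight `alpha`. It is not either of the two expansion models the paper describes.

```python
import math

# Hypothetical 2-d toy vectors; real embeddings would come from a trained model.
TOY_EMBEDDINGS = {
    "car": [1.0, 0.1],
    "automobile": [0.9, 0.2],
    "vehicle": [0.8, 0.3],
    "banana": [0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sigmoid_similarity(u, v, a=10.0, c=0.8):
    """Cosine similarity passed through a sigmoid.

    `a` (steepness) and `c` (midpoint) are assumed free parameters: values
    of cosine above `c` are pushed toward 1, values below toward 0.
    """
    return 1.0 / (1.0 + math.exp(-a * (cosine(u, v) - c)))


def expand_query(query_terms, embeddings, top_k=2, alpha=0.5, a=10.0, c=0.8):
    """Return a term -> probability dict for an expanded query model."""
    # Maximum-likelihood model of the original query.
    original = {}
    for term in query_terms:
        original[term] = original.get(term, 0.0) + 1.0 / len(query_terms)

    # Score each non-query term by its average transformed similarity
    # to the query terms that have an embedding (an assumed scoring rule).
    in_vocab = [t for t in set(query_terms) if t in embeddings]
    scores = {}
    for word, vec in embeddings.items():
        if word in original or not in_vocab:
            continue
        sims = [sigmoid_similarity(vec, embeddings[q], a, c) for q in in_vocab]
        scores[word] = sum(sims) / len(sims)

    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(score for _, score in top)
    if total == 0.0:
        return original  # nothing to expand with

    # Linear interpolation of the original model and the expansion model.
    model = {term: alpha * p for term, p in original.items()}
    for word, score in top:
        model[word] = (1.0 - alpha) * score / total
    return model


if __name__ == "__main__":
    for term, prob in sorted(expand_query(["car"], TOY_EMBEDDINGS).items()):
        print(f"{term}\t{prob:.3f}")
```

With these toy vectors, the sigmoid drives the weight of an unrelated term such as "banana" close to zero, so it falls out of the top-k expansion terms, which is the kind of sharper separation the transformation is meant to give.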

Published in

ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
September 2016
318 pages
ISBN: 9781450344975
DOI: 10.1145/2970398

Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ICTIR '16 paper acceptance rate: 41 of 79 submissions, 52%
Overall acceptance rate: 209 of 482 submissions, 43%
