DOI: 10.1145/2806416.2806480
research-article

Ranking Entities for Web Queries Through Text and Knowledge

Published: 17 October 2015

ABSTRACT

When humans explain complex topics, they naturally talk about the entities involved, such as people, locations, or events. In this paper, we aim to automate this process by retrieving and ranking entities that are relevant to understanding free-text, web-style queries like Argentine British relations. Such queries typically demand as their answer a set of heterogeneous entities with no specific target type, for instance Falklands_War or Margaret_Thatcher. Standard approaches to entity retrieval rely purely on features from the knowledge base. We approach the problem from the opposite direction, namely by analyzing web documents that are found to be query-relevant. Our approach hinges on entity linking technology that identifies entity mentions and links them to a knowledge base like Wikipedia. We use a learning-to-rank approach and study different features that use documents, entity mentions, and knowledge base entities, thus bridging document and entity retrieval. Since established benchmarks for this problem do not exist, we use TREC test collections for document ranking and collect custom relevance judgments for entities. Experiments on TREC Robust04 and TREC Web13/14 data show that: i) single entity features, like the frequency of occurrence within the top-ranked documents, or the query retrieval score against a knowledge base, generally perform well; ii) the best overall performance is achieved when combining different features that relate an entity to the query, its document mentions, and its knowledge base representation.
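As a rough illustration of the single feature named in the abstract, the frequency of an entity within the top-ranked documents, the following minimal Python sketch counts entity mentions over the first k retrieved documents and sorts entities by that count. It is not the authors' implementation: the function name `rank_by_mention_frequency`, the cutoff `k`, the input layout (one list of linked entity identifiers per document), and the toy annotations are all assumptions made for this example.

```python
from collections import Counter


def rank_by_mention_frequency(ranked_docs, k=3):
    """Rank entities by how often they are mentioned in the top-k documents.

    ranked_docs: documents in retrieval order; each document is given as a
    list of entity identifiers, as an entity linker might produce them
    (hypothetical input format, not taken from the paper).
    """
    counts = Counter()
    for doc_entities in ranked_docs[:k]:
        counts.update(doc_entities)
    # Sort by descending count; break ties alphabetically for determinism.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented annotations for the query "Argentine British relations".
toy_docs = [
    ["Falklands_War", "Margaret_Thatcher", "Falklands_War"],
    ["Falklands_War", "Argentina"],
    ["Margaret_Thatcher", "United_Kingdom"],
    ["Buenos_Aires"],  # ranked below the cutoff k=3, so it is ignored
]

ranking = rank_by_mention_frequency(toy_docs, k=3)
for entity, count in ranking:
    print(entity, count)
```

In a learning-to-rank setting such a count would be only one feature value per query-entity pair, to be combined with others such as a knowledge base retrieval score.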

        Published in

        CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
        October 2015
        1998 pages
        ISBN: 9781450337946
        DOI: 10.1145/2806416

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CIKM '15 paper acceptance rate: 165 of 646 submissions, 26%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
