research-article

Ranking Entities for Web Queries Through Text and Knowledge

Authors:
Michael Schuhmacher, University of Mannheim, Mannheim, Germany
Laura Dietz, University of Mannheim, Mannheim, Germany
Simone Paolo Ponzetto, University of Mannheim, Mannheim, Germany

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, October 2015, Pages 1461–1470, https://doi.org/10.1145/2806416.2806480

Published: 17 October 2015

ABSTRACT

When humans explain complex topics, they naturally talk about involved entities, such as people, locations, or events. In this paper, we aim at automating this process by retrieving and ranking entities that are relevant to understanding free-text web-style queries like "Argentine British relations". Such queries typically demand a set of heterogeneous entities with no specific target type as answers, for instance Falklands_War or Margaret_Thatcher. Standard approaches to entity retrieval rely purely on features from the knowledge base. We approach the problem from the opposite direction, namely by analyzing web documents that are found to be query-relevant. Our approach hinges on entity linking technology that identifies entity mentions and links them to a knowledge base like Wikipedia. We use a learning-to-rank approach and study different features that use documents, entity mentions, and knowledge base entities, thus bridging document and entity retrieval. Since established benchmarks for this problem do not exist, we use TREC test collections for document ranking and collect custom relevance judgments for entities. Experiments on TREC Robust04 and TREC Web13/14 data show that: i) single entity features, like the frequency of occurrence within the top-ranked documents, or the query retrieval score against a knowledge base, generally perform well; ii) the best overall performance is achieved when combining different features that relate an entity to the query, its document mentions, and its knowledge base representation.
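As a rough illustration of the two findings in the abstract, the following minimal Python sketch computes a mention-frequency feature over the top-ranked documents and then combines it with a knowledge base retrieval score. It is not the authors' implementation: the function names `mention_frequency` and `rank_entities`, the toy documents, the `kb_score` values, and the hand-set weights are all assumptions made for this example. In a learning-to-rank setting the weights would be learned from relevance judgments rather than fixed by hand.

```python
from collections import Counter
from typing import Dict, List


def mention_frequency(ranked_docs: List[List[str]], k: int = 10) -> Dict[str, int]:
    """Count how often each linked entity is mentioned in the top-k documents.

    Each document is represented by the list of entity IDs that a
    (hypothetical) entity linker found in it, in ranked order.
    """
    counts: Counter = Counter()
    for doc_entities in ranked_docs[:k]:
        counts.update(doc_entities)
    return dict(counts)


def rank_entities(
    features: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Order entities by a weighted sum of feature values, highest first."""

    def score(entity: str) -> float:
        return sum(
            weights.get(name, 0.0) * value
            for name, value in features[entity].items()
        )

    # Ties are broken alphabetically so the output is deterministic.
    return sorted(features, key=lambda e: (-score(e), e))


# Toy input: entity mentions per retrieved document (assumed data).
docs = [
    ["Falklands_War", "Margaret_Thatcher", "Falklands_War"],
    ["Falklands_War", "Argentina"],
    ["Margaret_Thatcher"],
]
freq = mention_frequency(docs, k=2)

# Hypothetical query retrieval scores against a knowledge base index.
kb_score = {"Falklands_War": 0.9, "Margaret_Thatcher": 0.4, "Argentina": 0.7}

features = {
    entity: {"mention_freq": float(freq.get(entity, 0)), "kb_score": kb_score[entity]}
    for entity in kb_score
}
# Hand-set placeholder weights; a learning-to-rank model would fit these.
weights = {"mention_freq": 1.0, "kb_score": 2.0}
ranking = rank_entities(features, weights)
print(ranking)
```

With only the top two documents considered, the third document's mention of Margaret_Thatcher is ignored, which shows how the cutoff `k` influences the frequency feature.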

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Semantic networks
2. Information systems
  1. Information retrieval

Published in
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN: 9781450337946
DOI: 10.1145/2806416
General Chairs: James Bailey (The University of Melbourne), Alistair Moffat (The University of Melbourne)
Program Chairs: Charu C. Aggarwal (IBM), Maarten de Rijke (University of Amsterdam), Ravi Kumar (Google), Vanessa Murdock (Microsoft), Timos Sellis (RMIT University), Jeffrey Xu Yu (Chinese University of Hong Kong)
Copyright © 2015 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entities
entity ranking
information retrieval
knowledge bases
Qualifiers
- research-article
Acceptance Rates
CIKM '15 Paper Acceptance Rate: 165 of 646 submissions, 26%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%