
Broad expertise retrieval in sparse data environments

Authors:
Krisztian Balog, University of Amsterdam, Amsterdam, Netherlands
Toine Bogers, Tilburg University, Tilburg, Netherlands
Leif Azzopardi, University of Glasgow, Glasgow, United Kingdom
Maarten de Rijke, University of Amsterdam, Amsterdam, Netherlands
Antal van den Bosch, Tilburg University, Tilburg, Netherlands

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, July 2007, pages 551–558
https://doi.org/10.1145/1277741.1277836

Published: 23 July 2007

ABSTRACT

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.
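The abstract does not spell out the baseline models. As a rough illustration only, the sketch below assumes a document-centric generative language model of the kind described in Balog, Azzopardi and de Rijke, "Formal models for expert finding in enterprise corpora" (SIGIR '06). In that formulation, a candidate ca is scored for a topic q as p(q|ca) = Σ_d p(q|θ_d) p(d|ca), and each document model θ_d is smoothed with Jelinek-Mercer smoothing. Several choices here are hypothetical and not taken from this paper: the uniform p(d|ca), the smoothing weight lam = 0.5, the toy data, and all function names. The same score is read in both directions, ranking people for a topic and ranking topics for a person.

```python
from collections import Counter
from math import prod


def background(docs):
    """Collection term counts and total length, used as the smoothing model p(t)."""
    counts = Counter(t for toks in docs.values() for t in toks)
    return counts, sum(counts.values())


def p_term(t, doc_tokens, bg_counts, bg_len, lam):
    """Jelinek-Mercer smoothed p(t|theta_d) = (1 - lam) * p_ml(t|d) + lam * p(t)."""
    p_ml = doc_tokens.count(t) / len(doc_tokens) if doc_tokens else 0.0
    return (1 - lam) * p_ml + lam * bg_counts[t] / bg_len


def p_query_given_candidate(query, cand_docs, docs, bg, lam=0.5):
    """p(q|ca) = sum_d p(q|theta_d) * p(d|ca).

    Assumption (hypothetical): p(d|ca) is uniform over the candidate's
    associated documents. Repeated query terms are multiplied in once per
    occurrence.
    """
    if not cand_docs:
        return 0.0
    bg_counts, bg_len = bg
    weight = 1.0 / len(cand_docs)
    return sum(
        weight * prod(p_term(t, docs[d], bg_counts, bg_len, lam) for t in query)
        for d in cand_docs
    )


def find_experts(query, associations, docs, lam=0.5):
    """Topic -> people: rank candidates by p(q|ca), highest first."""
    bg = background(docs)
    scores = {ca: p_query_given_candidate(query, ds, docs, bg, lam)
              for ca, ds in associations.items()}
    return sorted(scores, key=scores.get, reverse=True)


def profile(candidate, topics, associations, docs, lam=0.5):
    """Person -> topics: rank topic descriptions by p(q|ca) for one candidate."""
    bg = background(docs)
    scores = {name: p_query_given_candidate(terms, associations[candidate], docs, bg, lam)
              for name, terms in topics.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): document id -> tokens, candidate -> associated documents.
docs = {
    "d1": ["language", "models", "retrieval"],
    "d2": ["retrieval", "evaluation"],
    "d3": ["linguistics", "parsing"],
}
associations = {"alice": ["d1", "d2"], "bob": ["d3"]}

print(find_experts(["retrieval"], associations, docs))  # ['alice', 'bob']
print(profile("bob", {"ir": ["retrieval"], "nlp": ["parsing"]}, associations, docs))  # ['nlp', 'ir']
```

The profiling direction comes with a caveat. Raw query likelihoods shrink as topic descriptions get longer, so scores for topics of different lengths are not directly comparable. A fuller implementation would need to normalise for this, and the sketch leaves it out.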


Index Terms

Information systems


Published in
SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007, 946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741
General Chairs: Wessel Kraaij (TNO, The Netherlands), Arjen P. de Vries (CWI, The Netherlands)
Program Chairs: Charles L. A. Clarke (University of Waterloo, Canada), Norbert Fuhr (University of Duisburg-Essen, Germany), Noriko Kando (National Institute of Informatics, Japan)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
expert finding
expertise search
intranet search
language models
Qualifiers
- Article

Conference Acceptance Rates
Overall acceptance rate: 792 of 3,983 submissions, 20%

Article Metrics
- Total citations: 87
- Total downloads: 834
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 2