A general evaluation measure for document organization tasks

ABSTRACT
A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic {\em document organization} problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). To the best of our knowledge, the evaluation of these tasks has not yet been analyzed from such a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task, derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
In addition to being the first measures that can be applied to any mixture of ranking, clustering and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Beyond their formal properties, their most salient empirical feature is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics across all the Document Retrieval, Clustering and Filtering datasets used in our experiments.
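The idea of evaluating a document organization through its pairwise priority and relatedness relationships can be illustrated with a simplified sketch. This is our own illustration, not the paper's exact formulation: here each output assigns every document a (cluster, priority) pair, Reliability is computed as precision over the induced relationships, and Sensitivity as recall; the document names and structures are hypothetical.

```python
from itertools import combinations

def relationships(org):
    """org maps each document to a (cluster_id, priority) pair.
    Returns the set of pairwise relationships the organization implies:
    'related' for documents sharing a cluster, 'prior' for priority order."""
    rels = set()
    for a, b in combinations(sorted(org), 2):
        ca, pa = org[a]
        cb, pb = org[b]
        if ca == cb:
            rels.add(("related", a, b))
        if pa > pb:
            rels.add(("prior", a, b))
        elif pb > pa:
            rels.add(("prior", b, a))
    return rels

def reliability_sensitivity(system, gold):
    """Reliability ~ precision of the system's relationships w.r.t. the gold
    organization; Sensitivity ~ recall of the gold relationships."""
    s, g = relationships(system), relationships(gold)
    reliability = len(s & g) / len(s) if s else 0.0
    sensitivity = len(s & g) / len(g) if g else 0.0
    return reliability, sensitivity

def f_harmonic(r, s):
    """Harmonic mean combining the two measures, F-measure style."""
    return 2 * r * s / (r + s) if r + s else 0.0

# A perfect system reproduces every gold relationship (R = S = 1);
# moving one document to the wrong cluster loses the 'related' pair
# and introduces a spurious one, lowering both measures.
gold = {"d1": (1, 2), "d2": (1, 1), "d3": (2, 0)}
system = {"d1": (1, 2), "d2": (2, 1), "d3": (2, 0)}
print(reliability_sensitivity(gold, gold))    # (1.0, 1.0)
print(reliability_sensitivity(system, gold))  # (0.75, 0.75)
```

Under this pairwise view, a pure ranking, a pure clustering, and any mixture of the two all induce relationship sets of the same kind, which is what lets a single measure pair cover all the subsumed tasks.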