Article

Discovering word senses from text

Authors:
Patrick Pantel, University of Alberta, Edmonton, Alberta, Canada
Dekang Lin, University of Alberta, Edmonton, Alberta, Canada

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 2002, Pages 613–619. https://doi.org/10.1145/775047.775138

Published: 23 July 2002

ABSTRACT

Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove from the element the features it shares with that cluster. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
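The assignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the stopping rule, or how the overlap is computed, so the cosine similarity, the `threshold` and `max_senses` parameters, the sparse-dictionary representation, and the toy data are all assumptions made here for illustration. Committee discovery itself is not shown; the committees are simply given.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(members):
    """Average the feature vectors of a committee's members."""
    total = {}
    for vec in members:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(members) for f, w in total.items()}


def assign_senses(word_vec, centroids, threshold=0.2, max_senses=5):
    """Assign a word to clusters one at a time, removing overlapping features.

    threshold and max_senses are hypothetical parameters, not values
    taken from the paper.
    """
    remaining = dict(word_vec)      # features not yet explained by a sense
    candidates = dict(centroids)    # clusters the word is not yet assigned to
    senses = []
    while candidates and len(senses) < max_senses:
        best = max(candidates, key=lambda c: cosine(remaining, candidates[c]))
        if cosine(remaining, candidates[best]) < threshold:
            break
        senses.append(best)
        # Assumption: "overlapping" means any feature present in the centroid.
        for f in candidates[best]:
            remaining.pop(f, None)
        del candidates[best]
    return senses


# Toy data (invented): two committees and one ambiguous word.
centroids = {
    "flora": centroid([{"grow": 1.0, "water": 1.0}, {"grow": 1.0, "bloom": 1.0}]),
    "factory": centroid([{"build": 1.0, "operate": 1.0}, {"operate": 1.0, "close": 1.0}]),
    "colors": centroid([{"red": 1.0}]),
}
plant = {"grow": 2.0, "water": 1.0, "build": 1.0, "operate": 1.0}
senses = assign_senses(plant, centroids)
```

In this toy run the word is first assigned to the cluster that matches its dominant features. Once those features are removed, the remaining features point to a second, less frequent sense, and a cluster that shares no features with the word is never selected.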

Index Terms

Discovering word senses from text
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN: 158113567X
DOI: 10.1145/775047
Conference Chair: Osmar R. Zaïane (University of Alberta, Canada)
General Chair: Randy Goebel (University of Alberta, Canada)
Program Chairs: David Hand (Imperial College, UK), Daniel Keim (AT&T), Raymond Ng (University of British Columbia, Canada)
Copyright © 2002 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
evaluation
machine learning
word sense discovery
Qualifiers
- Article
Conference

Acceptance Rates
KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%