DOI: 10.1145/775047.775138
Article

Discovering word senses from text

Published: 23 July 2002

ABSTRACT

Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus- or domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters, called committees, that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove from the element the features it shares with that cluster. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
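
The assignment phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or a stopping rule, so cosine similarity over sparse feature dictionaries, the `threshold` parameter, the function names, and the toy feature weights are all assumptions made for illustration. Committee discovery itself is not shown; the committees are simply given.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts: feature -> weight).
    Assumption: the abstract does not name the similarity measure."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(members):
    """Average the feature vectors of a committee's members."""
    total = {}
    for vec in members:
        for f, w in vec.items():
            total[f] = total.get(f, 0.0) + w
    return {f: w / len(members) for f, w in total.items()}

def assign_senses(word_vec, centroids, threshold=0.5):
    """Repeatedly assign the word to its most similar cluster, then remove
    the features it shares with that cluster before looking again.
    `threshold` is a hypothetical cutoff, not a value from the paper."""
    remaining = dict(word_vec)
    candidates = dict(centroids)
    senses = []
    while remaining and candidates:
        name, sim = max(
            ((n, cosine(remaining, c)) for n, c in candidates.items()),
            key=lambda pair: pair[1],
        )
        if sim < threshold:
            break
        senses.append(name)
        for f in candidates[name]:      # drop the overlapping features
            remaining.pop(f, None)
        del candidates[name]            # each cluster is one sense at most
    return senses

# Toy data (invented): two committees and an ambiguous word.
centroids = {
    "flora": centroid([{"leaf": 2, "grow": 2, "water": 1},
                       {"leaf": 1, "grow": 1, "water": 2}]),
    "factory": centroid([{"worker": 2, "build": 1},
                         {"worker": 1, "build": 2}]),
}
plant = {"water": 3.0, "leaf": 3.0, "grow": 2.0, "worker": 1.0, "build": 1.0}

print(assign_senses(plant, centroids))
```

In this toy setup the full vector for `plant` is dominated by its "flora" features, so its similarity to the "factory" centroid starts below the cutoff. Once the features shared with "flora" are removed, the remaining features match "factory" closely, which mirrors the abstract's point that feature removal lets less frequent senses surface.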

        Published in

        KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
        July 2002
        719 pages
        ISBN: 158113567X
        DOI: 10.1145/775047

        Copyright © 2002 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
