
Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus

  • Regular Paper
Journal of Computer Science and Technology

Abstract

We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work is based on the method presented by McCarthy et al. in 2004 for finding the predominant sense of each word in an entire corpus. Their maximization algorithm lets weighted terms (similar words) from a distributional thesaurus accumulate a score for each sense of an ambiguous word; the sense with the highest score, i.e., the one receiving the most votes from the weighted list of terms related to the ambiguous word, is chosen. This list is obtained using the distributional similarity method proposed by Lin Dekang to build a thesaurus. In the method of McCarthy et al., every occurrence of an ambiguous word uses the same thesaurus, regardless of the context in which the word occurs. Our method takes context into account by building the list of distributionally similar words from the syntactic context of the ambiguous word. We obtain a top precision of 77.54% versus 67.10% for the original method when tested on SemCor. We also analyze the effect of the number of weighted terms on the tasks of finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model.
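The voting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy similarity table, and the neighbor weights are all hypothetical stand-ins for a real distributional thesaurus and a WordNet-based relatedness measure.

```python
def predominant_sense(senses, neighbors, similarity):
    """Pick the sense with the highest accumulated weighted vote.

    senses     -- candidate senses of the ambiguous word
    neighbors  -- list of (similar_word, weight) pairs from the thesaurus
    similarity -- function (similar_word, sense) -> relatedness score
    """
    scores = {s: 0.0 for s in senses}
    for word, weight in neighbors:
        for s in senses:
            # Each thesaurus neighbor votes for every sense,
            # scaled by its distributional-similarity weight.
            scores[s] += weight * similarity(word, s)
    return max(scores, key=scores.get)

# Toy example: disambiguate "bank" between a finance and a river sense.
sim_table = {
    ("money", "bank#finance"): 0.9, ("money", "bank#river"): 0.1,
    ("loan",  "bank#finance"): 0.8, ("loan",  "bank#river"): 0.0,
    ("shore", "bank#finance"): 0.1, ("shore", "bank#river"): 0.9,
}
sim = lambda w, s: sim_table.get((w, s), 0.0)
neighbors = [("money", 0.7), ("loan", 0.5), ("shore", 0.2)]
print(predominant_sense(["bank#finance", "bank#river"], neighbors, sim))
# -> bank#finance
```

In the original method of McCarthy et al. the `neighbors` list is fixed per word type; the contribution of this paper is to rebuild that list from the syntactic context of each occurrence.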


References

  1. Schütze H. Dimensions of meaning. In Proc. ACM/IEEE Conference on Supercomputing (Supercomputing 1992), Mannheim, Germany, June, 1992, pp.787–796.

  2. Karlgren J, Sahlgren M. From Words to Understanding. Foundations of Real-World Intelligence, Stanford: CSLI Publications, 2001, pp.294–308.


  3. McCarthy D, Koeling R, Weeds J et al. Finding predominant word senses in untagged text. In Proc. the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 2004.

  4. Lin D. Automatic retrieval and clustering of similar words. In Proc. the 17th Int. Conf. Computational Linguistics, Montreal, Canada, Aug. 10-14, 1998, pp.768–774.

  5. Kilgarriff A, Rosenzweig J. English SENSEVAL: Report and results. In Proc. LREC, Athens, May-June 2000.

  6. Patwardhan S, Banerjee S, Pedersen T. Using measures of semantic relatedness for word sense disambiguation. In Proc. the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico, 2003, pp.241–257.

  7. Sahlgren M. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. Dissertation]. Department of Linguistics, Stockholm University, 2006.

  8. Lin D. Dependency-based evaluation of MINIPAR. In Proc. Workshop on the Evaluation of Parsing Systems at LREC, Granada, Spain, 1998, pp.317–330.

  9. Hays D. Dependency theory: A formalism and some observations. Language, 1964, 40(4): 511–525.


  10. Mel'čuk I A. Dependency Syntax: Theory and Practice. State University of New York Press, Albany, N.Y., 1987.

  11. Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: Measuring the relatedness of concepts. In Proc. the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, CA, 2004, pp.1024–1025.

  12. Miller G. Introduction to WordNet: An On-line Lexical Database. Princeton University, 1993.

  13. Miller G. WordNet: An on-line lexical database. International Journal of Lexicography, 1990, 3(4): 235–244.


  14. Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In Proc. the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, Aug. 20-25, 1995, pp.448–453.

  15. Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. International Conference on Research in Computational Linguistics, Taiwan, China, Sept. 1997, pp.19–33.

  16. Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, Fellbaum C (ed.), 1998, pp.265–283.

  17. Tejada J, Gelbukh A, Calvo H. Unsupervised WSD with a dynamic thesaurus. In Proc. the 11th International Conference on Text, Speech and Dialogue (TSD 2008), Brno, Czech Republic, Sept. 8-12, 2008, pp.201–210.

  18. Tejada J, Gelbukh A, Calvo H. An innovative two-stage WSD unsupervised method. SEPLN Journal, March 2008, 40: 99–105.



Author information


Corresponding author

Correspondence to Javier Tejada-Cárcamo.

Additional information

Supported by the Mexican Government (SNI, SIP-IPN, COFAA-IPN, and PIFI-IPN), CONACYT and the Japanese Government.


About this article

Cite this article

Tejada-Cárcamo, J., Calvo, H., Gelbukh, A. et al. Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus. J. Comput. Sci. Technol. 25, 1030–1039 (2010). https://doi.org/10.1007/s11390-010-9385-2
