
Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus

  • Regular Paper
Journal of Computer Science and Technology

Abstract

We present and analyze an unsupervised method for Word Sense Disambiguation (WSD). Our work is based on the method presented by McCarthy et al. in 2004 for finding the predominant sense of each word in an entire corpus. Their maximization algorithm lets weighted terms (similar words) from a distributional thesaurus accumulate a score for each sense of an ambiguous word; the sense with the highest score, i.e., the one receiving the most votes from the weighted list of terms related to the ambiguous word, is chosen. This list is obtained using the distributional similarity method proposed by Lin Dekang to build a thesaurus. In the method of McCarthy et al., every occurrence of an ambiguous word uses the same thesaurus, regardless of the context in which the word occurs. Our method takes context into account by building the list of distributionally similar words from the syntactic context of the ambiguous word. We obtain a top precision of 77.54% versus 67.10% for the original method when tested on SemCor. We also analyze the effect of the number of weighted terms on the tasks of finding the Most Frequent Sense (MFS) and WSD, and experiment with several corpora for building the Word Space Model.
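The voting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy similarity table, and the neighbor weights are all hypothetical stand-ins for a real distributional thesaurus and a WordNet-based relatedness measure.

```python
def predominant_sense(senses, neighbors, similarity):
    """Pick the sense with the highest accumulated weighted vote.

    senses     -- candidate senses of the ambiguous word
    neighbors  -- list of (similar_word, weight) pairs from the thesaurus
    similarity -- function (similar_word, sense) -> relatedness score
    """
    scores = {s: 0.0 for s in senses}
    for word, weight in neighbors:
        for s in senses:
            # Each thesaurus neighbor votes for every sense,
            # scaled by its distributional-similarity weight.
            scores[s] += weight * similarity(word, s)
    return max(scores, key=scores.get)

# Toy example: disambiguate "bank" between a finance and a river sense.
sim_table = {
    ("money", "bank#finance"): 0.9, ("money", "bank#river"): 0.1,
    ("loan",  "bank#finance"): 0.8, ("loan",  "bank#river"): 0.0,
    ("shore", "bank#finance"): 0.1, ("shore", "bank#river"): 0.9,
}
sim = lambda w, s: sim_table.get((w, s), 0.0)
neighbors = [("money", 0.7), ("loan", 0.5), ("shore", 0.2)]
print(predominant_sense(["bank#finance", "bank#river"], neighbors, sim))
# -> bank#finance
```

In the original method of McCarthy et al. the `neighbors` list is fixed per word type; the contribution of this paper is to rebuild that list from the syntactic context of each occurrence.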


References

  1. Schütze H. Dimensions of meaning. In Proc. ACM/IEEE Conference on Supercomputing (Supercomputing 1992), Mannheim, Germany, June, 1992, pp.787–796.

  2. Karlgren J, Sahlgren M. From Words to Understanding. Foundations of Real-World Intelligence, Stanford: CSLI Publications, 2001, pp.294–308.


  3. McCarthy D, Koeling R, Weeds J et al. Finding predominant word senses in untagged text. In Proc. the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, 2004.

  4. Lin D. Automatic retrieval and clustering of similar words. In Proc. the 17th Int. Conf. Computational Linguistics, Montreal, Canada, Aug. 10-14, 1998, pp.768–774.

  5. Kilgarriff A, Rosenzweig J. English SENSEVAL: Report and results. In Proc. LREC, Athens, May-June 2000.

  6. Patwardhan S, Banerjee S, Pedersen T. Using measures of semantic relatedness for word sense disambiguation. In Proc. the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico, 2003, pp.241–257.

  7. Sahlgren M. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces [Ph.D. Dissertation]. Department of Linguistics, Stockholm University, 2006.

  8. Lin D. Dependency-based evaluation of MINIPAR. In Proc. Workshop on the Evaluation of Parsing Systems at LREC, Granada, Spain, 1998, pp.317–330.

  9. Hays D. Dependency theory: A formalism and some observations. Language, 1964, 40(4): 511–525.


  10. Mel'čuk I A. Dependency Syntax: Theory and Practice. State University of New York Press, Albany, N.Y., 1987.

  11. Pedersen T, Patwardhan S, Michelizzi J. WordNet::Similarity: Measuring the relatedness of concepts. In Proc. the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, CA, 2004, pp.1024–1025.

  12. Miller G. Introduction to WordNet: An On-line Lexical Database. Princeton University, 1993.

  13. Miller G. WordNet: An on-line lexical database. International Journal of Lexicography, 1990, 3(4): 235–244.


  14. Resnik P. Using information content to evaluate semantic similarity in a taxonomy. In Proc. the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, Aug. 20-25, 1995, pp.448–453.

  15. Jiang J J, Conrath D W. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. International Conference on Research in Computational Linguistics, Taiwan, China, Sept. 1997, pp.19–33.

  16. Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, Fellbaum C (ed.), 1998, pp.265–283.

  17. Tejada J, Gelbukh A, Calvo H. Unsupervised WSD with a dynamic thesaurus. In Proc. the 11th International Conference on Text, Speech and Dialogue (TSD 2008), Brno, Czech Republic, Sept. 8-12, 2008, pp.201–210.

  18. Tejada J, Gelbukh A, Calvo H. An innovative two-stage WSD unsupervised method. SEPLN Journal, March 2008, 40: 99–105.



Author information


Corresponding author

Correspondence to Javier Tejada-Cárcamo.

Additional information

Supported by the Mexican Government (SNI, SIP-IPN, COFAA-IPN, and PIFI-IPN), CONACYT and the Japanese Government.


About this article

Cite this article

Tejada-Cárcamo, J., Calvo, H., Gelbukh, A. et al. Unsupervised WSD by Finding the Predominant Sense Using Context as a Dynamic Thesaurus. J. Comput. Sci. Technol. 25, 1030–1039 (2010). https://doi.org/10.1007/s11390-010-9385-2
