Improving Unsupervised WSD with a Dynamic Thesaurus

Tejada-Cárcamo, Javier; Calvo, Hiram; Gelbukh, Alexander

doi:10.1007/978-3-540-87391-4_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5246)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The method proposed by Diana McCarthy et al. [1] determines the predominant sense of an ambiguous word from a weighted list of terms related to that word. This list is obtained with the distributional similarity method that Lin [2] proposed for building a thesaurus. In that method, the same thesaurus is used for every word to be disambiguated and for every occurrence of an ambiguous word, regardless of the context in which it occurs. In this paper we explore a different method, which takes the context of a word into account when determining its most frequent sense: the list of distributionally similar words is built from the syntactic context of the ambiguous word. We attain a precision of 69.86%, which is 7% higher than the supervised baseline that applies the most frequent sense (MFS) obtained from 90% of SemCor to the remaining 10% of SemCor.
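The contrast between a fixed and a context-dependent list of related terms can be illustrated with a small sketch. It is not the authors' implementation: the normalized weighted vote is only one plausible way to turn a weighted term list into a sense ranking, and the sense labels, weights, and similarity values below are invented toy data. The function `rank_senses` and both neighbor lists are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Toy sense-to-word similarity values (invented for illustration only).
TOY_SIMILARITY = {
    ("bank#finance", "money"): 0.9,
    ("bank#river", "money"): 0.1,
    ("bank#finance", "shore"): 0.1,
    ("bank#river", "shore"): 0.9,
}


def rank_senses(neighbors, senses, similarity):
    """Rank senses by a normalized weighted vote of the related terms.

    neighbors:  list of (term, weight) pairs, i.e. the weighted term list
    senses:     candidate sense labels of the ambiguous word
    similarity: dict mapping (sense, term) to a similarity value
    """
    scores = defaultdict(float)
    for term, weight in neighbors:
        sims = {s: similarity.get((s, term), 0.0) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue  # this term gives no evidence for any sense
        for sense, sim in sims.items():
            # Each term distributes its weight over the senses it resembles.
            scores[sense] += weight * sim / total
    return sorted(senses, key=lambda s: scores[s], reverse=True)


SENSES = ["bank#finance", "bank#river"]

# Static thesaurus (assumption): one term list for all occurrences of "bank".
static_neighbors = [("money", 0.5), ("shore", 0.3)]

# Dynamic thesaurus (assumption): a term list built for one occurrence,
# for example "the boat reached the bank", from its syntactic context.
context_neighbors = [("shore", 0.6), ("money", 0.1)]

static_ranking = rank_senses(static_neighbors, SENSES, TOY_SIMILARITY)
dynamic_ranking = rank_senses(context_neighbors, SENSES, TOY_SIMILARITY)
```

With the static list every occurrence of the word receives the same ranking, whereas the context-built list lets the ranking change from one occurrence to another, which is the behavior the abstract describes.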

Work done under partial support of the Mexican Government (CONACyT, SNI) and IPN (PIFI, SIP). The authors wish to thank Rada Mihalcea for her useful comments and discussion.

References

1. McCarthy, D., et al.: Finding predominant senses in untagged text. In: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain (2004)
2. Lin, D.: Automatic retrieval and clustering of similar words. In: Proceedings of COLING-ACL 1998, Montreal, Canada (1998)
3. Gelbukh, A.: Using measures of semantic relatedness for word sense disambiguation. In: Patwardhan, S., Banerjee, S., Pedersen, T. (eds.) Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City (2003)
4. Lin, D.: Dependency-based Evaluation of MINIPAR. In: Workshop on the Evaluation of Parsing Systems, Granada, Spain (1998)
5. Hays, D.: Dependency theory: a formalism and some observations. Language 40, 511–525 (1964)
6. Mel’čuk, I.A.: Dependency syntax; theory and practice. State University of New York Press, Albany (1987)
7. Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet:Similarity – Measuring the Relatedness of Concepts. In: Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-2004), San Jose, CA, pp. 1024–1025 (2004)
8. Miller, G.: Introduction to WordNet: An On-line Lexical Database. Princeton University (1993)
9. Miller, G.: WordNet: an On-Line Lexical Database. International Journal of Lexicography (1990)
10. Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, pp. 448–453 (1995)
11. Jiang, J., Conrath, D.: Semantic similarity based on corpus statistics and lexical taxonomy. In: International Conference on Research in Computational Linguistics, Taiwan (1997)
12. Fellbaum, C.: Combining local context and WordNet similarity for word sense identification. In: Leacock, C., Chodorow, M. (eds.) WordNet: An electronic lexical database, pp. 265–283 (1998)
13. Wilks, Y., Stevenson, M.: The Grammar of Sense: Is word-sense tagging much more than part-of-speech tagging, Sheffield Department of Computer Science (1996)
14. Sahlgren, M.: The Word-Space Model Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. dissertation, Department of Linguistics, Stockholm University (2006)
15. Kaplan, A.: An experimental study of ambiguity and context. Mechanical Translation 2, 39–46 (1955)

Author information

Authors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico City, 07738, México
Javier Tejada-Cárcamo, Hiram Calvo & Alexander Gelbukh
Sociedad Peruana de Computación, Arequipa, Perú
Javier Tejada-Cárcamo

Editor information

Petr Sojka, Aleš Horák, Ivan Kopeček, Karel Pala

About this paper

Cite this paper

Tejada-Cárcamo, J., Calvo, H., Gelbukh, A. (2008). Improving Unsupervised WSD with a Dynamic Thesaurus. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2008. Lecture Notes in Computer Science, vol 5246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87391-4_27

DOI: https://doi.org/10.1007/978-3-540-87391-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87390-7
Online ISBN: 978-3-540-87391-4
eBook Packages: Computer Science, Computer Science (R0)
