State of the art versus classical clustering for unsupervised word sense disambiguation

Popescu, Marius; Hristea, Florentina

doi:10.1007/s10462-010-9193-7

Published: 29 December 2010

Volume 35, pages 241–264, (2011)

Artificial Intelligence Review

Abstract

This paper discusses the importance of the clustering method used in unsupervised word sense disambiguation. It shows that a powerful clustering technique can compensate for the lack of external knowledge of any kind. It argues that feature selection does not always improve disambiguation results, especially when an advanced, state-of-the-art method is used, exemplified here by spectral clustering. Disambiguation results obtained with spectral clustering for the main parts of speech (nouns, adjectives, verbs) are compared with those of the classical clustering method given by the Naïve Bayes model. For unsupervised word sense disambiguation with an underlying Naïve Bayes model, feature selection performed in two completely different ways is surveyed. The type of feature selection that gives the best results (WordNet-based feature selection) is also applied to spectral clustering. The conclusion is that spectral clustering without feature selection (but with its own feature weighting) produces superior disambiguation results for all parts of speech.

Author information

Authors and Affiliations

Academiei 14, Str., Sector 1, C.P. 010014, Bucharest, Romania
Marius Popescu & Florentina Hristea

Corresponding author

Correspondence to Marius Popescu.

About this article

Cite this article

Popescu, M., Hristea, F. State of the art versus classical clustering for unsupervised word sense disambiguation. Artif Intell Rev 35, 241–264 (2011). https://doi.org/10.1007/s10462-010-9193-7

Issue Date: March 2011
