Abstract
Measuring the semantic nearness of documents is important for accurate information retrieval and for automated text categorization and classification. Inspired by the observation that a text document contains a semantically coherent set of ideas or topics, this paper presents the design and experimental evaluation of a method that represents a text document as a set of concepts. Based on this representation, we propose a method to measure the semantic nearness of texts. Our method makes use of WordNet, a lexico-semantic network of words, and bypasses word sense disambiguation. To show the effectiveness of our representation, we compare experimental results for text classification and clustering against results obtained with standard techniques.
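The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hand-written lexicon (the hypothetical `TOY_LEXICON`) in place of WordNet, collects the concepts of every sense of every word so that no word sense disambiguation is needed, and scores two texts by the Jaccard overlap of their concept sets. The function names and the choice of Jaccard overlap are illustrative assumptions.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for WordNet (an assumption for this
# sketch). Each word maps to its senses; each sense lists the concept itself
# followed by a few of its hypernyms.
TOY_LEXICON: Dict[str, List[List[str]]] = {
    "bank": [
        ["bank.financial", "institution", "organization"],
        ["bank.river", "slope", "geological_formation"],
    ],
    "loan": [["loan", "debt", "financial_obligation"]],
    "credit": [["credit", "debt", "financial_obligation"]],
    "lender": [["lender", "institution", "organization"]],
    "river": [["river", "stream", "body_of_water"]],
}


def concepts_of(text: str) -> Set[str]:
    """Represent a text as the set of concepts reachable from its words.

    All senses of each word are included, so no sense is ever chosen:
    this is how the sketch sidesteps word sense disambiguation.
    """
    concepts: Set[str] = set()
    for token in text.lower().split():
        for sense in TOY_LEXICON.get(token, []):
            concepts.update(sense)
    return concepts


def concept_similarity(text_a: str, text_b: str) -> float:
    """Jaccard overlap of the two concept sets (0.0 if both are empty)."""
    a, b = concepts_of(text_a), concepts_of(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # The first pair shares no surface words but does share concepts.
    print(concept_similarity("bank loan", "lender credit"))
    print(concept_similarity("bank loan", "river"))
```

In this toy setting, "bank loan" and "lender credit" have no words in common yet share concepts such as `institution` and `debt`, whereas "bank loan" and "river" share none. A plain bag-of-words comparison would score both pairs as zero.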
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pandya, A., Bhattacharyya, P. (2005). Text Similarity Measurement Using Concept Representation of Texts. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_109
DOI: https://doi.org/10.1007/11590316_109
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science (R0)