ABSTRACT
Semantic similarity is a central concept that extends across numerous fields, including artificial intelligence, natural language processing, cognitive science, and psychology. Accurate measurement of semantic similarity between words is essential for tasks such as document clustering, information retrieval, and synonym extraction. We propose a novel model of semantic similarity that uses the semantic relations that exist among words. Given two words, we first represent the semantic relations that hold between them using automatically extracted lexical pattern clusters. Next, we compute the semantic similarity between the two words using a Mahalanobis distance measure. We compare the proposed similarity measure against previously proposed semantic similarity measures on the Miller-Charles benchmark dataset and the WordSimilarity-353 collection. The proposed method outperforms all existing web-based semantic similarity measures, achieving a Pearson correlation coefficient of 0.867 on the Miller-Charles dataset.
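As a rough illustration of the two quantities named in the abstract, the following Python sketch computes a Mahalanobis distance between two feature vectors and a Pearson correlation between two lists of scores. It is not the authors' implementation: the function names, the idea of passing an inverse covariance (precision) matrix directly, and the toy vectors are assumptions made for this example only. In the setting described above, the vectors would presumably hold lexical pattern cluster features for word pairs.

```python
import math


def mahalanobis_distance(x, y, precision):
    """Mahalanobis distance between x and y.

    `precision` is an inverse covariance matrix given as a list of rows
    (a hypothetical input for this sketch). With the identity matrix the
    result reduces to the ordinary Euclidean distance.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Multiply the precision matrix by the difference vector.
    weighted = [sum(m * d for m, d in zip(row, diff)) for row in precision]
    squared = sum(d * w for d, w in zip(diff, weighted))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy feature vectors; real ones would come from pattern clusters.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(mahalanobis_distance([1.0, 0.0], [0.0, 1.0], identity))
    # Toy human ratings versus toy system scores.
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A benchmark evaluation of the kind reported above would pass the human ratings and the system's similarity scores for the same word pairs to `pearson`. How a distance is turned into a similarity score is a design choice this sketch leaves open.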
Index Terms
- A relational model of semantic similarity between words using automatically extracted lexical pattern clusters from the web