DOI: 10.5555/1699571.1699617

A relational model of semantic similarity between words using automatically extracted lexical pattern clusters from the web

Published: 06 August 2009

ABSTRACT

Semantic similarity is a central concept that extends across numerous fields such as artificial intelligence, natural language processing, cognitive science, and psychology. Accurate measurement of semantic similarity between words is essential for tasks such as document clustering, information retrieval, and synonym extraction. We propose a novel model of semantic similarity using the semantic relations that exist among words. Given two words, we first represent the semantic relations that hold between those words using automatically extracted lexical pattern clusters. Next, the semantic similarity between the two words is computed using a Mahalanobis distance measure. We compare the proposed similarity measure against previously proposed semantic similarity measures on the Miller-Charles benchmark dataset and the WordSimilarity-353 collection. The proposed method outperforms all existing web-based semantic similarity measures, achieving a Pearson correlation coefficient of 0.867 on the Miller-Charles dataset.
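
The two numerical ingredients named in the abstract, a Mahalanobis distance and a Pearson correlation coefficient, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the feature vectors or the distance matrix are obtained, so the vectors, the matrix `m`, and the function names below are hypothetical and chosen only for illustration. The sketch assumes each word pair is already represented as a vector of pattern-cluster weights and that `m` is a given positive semi-definite matrix.

```python
import math


def mahalanobis_distance(x, y, m):
    """Return sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff.
    m_diff = [sum(m_ij * d_j for m_ij, d_j in zip(row, diff)) for row in m]
    quad = sum(d_i * md_i for d_i, md_i in zip(diff, m_diff))
    # Clamp tiny negative values that floating-point rounding can produce.
    return math.sqrt(max(quad, 0.0))


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical two-dimensional feature vectors (assumed, not from the paper).
pair_vector = [3.0, 1.0]
reference_vector = [1.0, 1.0]

# With the identity matrix the measure reduces to Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
# A diagonal matrix that weights the first dimension more heavily.
weighted = [[4.0, 0.0], [0.0, 1.0]]

d_identity = mahalanobis_distance(pair_vector, reference_vector, identity)
d_weighted = mahalanobis_distance(pair_vector, reference_vector, weighted)
```

Evaluation against a benchmark such as Miller-Charles would then amount to calling `pearson` on the list of human ratings and the list of scores produced by the measure, one entry per word pair.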

Published in

EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
August 2009
616 pages
ISBN: 9781932432626

Publisher

Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 73 of 234 submissions, 31%
