A relational model of semantic similarity between words using automatically extracted lexical pattern clusters from the web

Authors:
Danushka Bollegala, The University of Tokyo, Hongo, Tokyo, Japan
Yutaka Matsuo, The University of Tokyo, Hongo, Tokyo, Japan
Mitsuru Ishizuka, The University of Tokyo, Hongo, Tokyo, Japan

EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2, August 2009, Pages 803–812

Published: 06 August 2009

ABSTRACT

Semantic similarity is a central concept that extends across numerous fields such as artificial intelligence, natural language processing, cognitive science, and psychology. Accurate measurement of semantic similarity between words is essential for tasks such as document clustering, information retrieval, and synonym extraction. We propose a novel model of semantic similarity using the semantic relations that exist among words. Given two words, we first represent the semantic relations that hold between those words using automatically extracted lexical pattern clusters. Next, the semantic similarity between the two words is computed using a Mahalanobis distance measure. We compare the proposed similarity measure against previously proposed semantic similarity measures on the Miller-Charles benchmark dataset and the WordSimilarity-353 collection. The proposed method outperforms all existing web-based semantic similarity measures, achieving a Pearson correlation coefficient of 0.867 on the Miller-Charles dataset.
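The abstract names a Mahalanobis distance measure but does not give its exact form, so the following is only a minimal sketch of the generic Mahalanobis distance, not the authors' formulation. It assumes each word pair has already been turned into a vector of lexical-pattern-cluster weights; the vectors `x` and `y`, the matrix `correlated`, and the function name `mahalanobis_distance` are hypothetical illustrations.

```python
import numpy as np


def mahalanobis_distance(x, y, cov):
    """Generic Mahalanobis distance sqrt((x - y)^T cov^{-1} (x - y)).

    `cov` is assumed to be a symmetric positive-definite matrix that
    describes how the feature dimensions (here: hypothetical pattern
    clusters) co-vary.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))


# Hypothetical cluster-weight vectors over three pattern clusters.
x = [1.0, 0.0, 2.0]
y = [0.0, 1.0, 2.0]

# With the identity matrix the measure reduces to Euclidean distance.
identity = np.eye(3)

# Hypothetical matrix in which the first two clusters are correlated.
correlated = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(mahalanobis_distance(x, y, identity))
print(mahalanobis_distance(x, y, correlated))
```

Unlike a plain Euclidean or cosine measure, the matrix lets dimensions interact, so vectors that differ along correlated clusters are scored differently from vectors that differ along independent ones.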

Index Terms

A relational model of semantic similarity between words using automatically extracted lexical pattern clusters from the web
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
      2. Unsupervised learning
        Cluster analysis
    2. Machine learning approaches
      1. Classification and regression trees


Published in
EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
August 2009
616 pages
ISBN: 9781932432626
Program Chairs: Philipp Koehn (University of Edinburgh), Rada Mihalcea (University of North Texas)
Publisher: Association for Computational Linguistics, United States
Publication History: Published 6 August 2009
Qualifiers: research-article
Acceptance Rates: Overall Acceptance Rate 73 of 234 submissions, 31%
