Evolution of Semantic Similarity—A Survey

Published: 18 February 2021
Abstract

Estimating the semantic similarity between text data is one of the challenging, open research problems in Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, from traditional NLP techniques such as kernel-based methods to the most recent research on transformer-based models, and categorizes them by their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. By discussing the strengths and weaknesses of each method, the survey gives new researchers a comprehensive view of existing systems from which to experiment and develop innovative ideas for addressing semantic similarity.

References

  1. Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, and Aitor Soroa. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Citeseer, 19.Google ScholarGoogle ScholarCross RefCross Ref
  2. Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Inigo Lopez-Gazpio, Montse Maritxalar, Rada Mihalcea, et al. 2015. Semeval-2015 task 2: Semantic textual similarity, English, Spanish, and pilot on interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 252--263.Google ScholarGoogle ScholarCross RefCross Ref
  3. Eneko Agirre, Carmen Banea, Claire Cardie, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Weiwei Guo, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2014. Semeval-2014 task 10: Multilingual semantic textual similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14). 81--91.Google ScholarGoogle ScholarCross RefCross Ref
  4. Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez Agirre, Rada Mihalcea, German Rigau Claramunt, and Janyce Wiebe. 2016. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). ACL (Association for Computational Linguistics), 497--511.Google ScholarGoogle ScholarCross RefCross Ref
  5. Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. Semeval-2012 task 6: A pilot on semantic textual similarity. In * SEM 2012: The First Joint Conference on Lexical and Computational Semantics--Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval’12). 385--393.Google ScholarGoogle Scholar
  6. Eneko Agirre, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, and Weiwei Guo. 2013. * SEM 2013 shared task: Semantic textual similarity. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics (* SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. 32--43.Google ScholarGoogle Scholar
  7. Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 5, 3 (2015), 379--389.Google ScholarGoogle ScholarCross RefCross Ref
  8. Berna Altinel and Murat Can Ganiz. 2018. Semantic text classification: A survey of past and recent advances. Inf. Proc. Manag. 54, 6 (2018), 1129--1153. DOI:https://doi.org/10.1016/j.ipm.2018.08.001Google ScholarGoogle ScholarCross RefCross Ref
  9. Samir Amir, Adrian Tanasescu, and Djamel A. Zighed. 2017. Sentence similarity based on semantic kernels for intelligent text retrieval. J. Intell. Inf. Syst. 48, 3 (2017), 675--689.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15).Google ScholarGoogle Scholar
  11. Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 3. 805--810.Google ScholarGoogle Scholar
  12. Daniel Bär, Chris Biemann, Iryna Gurevych, and Torsten Zesch. 2012. UKP: Computing semantic textual similarity by combining multiple content similarity measures. In * SEM 2012: The First Joint Conference on Lexical and Computational Semantics--Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval’12). 435--440.Google ScholarGoogle Scholar
  13. Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The wacky wide web: A collection of very large linguistically processed web-crawled corpora. Lang. Res. Eval. 43, 3 (2009), 209--226.Google ScholarGoogle ScholarCross RefCross Ref
  14. Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 238--247.Google ScholarGoogle ScholarCross RefCross Ref
  15. Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A pretrained language model for scientific text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). 3606--3611.Google ScholarGoogle ScholarCross RefCross Ref
  16. Fabio Benedetti, Domenico Beneventano, Sonia Bergamaschi, and Giovanni Simonini. 2019. Computing inter-document similarity with context semantic analysis. Inf. Syst. 80 (2019), 136--147. DOI:https://doi.org/10.1016/j.is.2018.02.009Google ScholarGoogle ScholarCross RefCross Ref
  17. Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia-a crystallization point for the web of data. J. Web Seman. 7, 3 (2009), 154--165.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Trans. Assoc. Comput. Ling. 5 (2017), 135--146.Google ScholarGoogle ScholarCross RefCross Ref
  19. Antoine Bordes, Sumit Chopra, and Jason Weston. 2014. Question answering with subgraph embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 615--620.Google ScholarGoogle ScholarCross RefCross Ref
  20. Jose Camacho-Collados and Mohammad Taher Pilehvar. 2018. From word to sense embeddings: A survey on vector representations of meaning. J. Artif. Intell. Res. 63 (2018), 743--788.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. Nasari: A novel approach to a semantically aware representation of items. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 567--577.Google ScholarGoogle ScholarCross RefCross Ref
  22. José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2016. Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artif. Intell. 240 (2016), 36--64. DOI:https://doi.org/10.1016/j.artint.2016.07.005Google ScholarGoogle ScholarCross RefCross Ref
  23. Nicola Cancedda, Eric Gaussier, Cyril Goutte, and Jean-Michel Renders. 2003. Word-sequence kernels. J. Mach. Learn. Res. 3, Feb. (2003), 1059--1082.Google ScholarGoogle Scholar
  24. Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, and Lucia Specia. 2017. SemEval-2017 Task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval’17). 1--14.Google ScholarGoogle ScholarCross RefCross Ref
  25. Rudi L. Cilibrasi and Paul M. B. Vitanyi. 2007. The Google similarity distance. IEEE Trans. Knowl. Data Eng. 19, 3 (2007), 370--383.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Michael Collins and Nigel Duffy. 2002. Convolution kernels for natural language. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 625--632.Google ScholarGoogle Scholar
  27. Michael Collins and Nigel Duffy. 2002. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of the 40th Meeting of the Association for Computational Linguistics. 263--270.Google ScholarGoogle Scholar
  28. Danilo Croce, Simone Filice, Giuseppe Castellucci, and Roberto Basili. 2017. Deep learning in semantic kernel spaces. In Proceedings of the 55th Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 345--354.Google ScholarGoogle ScholarCross RefCross Ref
  29. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171--4186.Google ScholarGoogle Scholar
  30. Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web. 406--414.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Evgeniy Gabrilovich, Shaul Markovitch, et al. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 7. 1606--1611.Google ScholarGoogle Scholar
  32. Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The paraphrase database. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 758--764.Google ScholarGoogle Scholar
  33. Jian-Bo Gao, Bao-Wen Zhang, and Xiao-Hua Chen. 2015. A WordNet-based semantic similarity measurement combining edge-counting and information content theory. Eng. Applic. Artif. Intell. 39 (2015), 80--88. DOI:https://doi.org/10.1016/j.engappai.2014.11.009Google ScholarGoogle ScholarCross RefCross Ref
  34. Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart, and Anna Korhonen. 2016. SimVerb-3500: A large-scale evaluation set of verb similarity. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2173--2182.Google ScholarGoogle ScholarCross RefCross Ref
  35. Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, and Paolo Rosso. 2018. A resource-light method for cross-lingual semantic textual similarity. Knowl.-based Syst. 143 (2018), 1--9. DOI:https://doi.org/10.1016/j.knosys.2017.11.041Google ScholarGoogle Scholar
  36. James Gorman and James R. Curran. 2006. Scaling distributional similarity to large corpora. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Meeting of the Association for Computational Linguistics. 361--368.Google ScholarGoogle Scholar
  37. Mohamed Ali Hadj Taieb, Torsten Zesch, and Mohamed Ben Aouicha. 2019. A survey of semantic relatedness evaluation datasets and procedures. Artif. Intell. Rev. (23 Dec. 2019). DOI:https://doi.org/10.1007/s10462-019-09796-3Google ScholarGoogle Scholar
  38. Basma Hassan, Samir E. Abdelrahman, Reem Bahgat, and Ibrahim Farag. 2019. UESTS: An unsupervised ensemble semantic textual similarity method. IEEE Access 7 (2019), 85462--85482.Google ScholarGoogle ScholarCross RefCross Ref
  39. Hua He and Jimmy Lin. 2016. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 937--948. DOI:https://doi.org/10.18653/v1/N16-1108Google ScholarGoogle ScholarCross RefCross Ref
  40. Felix Hill, Roi Reichart, and Anna Korhonen. 2015. Simlex-999: Evaluating semantic models with (genuine) similarity estimation. Comput. Ling. 41, 4 (2015), 665--695.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell. 194 (2013), 28--61.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Harneet Kaur Janda, Atish Pawar, Shan Du, and Vijay Mago. 2019. Syntactic, semantic and sentiment analysis: The joint effect on automated essay evaluation. IEEE Access 7 (2019), 108486--108503.Google ScholarGoogle ScholarCross RefCross Ref
  43. Jay J. Jiang and David W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the 10th International Conference on Research on Computational Linguistics. 19--33.Google ScholarGoogle Scholar
  44. Yuncheng Jiang, Wen Bai, Xiaopei Zhang, and Jiaojiao Hu. 2017. Wikipedia-based information content and semantic similarity computation. Inf. Proc. Manag. 53, 1 (2017), 248--265. DOI:https://doi.org/10.1016/j.ipm.2016.09.001Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Yuncheng Jiang, Xiaopei Zhang, Yong Tang, and Ruihua Nie. 2015. Feature-based approaches to semantic similarity assessment of concepts using Wikipedia. Inf. Proc. Manag. 51, 3 (2015), 215--234.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2019. TinyBERT: Distilling BERT for natural language understanding. Arxiv Preprint Arxiv:1909.10351 (2019).Google ScholarGoogle Scholar
  47. Tomoyuki Kajiwara and Mamoru Komachi. 2016. Building a monolingual parallel corpus for text simplification using sentence similarity based on alignment between word embeddings. In Proceedings of the 26th International Conference on Computational Linguistics: Technical Papers (COLING’16). 1147--1158.Google ScholarGoogle Scholar
  48. Sun Kim, Nicolas Fiorini, W. John Wilbur, and Zhiyong Lu. 2017. Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J. Biomed. Inf. 75 (2017), 122--127.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1746--1751.Google ScholarGoogle ScholarCross RefCross Ref
  50. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  51. Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol. Rev. 104, 2 (1997), 211.Google ScholarGoogle ScholarCross RefCross Ref
  52. Thomas K. Landauer, Peter W. Foltz, and Darrell Laham. 1998. An introduction to latent semantic analysis. Discour. Proc. 25, 2--3 (1998), 259--284.Google ScholarGoogle ScholarCross RefCross Ref
  53. Juan J. Lastra-Díaz and Ana García-Serrano. 2015. A new family of information content models with an experimental survey on WordNet. Knowl.-based Syst. 89 (2015), 509--526.Google ScholarGoogle Scholar
  54. Juan J. Lastra-Díaz, Ana García-Serrano, Montserrat Batet, Miriam Fernández, and Fernando Chirigati. 2017. HESML: A scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Inf. Syst. 66 (2017), 97--118.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, and Eneko Agirre. 2019. A reproducible survey on word embeddings and ontology-based methods for word similarity: Linear combinations outperform the state of the art. Eng. Applic. Artif. Intell. 85 (2019), 645--665. DOI:https://doi.org/10.1016/j.engappai.2019.07.010Google ScholarGoogle ScholarCross RefCross Ref
  56. Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. 1188--1196.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Yuquan Le, Zhi-Jie Wang, Zhe Quan, Jiawei He, and Bin Yao. 2018. ACV-tree: A new method for sentence similarity modeling. In Proceedings of the International Joint Conference on Artificial Intelligence. 4137--4143.Google ScholarGoogle ScholarCross RefCross Ref
  58. Ming Che Lee. 2011. A novel sentence similarity measure for semantic-based expert systems. Exp. Syst. Applic. 38, 5 (2011), 6392--6399.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Omer Levy and Yoav Goldberg. 2014. Dependency-based word embeddings. In Proceedings of the 52nd Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 302--308.Google ScholarGoogle ScholarCross RefCross Ref
  60. Omer Levy and Yoav Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2177--2185.Google ScholarGoogle Scholar
  61. Peipei Li, Haixun Wang, Kenny Q. Zhu, Zhongyuan Wang, and Xindong Wu. 2013. Computing term similarity by large probabilistic isA knowledge. In Proceedings of the 22nd ACM International Conference on Information 8 Knowledge Management. 1401--1410.Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Yuhua Li, Zuhair A. Bandar, and David McLean. 2003. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 15, 4 (2003), 871--882.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett. 2006. Sentence similarity based on semantic nets and corpus statistics. IEEE Trans. Knowl. Data Eng. 18, 8 (2006), 1138--1150.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Dekang Lin et al. 1998. An information-theoretic definition of similarity. In Proceedings of the International Conference on Machine Learning (ICML’98). 296--304.Google ScholarGoogle Scholar
  65. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. Arxiv Preprint Arxiv:1907.11692 (2019).Google ScholarGoogle Scholar
  66. I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, and E. Agirre. 2017. Interpretable semantic textual similarity: Finding and explaining differences between sentences. Knowl.-based Syst. 119 (2017), 186--199. DOI:https://doi.org/10.1016/j.knosys.2016.12.013Google ScholarGoogle Scholar
  67. I. Lopez-Gazpio, M. Maritxalar, M. Lapata, and E. Agirre. 2019. Word n-gram attention models for sentence similarity and inference. Exp. Syst. Applic. 132 (2019), 1--11. DOI:https://doi.org/10.1016/j.eswa.2019.04.054Google ScholarGoogle ScholarCross RefCross Ref
  68. Kevin Lund and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Meth. Instrum. Comput. 28, 2 (1996), 203--208.Google ScholarGoogle ScholarCross RefCross Ref
  69. M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi, and R. Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA), Reykjavik, Iceland, 216--223. http://www.lrec-conf.org/proceedings/lrec2014/pdf/363_Paper.pdf.Google ScholarGoogle Scholar
  70. Bryan McCann, James Bradbury, Caiming Xiong, and Richard Socher. 2017. Learned in translation: Contextualized word vectors. In Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran 8 Associates Inc., 6297--6308.Google ScholarGoogle Scholar
  71. Bridget T. McInnes, Ying Liu, Ted Pedersen, Genevieve B. Melton, and Serguei V. Pakhomov. 2013. UMLS: Similarity: Measuring the relatedness and similarity of biomedical concepts. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 28.Google ScholarGoogle Scholar
  72. Christopher Meek, Yang Yi, and Yih Wen-tau. 2018. WIKIQA: A challenge dataset for open-domain question answering. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing September 2015. 2013--2018. https://doi.org/10.18653/v1/D15-1237Google ScholarGoogle Scholar
  73. Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning. 51--61.Google ScholarGoogle ScholarCross RefCross Ref
  74. Rada Mihalcea and Andras Csomai. 2007. Wikify! Linking documents to encyclopedic knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management. 233--242.Google ScholarGoogle Scholar
  75. Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Arxiv Preprint Arxiv:1301.3781 (2013).Google ScholarGoogle Scholar
  76. Tomáš Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 746--751.Google ScholarGoogle Scholar
  77. George A. Miller. 1995. WordNet: A lexical database for English. Commun. ACM 38, 11 (1995), 39--41.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Lang. Cog. Proc. 6, 1 (1991), 1--28.Google ScholarGoogle ScholarCross RefCross Ref
  79. Andriy Mnih and Koray Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 2265--2273.Google ScholarGoogle Scholar
  80. Muhidin Mohamed and Mourad Oussalah. 2019. SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis. Inf. Proc. Manag. 56, 4 (2019), 1356--1372.Google ScholarGoogle ScholarCross RefCross Ref
  81. Saif M. Mohammad and Graeme Hirst. 2012. Distributional measures of semantic distance: A survey. Arxiv Preprint Arxiv:1203.1858 (2012).Google ScholarGoogle Scholar
  82. Alessandro Moschitti. 2006. Efficient convolution kernels for dependency and constituent syntactic trees. In Proceedings of the European Conference on Machine Learning. Springer, 318--329.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Alessandro Moschitti. 2008. Kernel methods, syntax and semantics for relational text categorization. In Proceedings of the 17th ACM Conference on Information and Knowledge Management. 253--262.Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Alessandro Moschitti, Daniele Pighin, and Roberto Basili. 2008. Tree kernels for semantic role labeling. Comput. Ling. 34, 2 (2008), 193--224.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Alessandro Moschitti and Silvia Quarteroni. 2008. Kernels on linguistic structures for answer extraction. In Proceedings of the Conference of the Association for Computational Linguistics: Human Language Technologies, Short Papers. 113--116.Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. Alessandro Moschitti, Silvia Quarteroni, Roberto Basili, and Suresh Manandhar. 2007. Exploiting syntactic and shallow semantic kernels for question answer classification. In Proceedings of the 45th Meeting of the Association of Computational Linguistics. 776--783.Google ScholarGoogle Scholar
  87. Alessandro Moschitti and Fabio Massimo Zanzotto. 2007. Fast and effective kernels for relational learning from texts. In Proceedings of the 24th International Conference on Machine Learning. 649--656.Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artif. Intell. 193 (2012).Google ScholarGoogle Scholar
  89. Douglas L. Nelson, Cathy L. McEvoy, and Thomas A. Schreiber. 2004. The University of South Florida free association, rhyme, and word fragment norms. Behav. Res. Meth. Instrum. Comput. 36, 3 (2004), 402--407.Google ScholarGoogle ScholarCross RefCross Ref
  90. Joakim Nivre. 2006. Inductive Dependency Parsing. Springer.Google ScholarGoogle Scholar
  91. Matteo Pagliardini, Prakhar Gupta, and Martin Jaggi. 2018. Unsupervised learning of sentence embeddings using compositional n-gram features. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 528--540.Google ScholarGoogle ScholarCross RefCross Ref
  92. Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2249--2255.Google ScholarGoogle ScholarCross RefCross Ref
  93. A. Pawar and V. Mago. 2019. Challenging the boundaries of unsupervised learning for semantic similarity. IEEE Access 7 (2019), 16291--16308.Google ScholarGoogle ScholarCross RefCross Ref
  94. Ted Pedersen, Serguei V. S. Pakhomov, Siddharth Patwardhan, and Christopher G. Chute. 2007. Measures of semantic similarity and relatedness in the biomedical domain. J. Biomed. Inf. 40, 3 (2007), 288--299.Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1532--1543.Google ScholarGoogle Scholar
  96. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2227--2237.Google ScholarGoogle ScholarCross RefCross Ref
  97. Mohammad Taher Pilehvar and Jose Camacho-Collados. 2019. WiC: the word-in-context dataset for evaluating context-sensitive meaning representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 1267--1273.Google ScholarGoogle Scholar
  98. Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1341--1351.Google ScholarGoogle Scholar
  99. Mohammad Taher Pilehvar and Roberto Navigli. 2015. From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artif. Intell. 228 (2015), 95--128. DOI:https://doi.org/10.1016/j.artint.2015.07.005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. Rong Qu, Yongyi Fang, Wen Bai, and Yuncheng Jiang. 2018. Computing semantic similarity based on novel models of semantic representation using Wikipedia. Inf. Proc. Manag. 54, 6 (2018), 1002--1021. DOI:https://doi.org/10.1016/j.ipm.2018.07.002.Google ScholarGoogle ScholarCross RefCross Ref
  101. Z. Quan, Z. Wang, Y. Le, B. Yao, K. Li, and J. Yin. 2019. An efficient framework for sentence similarity modeling. IEEE/ACM Trans. Aud. Speech Lang. Proc. 27, 4 (Apr. 2019), 853--865. DOI:https://doi.org/10.1109/TASLP.2019.2899494.Google ScholarGoogle Scholar
  102. Roy Rada, Hafedh Mili, Ellen Bicknell, and Maria Blettner. 1989. Development and application of a metric on semantic nets. IEEE Trans. Syst. Man Cyber. 19, 1 (1989), 17--30.Google ScholarGoogle ScholarCross RefCross Ref
  103. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. Arxiv Preprint Arxiv:1910.10683 (2019).Google ScholarGoogle Scholar
  104. Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence. 448--453.Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. M. Andrea Rodríguez and Max J. Egenhofer. 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Trans. Knowl. Data Eng. 15, 2 (2003), 442--456.Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Terry Ruas, William Grosky, and Akiko Aizawa. 2019. Multi-sense embeddings through a word sense disambiguation process. Exp. Syst. Applic. 136 (2019), 288--303. DOI:https://doi.org/10.1016/j.eswa.2019.06.026Google ScholarGoogle ScholarCross RefCross Ref
  107. Herbert Rubenstein and John B. Goodenough. 1965. Contextual correlates of synonymy. Commun. ACM 8, 10 (1965), 627--633.Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. David Sánchez, Montserrat Batet, and David Isern. 2011. Ontology-based information content computation. Knowl.-based Syst. 24, 2 (2011), 297--303.Google ScholarGoogle Scholar
  109. Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. Arxiv Preprint Arxiv:1910.01108 (2019).Google ScholarGoogle Scholar
  110. Frane Šarić, Goran Glavaš, Mladen Karan, Jan Šnajder, and Bojana Dalbelo Bašić. 2012. Takelab: Systems for measuring semantic text similarity. In *SEM 2012: The 1st Joint Conference on Lexical and Computational Semantics--Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval’12). 441--448.
  111. Tobias Schnabel, Igor Labutov, David Mimno, and Thorsten Joachims. 2015. Evaluation methods for unsupervised word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 298--307.
  112. Aliaksei Severyn and Alessandro Moschitti. 2012. Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 741--750.
  113. Aliaksei Severyn, Massimo Nicosia, and Alessandro Moschitti. 2013. Learning semantic textual similarity with structural representations. In Proceedings of the 51st Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 714--718.
  114. Yang Shao. 2017. HCTI at SemEval-2017 task 1: Use convolutional neural network to evaluate semantic textual similarity. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval’17). 130--133.
  115. John Shawe-Taylor, Nello Cristianini et al. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
  116. Carina Silberer and Mirella Lapata. 2014. Learning grounded meaning representations with autoencoders. In Proceedings of the 52nd Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 721--732.
  117. Roberta A. Sinoara, Jose Camacho-Collados, Rafael G. Rossi, Roberto Navigli, and Solange O. Rezende. 2019. Knowledge-enhanced document embeddings for text classification. Knowl.-based Syst. 163 (2019), 955--971. DOI:https://doi.org/10.1016/j.knosys.2018.10.026
  118. Gizem Soǧancıoǧlu, Hakime Öztürk, and Arzucan Özgür. 2017. BIOSSES: A semantic sentence similarity estimation system for the biomedical domain. Bioinformatics 33, 14 (July 2017), i49--i58. Retrieved from https://academic.oup.com/bioinformatics/article-pdf/33/14/i49/25157316/btx238.pdf.
  119. Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. DLS@CU: Sentence similarity from word alignment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14). 241--246.
  120. Md Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 148--153.
  121. Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A continual pre-training framework for language understanding. In Proceedings of the AAAI Conference on Artificial Intelligence. 8968--8975.
  122. David Sánchez and Montserrat Batet. 2013. A semantic similarity method based on information content exploiting multiple ontologies. Exp. Syst. Applic. 40, 4 (2013), 1393--1399. DOI:https://doi.org/10.1016/j.eswa.2012.08.049
  123. David Sánchez, Montserrat Batet, David Isern, and Aida Valls. 2012. Ontology-based semantic similarity: A new feature-based approach. Exp. Syst. Applic. 39, 9 (2012), 7718--7728. DOI:https://doi.org/10.1016/j.eswa.2012.01.082
  124. Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 1556--1566.
  125. Junfeng Tian, Zhiheng Zhou, Man Lan, and Yuanbin Wu. 2017. ECNU at SemEval-2017 task 1: Leverage kernel-based traditional NLP features and neural networks to build a universal model for multilingual and cross-lingual semantic textual similarity. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval’17). 191--197.
  126. Nguyen Huy Tien, Nguyen Minh Le, Yamasaki Tomohiro, and Izuha Tatsuya. 2019. Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity. Inf. Proc. Manag. 56, 6 (2019), 102090. DOI:https://doi.org/10.1016/j.ipm.2019.102090
  127. Julien Tissier, Christophe Gravier, and Amaury Habrard. 2017. Dict2vec: Learning word embeddings using lexical dictionaries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’17). 254--263.
  128. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the International Conference on Advances in Neural Information Processing Systems.
  129. Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the jeopardy model? A quasi-synchronous grammar for QA. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL’07). 22--32.
  130. Zhiguo Wang, Haitao Mi, and Abraham Ittycheriah. 2016. Sentence similarity learning by lexical decomposition and composition. In Proceedings of the 26th International Conference on Computational Linguistics (COLING’16). 1340--1349. arXiv:1602.07019.
  131. Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd Meeting on Association for Computational Linguistics. Association for Computational Linguistics, 133--138.
  132. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Proceedings of the International Conference on Advances in Neural Information Processing Systems. 5753--5763.
  133. G. Zhu and C. A. Iglesias. 2017. Computing semantic similarity of concepts in knowledge graphs. IEEE Trans. Knowl. Data Eng. 29, 1 (Jan. 2017), 72--85. DOI:https://doi.org/10.1109/TKDE.2016.2610428
  134. Will Y. Zou, Richard Socher, Daniel Cer, and Christopher D. Manning. 2013. Bilingual word embeddings for phrase-based machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1393--1398.

          • Published in

            ACM Computing Surveys  Volume 54, Issue 2
            March 2022
            800 pages
            ISSN:0360-0300
            EISSN:1557-7341
            DOI:10.1145/3450359
            Copyright © 2021 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 18 February 2021
            • Accepted: 1 November 2020
            • Revised: 1 September 2020
            • Received: 1 April 2020

            Qualifiers

            • research-article
            • Research
            • Refereed