Abstract
How to encode the semantic information of a sentence into a computable numerical embedding is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly benefit downstream natural language processing tasks. However, unlike universal word embeddings, no widely accepted general-purpose sentence-embedding technique has yet emerged. This survey summarizes current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and analyzes their reported performance. We observe that sentence embeddings trained bottom-up from words show nearly opposite performance patterns on downstream tasks compared with those trained from logical relationships between sentences. By comparing the training schemes within and between groups, we analyze possible underlying reasons for these different performance patterns. We additionally collect inspiring strategies for handling sentences from models in other areas and propose potentially fruitful directions for future research.
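As a minimal illustration of the "bottom-up" family mentioned above (not the survey's own implementation), a sentence embedding can be formed by composing word vectors, here with simple mean pooling over a toy vocabulary; in practice the vectors would come from a pretrained model such as GloVe or word2vec, and the vocabulary and pooling choice below are assumptions for demonstration only.

```python
# Illustrative sketch: a bottom-up sentence embedding via mean pooling of word vectors.
import numpy as np

# Toy word vectors; real systems would load pretrained embeddings instead.
word_vectors = {
    "the":     np.array([0.1, 0.3, -0.2]),
    "cat":     np.array([0.7, -0.1, 0.4]),
    "sat":     np.array([-0.2, 0.5, 0.1]),
    "quietly": np.array([0.0, 0.2, 0.6]),
}

def sentence_embedding(sentence: str) -> np.ndarray:
    """Average the vectors of known words; unknown words are skipped."""
    tokens = sentence.lower().split()
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(3)  # fallback for sentences with no in-vocabulary words
    return np.mean(vecs, axis=0)

print(sentence_embedding("The cat sat quietly"))
```

Methods trained from inter-sentence relationships instead supervise the encoder with signals such as entailment or discourse order between sentence pairs, rather than composing word vectors directly.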