Abstract
How to encode the semantic information of a sentence into a computable numerical embedding is a fundamental problem in natural language processing. An informative universal sentence embedding can greatly benefit downstream natural language processing tasks. However, unlike universal word embeddings, no widely accepted general-purpose sentence-embedding technique has yet emerged. This survey summarizes current universal sentence-embedding methods, categorizes them into four groups from a linguistic view, and analyzes their reported performance. We observe that sentence embeddings trained bottom-up from words show nearly opposite performance patterns on downstream tasks compared with those trained from logical relationships between sentences. By comparing the training schemes within and between groups, we analyze possible underlying reasons for these different performance patterns. We additionally collect inspiring strategies for handling sentences from models in other areas and propose potentially fruitful directions for future research.
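As a minimal illustration of the "bottom-up" family mentioned above (not the survey's own implementation), a sentence embedding can be formed by composing word vectors, here with simple mean pooling over a toy vocabulary; in practice the vectors would come from a pretrained model such as GloVe or word2vec, and the vocabulary and pooling choice below are assumptions for demonstration only.

```python
# Illustrative sketch: a bottom-up sentence embedding via mean pooling of word vectors.
import numpy as np

# Toy word vectors; real systems would load pretrained embeddings instead.
word_vectors = {
    "the":     np.array([0.1, 0.3, -0.2]),
    "cat":     np.array([0.7, -0.1, 0.4]),
    "sat":     np.array([-0.2, 0.5, 0.1]),
    "quietly": np.array([0.0, 0.2, 0.6]),
}

def sentence_embedding(sentence: str) -> np.ndarray:
    """Average the vectors of known words; unknown words are skipped."""
    tokens = sentence.lower().split()
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(3)  # fallback for sentences with no in-vocabulary words
    return np.mean(vecs, axis=0)

print(sentence_embedding("The cat sat quietly"))
```

Methods trained from inter-sentence relationships instead supervise the encoder with signals such as entailment or discourse order between sentence pairs, rather than composing word vectors directly.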