Abstract
Neural network techniques are widely applied to obtain high-quality distributed representations of words (i.e., word embeddings) for text mining, information retrieval, and natural language processing tasks. Many recent efforts have proposed efficient methods for learning word embeddings from context, such that the embeddings encode both semantic and syntactic relationships between words. However, it remains quite challenging to handle unseen or rare words, for which contextual information is insufficient. Inspired by studies of the word recognition process in cognitive psychology, in this article we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address these challenges. In particular, we introduce a novel neural network architecture called KNET that leverages both a word's contextual information and its morphological knowledge to learn word embeddings. Moreover, this learning architecture is able to benefit from noisy knowledge and to balance contextual information against morphological knowledge. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework can greatly enhance the effectiveness of word embeddings.
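The abstract does not spell out KNET's formulation, but the architecture it describes (a context-driven embedding signal combined with morphological knowledge, with a mechanism that balances the two and tolerates noisy knowledge) can be illustrated. Below is a minimal, hypothetical PyTorch sketch of such a model: a skip-gram-style objective with negative sampling, where each word's input vector is a learned gated mixture of its word embedding and the average of its morpheme embeddings. The gating scheme, the morpheme-averaging step, and the loss are assumptions made for illustration, not the paper's exact design.

```python
# A minimal, illustrative sketch of a KNET-style embedding model.
# Hypothetical: the abstract only says KNET combines contextual
# information with morphological knowledge; the per-word gate, the
# morpheme inventory, and the skip-gram negative-sampling objective
# below are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn


class KnetStyleEmbedding(nn.Module):
    def __init__(self, vocab_size, num_morphemes, dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)      # per-word vectors
        self.morph_emb = nn.Embedding(num_morphemes, dim)  # root/affix vectors
        # Per-word gate balancing the word-level vs. morphological signal;
        # a learned gate lets frequent words rely on context while rare
        # words lean on (possibly noisy) morphological knowledge.
        self.gate = nn.Embedding(vocab_size, 1)
        self.out_emb = nn.Embedding(vocab_size, dim)       # output (context) vectors

    def input_vector(self, word_ids, morph_ids, morph_mask):
        w = self.word_emb(word_ids)                        # (B, D)
        # Average the embeddings of each word's morphemes, ignoring padding.
        m = self.morph_emb(morph_ids)                      # (B, M, D)
        m = (m * morph_mask.unsqueeze(-1)).sum(1)
        m = m / morph_mask.sum(1, keepdim=True).clamp(min=1.0)
        g = torch.sigmoid(self.gate(word_ids))             # (B, 1)
        return g * w + (1 - g) * m

    def forward(self, word_ids, morph_ids, morph_mask, ctx_ids, labels):
        # Skip-gram with negative sampling: score (center, context) pairs.
        v = self.input_vector(word_ids, morph_ids, morph_mask)
        c = self.out_emb(ctx_ids)
        logits = (v * c).sum(-1)
        return nn.functional.binary_cross_entropy_with_logits(logits, labels)


if __name__ == "__main__":
    model = KnetStyleEmbedding(vocab_size=1000, num_morphemes=200, dim=50)
    B, M = 4, 3
    loss = model(
        torch.randint(0, 1000, (B,)),       # center words
        torch.randint(0, 200, (B, M)),      # morpheme ids per word
        torch.ones(B, M),                   # morpheme mask (1 = real morpheme)
        torch.randint(0, 1000, (B,)),       # context / negative words
        torch.randint(0, 2, (B,)).float(),  # 1 = true context, 0 = negative
    )
    loss.backward()
    print(float(loss))
```

Under this sketch, a rare word with little observed context can still receive a sensible vector through its morphemes, while the learned gate lets frequent words rely more on context, mirroring the balance between the two information sources that the abstract describes.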