KNET: A General Framework for Learning Word Embedding Using Morphological Knowledge

Abstract

Neural network techniques are widely applied to obtain high-quality distributed representations of words (i.e., word embeddings) for text mining, information retrieval, and natural language processing tasks. Recent efforts have produced several efficient methods that learn word embeddings from context so that the embeddings encode both semantic and syntactic relationships between words. However, such methods struggle with unseen or rare words, for which context is insufficient. Inspired by studies of the word recognition process in cognitive psychology, in this article we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address this challenge. In particular, we introduce a novel neural network architecture, called KNET, that leverages both a word’s contextual information and its morphological knowledge to learn word embeddings. This learning architecture can also benefit from noisy knowledge and balance contextual information against morphological knowledge. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework substantially improves the quality of word embeddings.
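To make the idea concrete, here is a minimal, hypothetical sketch of how contextual and morphological signals can be combined when learning embeddings. It is not the paper's actual KNET architecture: the toy vocabulary, the morpheme segmentation in `word2morphs`, the per-word `balance` coefficient, and all hyperparameters are illustrative assumptions. The sketch uses a simple skip-gram-style update with negative sampling in which each word's input representation mixes its word embedding with the mean of its morpheme embeddings.

```python
# Hypothetical sketch only; not the paper's KNET implementation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50  # embedding dimensionality (assumed)

# Toy vocabulary and an assumed morpheme segmentation.
words = ["unhappiness", "happiness", "unkind", "kindness"]
morphemes = ["un", "happi", "kind", "ness"]
word2morphs = {
    "unhappiness": ["un", "happi", "ness"],
    "happiness": ["happi", "ness"],
    "unkind": ["un", "kind"],
    "kindness": ["kind", "ness"],
}

word_vec = {w: rng.normal(scale=0.1, size=DIM) for w in words}   # word-level input embeddings
morph_vec = {m: rng.normal(scale=0.1, size=DIM) for m in morphemes}  # morpheme embeddings
out_vec = {w: np.zeros(DIM) for w in words}                       # output (context) vectors
balance = {w: 0.5 for w in words}                                 # per-word weight on morphology (assumed fixed here)

def input_repr(w):
    """Mix the word embedding with the mean of its morpheme embeddings."""
    m = np.mean([morph_vec[x] for x in word2morphs[w]], axis=0)
    lam = balance[w]
    return (1.0 - lam) * word_vec[w] + lam * m

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, negatives, lr=0.05):
    """One skip-gram step with negative sampling on the combined representation."""
    h = input_repr(center)
    grad_h = np.zeros(DIM)
    for c, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(h @ out_vec[c])
        g = score - label
        grad_h += g * out_vec[c]
        out_vec[c] -= lr * g * h
    # Propagate the gradient back to both the word and its morphemes.
    lam = balance[center]
    word_vec[center] -= lr * (1.0 - lam) * grad_h
    for m in word2morphs[center]:
        morph_vec[m] -= lr * lam * grad_h / len(word2morphs[center])

# Toy usage: "unhappiness" observed near "unkind", with "kindness" as a negative sample.
for _ in range(100):
    train_pair("unhappiness", "unkind", ["kindness"])
print(input_repr("unhappiness")[:5])
```

Under a scheme of this kind, a rare word such as "unhappiness" can still receive useful gradient signal through morphemes ("un", "happi", "ness") shared with more frequent words, which is the intuition the abstract gives for handling unseen or rare words with insufficient context.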

References

  1. Y. Bengio and J.-S. Senecal, and others. 2003. Quick Training of Probabilistic Neural Nets by Importance Sampling.Google ScholarGoogle Scholar
  2. Y. Bengio and J.-S. Senecal. 2008. Adaptive importance sampling to accelerate training of a neural probabilistic language model. Trans. Neur. Netw. 19, 4, 713--722. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Bian, B. Gao, and T.-Y. Liu. 2014. Knowledge-powered deep learning for word embedding. In Proc. of ECML/PKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. M. Blei, A. Y. Ng, and M. Jordan. 2003. Latent Dirichlet allocation. The Journal of Machine Learning Research 3, 993--1022. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Bordes, J. Weston, R. Collobert, Y. Bengio, and others. 2011. Learning structured embeddings of knowledge bases. In AAAI.Google ScholarGoogle Scholar
  6. J. W. Chapman. 1998. Language prediction skill, phonological recoding ability, and beginning reading. Reading and Spelling: Development and Disorders, 33.Google ScholarGoogle Scholar
  7. R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In ICML. ACM, New York, NY, 160--167. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. 2011. Natural language processing (almost) from scratch. JMLR 12, 2493--2537. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Creutz and K. Lagus. 2007. Unsupervised models for morpheme segmentation and morphology learning. ACM Transactions on Speech and Language Processing (TSLP) 4, 1 (January 2007), 3. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. L. Deng, X. He, and J. Gao. 2013. Deep stacking networks for information retrieval. In ICASSP. 3153--3157.Google ScholarGoogle Scholar
  11. L. C. Ehri. 2005. Learning to read words: Theory, findings, and issues. Scientific Studies of Reading 9, 2, 167--188.Google ScholarGoogle ScholarCross RefCross Ref
  12. L. C. Ehri, R. Barr, M. L. Kamil, P. Mosenthal, and P. D. Pearson. 1991. Development of the ability to read words. Handbook of Reading Research 2, 383--417.Google ScholarGoogle Scholar
  13. L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web. ACM, 406--414. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. X. Glorot, A. Bordes, and Y. Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 513--520.Google ScholarGoogle Scholar
  15. U. Goswami. 1986. Children’s use of analogy in learning to read: A developmental study. Journal of Experimental Child Psychology. 42, 1, 73--83.Google ScholarGoogle ScholarCross RefCross Ref
  16. M. U. Gutmann and A. Hyvärinen. 2012. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13, 307--361. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. E. Hinton, J. L. McClelland, and D. E. Rumelhart. 1986. Distributed representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, 3:1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Hofmann. 1999. Probabilistic latent semantic analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 289--296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1. Association for Computational Linguistics, 873--882. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. F. M. Liang. 1983. Word Hy-phen-a-tion by Com-put-er (Hyphenation, Computer). Stanford University, Stanford, CA, USA.Google ScholarGoogle Scholar
  21. M.-T. Luong, R. Socher, and C. D. Manning. 2013. Better word representations with recursive neural networks for morphology. CoNLL-2013. 104.Google ScholarGoogle Scholar
  22. T. Mikolov. 2012. Statistical Language Models Based on Neural Networks. Ph.D. Dissertation. Brno University of Technology.Google ScholarGoogle Scholar
  23. T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013a. Efficient estimation of word representations in vector space (ICLR’13).Google ScholarGoogle Scholar
  24. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS. 3111--3119.Google ScholarGoogle Scholar
  25. A. Mnih and G. E. Hinton. 2008. A scalable hierarchical distributed language model. In NIPS. 1081--1088.Google ScholarGoogle Scholar
  26. A. Mnih and K. Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In NIPS. 2265--2273.Google ScholarGoogle Scholar
  27. A Mnih and Y. W. Teh. 2012. A fast and simple algorithm for training neural probabilistic language models. In ICML. Omnipress, New York, NY, 1751--1758.Google ScholarGoogle Scholar
  28. F. Morin and Y. Bengio. 2005. Hierarchical probabilistic neural network language model. In AISTATS. 246--252.Google ScholarGoogle Scholar
  29. A. El-Desoky Mousa, H.-K. J. Kuo, L. Mangu, and H. Soltau. 2013. Morpheme-based feature-rich language models using deep neural networks for lvcsr of egyptian arabic. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8435--8439.Google ScholarGoogle Scholar
  30. S. Qiu, Q. Cui, J. Bian, B. Gao, and T.-Y. Liu. 2014. Co-learning of word representations and morpheme representations. In Proc. of COLING.Google ScholarGoogle Scholar
  31. R. Socher, D. Chen, C. D. Manning, and A. Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS. 926--934.Google ScholarGoogle Scholar
  32. R. Socher, C. C. Lin, A. Y. Ng, and C. D. Manning. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 129--136.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. H. Sperr, J. Niehues, and A. Waibel. 2013. Letter n-gram-based input encoding for continuous space language models. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality. 30--39.Google ScholarGoogle Scholar
  34. J. P. Turian, L.-A. Ratinov, and Y. Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In ACL. 384--394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. P. D. Turney. 2013. Distributional semantics beyond words: Supervised learning of analogy and paraphrase. TACL, 353--366.Google ScholarGoogle Scholar
  36. P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37, (Jan 2010), 141--188. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Weston, A. Bordes, O. Yakhnenko, and N. Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. arXiv preprint arXiv:1307.7973.Google ScholarGoogle Scholar
  38. M. Yu and M. Dredze. 2014. Improving lexical embeddings with semantic knowledge. In Association for Computational Linguistics (ACL). 545--550.Google ScholarGoogle Scholar

Published in

ACM Transactions on Information Systems, Volume 34, Issue 1
October 2015
172 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/2806674

            Copyright © 2015 ACM


            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 24 August 2015
            • Revised: 1 June 2015
            • Accepted: 1 June 2015
            • Received: 1 May 2014
