Abstract
Neural network techniques are widely applied to obtain high-quality distributed representations of words (i.e., word embeddings) for text mining, information retrieval, and natural language processing tasks. Many recent efforts have proposed efficient methods for learning word embeddings from context, such that the embeddings encode both semantic and syntactic relationships between words. However, it remains quite challenging to handle unseen or rare words, for which contextual information is insufficient. Inspired by studies of the word recognition process in cognitive psychology, in this article we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address these challenges. In particular, we introduce a novel neural network architecture called KNET that leverages both a word's contextual information and its morphological knowledge to learn word embeddings. Moreover, this learning architecture is able to benefit from noisy knowledge and to balance contextual information against morphological knowledge. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework can greatly enhance the effectiveness of word embeddings.
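The abstract does not spell out KNET's formulation, but the architecture it describes (a context-driven embedding signal combined with morphological knowledge, with a mechanism that balances the two and tolerates noisy knowledge) can be illustrated. Below is a minimal, hypothetical PyTorch sketch of such a model: a skip-gram-style objective with negative sampling, where each word's input vector is a learned gated mixture of its word embedding and the average of its morpheme embeddings. The gating scheme, the morpheme-averaging step, and the loss are assumptions made for illustration, not the paper's exact design.

```python
# A minimal, illustrative sketch of a KNET-style embedding model.
# Hypothetical: the abstract only says KNET combines contextual
# information with morphological knowledge; the per-word gate, the
# morpheme inventory, and the skip-gram negative-sampling objective
# below are assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn


class KnetStyleEmbedding(nn.Module):
    def __init__(self, vocab_size, num_morphemes, dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)      # per-word vectors
        self.morph_emb = nn.Embedding(num_morphemes, dim)  # root/affix vectors
        # Per-word gate balancing the word-level vs. morphological signal;
        # a learned gate lets frequent words rely on context while rare
        # words lean on (possibly noisy) morphological knowledge.
        self.gate = nn.Embedding(vocab_size, 1)
        self.out_emb = nn.Embedding(vocab_size, dim)       # output (context) vectors

    def input_vector(self, word_ids, morph_ids, morph_mask):
        w = self.word_emb(word_ids)                        # (B, D)
        # Average the embeddings of each word's morphemes, ignoring padding.
        m = self.morph_emb(morph_ids)                      # (B, M, D)
        m = (m * morph_mask.unsqueeze(-1)).sum(1)
        m = m / morph_mask.sum(1, keepdim=True).clamp(min=1.0)
        g = torch.sigmoid(self.gate(word_ids))             # (B, 1)
        return g * w + (1 - g) * m

    def forward(self, word_ids, morph_ids, morph_mask, ctx_ids, labels):
        # Skip-gram with negative sampling: score (center, context) pairs.
        v = self.input_vector(word_ids, morph_ids, morph_mask)
        c = self.out_emb(ctx_ids)
        logits = (v * c).sum(-1)
        return nn.functional.binary_cross_entropy_with_logits(logits, labels)


if __name__ == "__main__":
    model = KnetStyleEmbedding(vocab_size=1000, num_morphemes=200, dim=50)
    B, M = 4, 3
    loss = model(
        torch.randint(0, 1000, (B,)),       # center words
        torch.randint(0, 200, (B, M)),      # morpheme ids per word
        torch.ones(B, M),                   # morpheme mask (1 = real morpheme)
        torch.randint(0, 1000, (B,)),       # context / negative words
        torch.randint(0, 2, (B,)).float(),  # 1 = true context, 0 = negative
    )
    loss.backward()
    print(float(loss))
```

Under this sketch, a rare word with little observed context can still receive a sensible vector through its morphemes, while the learned gate lets frequent words rely more on context, mirroring the balance between the two information sources that the abstract describes.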