
Relation Reconstructive Binarization of word embeddings

  • Research Article
  • Published in Frontiers of Computer Science

Abstract

Word embeddings are one of the backbones of modern natural language processing (NLP). Recently, driven by the need to deploy NLP models on low-resource devices, there has been a surge of interest in compressing word embeddings into hash codes or binary vectors to reduce storage and memory consumption. Typically, existing work learns to encode an embedding into a compressed representation from which the original embedding can be reconstructed. Although these methods aim to preserve most of the information of each individual word, they often fail to retain the relations between words and can therefore incur large losses on certain tasks. To this end, this paper presents Relation Reconstructive Binarization (R2B), which transforms word embeddings into binary codes that preserve the relations between words. At its heart, R2B trains an auto-encoder to generate binary codes from which the word-by-word relations in the original embedding space can be reconstructed. Experiments showed that our method achieved significant improvements over previous methods on a number of tasks, along with a space saving of up to 98.4%. In particular, our method achieved even better results on word similarity evaluation than the uncompressed pre-trained embeddings, and was significantly better than previous compression methods that do not consider word relations.
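The objective sketched in the abstract can be illustrated with a small numerical example. The snippet below is a minimal sketch, not the paper's actual model: it assumes pairwise cosine similarity as the "relation" to preserve, a linear stand-in encoder with random weights, and a scaled dot product of binary codes as the decoder. A real R2B-style model would train the encoder (e.g., with a straight-through or Gumbel-style relaxation of the sign function) to minimize the relation-reconstruction loss computed at the end.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, embed_dim, code_bits = 100, 32, 64   # toy sizes

# Stand-in for pre-trained embeddings and (untrained) encoder weights.
E = rng.standard_normal((n_words, embed_dim))
W = rng.standard_normal((embed_dim, code_bits))

# Binarize: codes in {-1, +1}. In training this step would be relaxed
# so gradients can flow back into W.
B = np.where(E @ W >= 0, 1.0, -1.0)

# Relation in the original embedding space: pairwise cosine similarity.
E_norm = E / np.linalg.norm(E, axis=1, keepdims=True)
target_relation = E_norm @ E_norm.T

# Relation reconstructed from the binary codes: scaled dot product,
# which lies in [-1, 1] and plays the role of a similarity score.
reconstructed = (B @ B.T) / code_bits

# The relation-reconstruction loss an R2B-style encoder would minimize.
loss = np.mean((reconstructed - target_relation) ** 2)
print(f"relation reconstruction MSE: {loss:.4f}")
```

With random encoder weights the loss is merely finite and nonnegative; training would drive it down so that nearest-neighbor queries over the compact codes agree with those over the original embeddings.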



Acknowledgements

The research work was supported by the National Key Research and Development Program of China (2017YFB1002104) and the National Natural Science Foundation of China (Grant Nos. 92046003, 61976204, U1811461). Xiang Ao was also supported by the Project of Youth Innovation Promotion Association CAS and the Beijing Nova Program (Z201100006820062).

Author information

Corresponding author

Correspondence to Xiang Ao.

Additional information

Feiyang PAN is a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences, and a student at the University of Chinese Academy of Sciences (UCAS), China. He received his BS degree in Statistics and a dual degree in Computer Science from the University of Science and Technology of China (USTC). His research focuses on machine learning and its applications.

Shuokai LI is a PhD student at the Institute of Computing Technology, Chinese Academy of Sciences, and a student at the University of Chinese Academy of Sciences (UCAS), China. He received his BS degree in Mathematics from the University of Science and Technology of China (USTC). His research interests include text mining and machine learning.

Xiang AO is an Associate Professor at the Institute of Computing Technology, Chinese Academy of Sciences. He received his PhD degree in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences in 2015, and his BS degree in Computer Science from Zhejiang University, China in 2010. His research interests include text and behavioral data mining for financial and business applications.

Qing HE is a Professor and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences (CAS), and a Professor at the University of Chinese Academy of Sciences (UCAS), China. He received his BS degree from Hebei Normal University, China in 1985 and his MS degree from Zhengzhou University, China in 1987, both in mathematics, and his PhD degree in fuzzy mathematics and artificial intelligence from Beijing Normal University, China in 2000. His research interests include data mining, machine learning, classification, and fuzzy clustering.



About this article


Cite this article

Pan, F., Li, S., Ao, X. et al. Relation Reconstructive Binarization of word embeddings. Front. Comput. Sci. 16, 162307 (2022). https://doi.org/10.1007/s11704-021-0108-3
