Abstract
We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of this noise injection, we find that activations can also be quantized to as low as 8 bits with only minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and show that this metric is linearly related to logic utilization and power. We suggest evaluating the trade-off between accuracy and complexity (BOPs). When evaluated on ResNet18/34/50 and MobileNet on ImageNet, the proposed method outperforms the prior state of the art in both the low-complexity and the high-accuracy regimes. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
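The two central ideas in the abstract, k-quantile (equal-mass) quantization of the weights and emulating that quantizer during training by injecting uniform noise, can be illustrated with a minimal sketch. The following is not the authors' implementation: the helper names (`kquantile_levels`, `quantize`, `inject_uniform_noise`) are ours, and replacing each weight by a uniform draw within its quantization bin is one plausible reading of "injecting noise into the weights at training time".

```python
# Minimal NumPy sketch of k-quantile quantization with training-time uniform
# noise injection. Illustrative only; helper names and the exact noise model
# are assumptions, not the UNIQ reference implementation.
import numpy as np

def kquantile_levels(w, num_bits):
    """Split the weights into 2**num_bits equally populated bins (k-quantile
    quantization) and return the bin edges plus one representative level
    (here, the bin mean) per bin."""
    k = 2 ** num_bits
    edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))
    levels = np.array([
        w[(w >= lo) & (w <= hi)].mean()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return edges, levels

def quantize(w, edges, levels):
    """Hard quantization used at inference time: map each weight to the
    representative level of its bin."""
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, len(levels) - 1)
    return levels[idx]

def inject_uniform_noise(w, edges, rng):
    """Training-time surrogate: instead of quantizing, perturb each weight with
    uniform noise spanning its bin, so the network is exposed to the same error
    statistics the quantizer would introduce."""
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, len(edges) - 2)
    lo, hi = edges[idx], edges[idx + 1]
    return rng.uniform(lo, hi)

# Example: 4-bit quantization of a Gaussian weight tensor.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
edges, levels = kquantile_levels(w, num_bits=4)
w_train = inject_uniform_noise(w, edges, rng)   # used in the forward pass during training
w_infer = quantize(w, edges, levels)            # used at inference
```

Because the bins are quantiles of the weight distribution, every quantization level is used equally often, which is what distinguishes this non-uniform scheme from uniform (fixed step size) quantizers.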