UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks

Published: 26 March 2021

Abstract

We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of this noise injection, we find that activations can also be quantized to as few as 8 bits with only minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and show that this metric is linearly related to logic utilization and power. We suggest evaluating the trade-off of accuracy vs. complexity (BOPs). When evaluated on ResNet18/34/50 and MobileNet on ImageNet, the proposed method outperforms the prior state of the art in both the low-complexity and the high-accuracy regimes. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
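To make the method sketched above concrete, the following is a minimal illustration (ours, not the authors' released code) of a k-quantile quantizer with uniform noise injection: bin edges are placed at empirical quantiles of the weights so that every bin holds an equal share of them, and at training time each weight is perturbed with uniform noise whose support matches its bin width, emulating the quantization error the network will see at inference. All names here (kquantile_quantize and its parameters) are our own.

```python
import numpy as np

def kquantile_quantize(w, bits=4, train=True, rng=None):
    """Emulate a k-quantile (equal-mass) quantizer with noise injection.

    A sketch of the idea in the abstract, not the paper's exact procedure:
    bin edges are empirical quantiles of the weights, each bin is
    represented by its midpoint, and during training quantization is
    emulated by adding uniform noise spanning each weight's bin.
    """
    rng = rng or np.random.default_rng()
    k = 2 ** bits  # number of quantization levels
    # Equal-mass bin edges: the 0/k, 1/k, ..., k/k quantiles of the weights.
    edges = np.quantile(w, np.linspace(0.0, 1.0, k + 1))
    reps = 0.5 * (edges[:-1] + edges[1:])  # one representative per bin
    idx = np.clip(np.searchsorted(edges, w, side="right") - 1, 0, k - 1)
    if train:
        # Training pass: keep full-precision weights but inject uniform
        # noise scaled to each weight's bin width, so the network adapts
        # to the error it will incur once the weights are hard-quantized.
        widths = (edges[1:] - edges[:-1])[idx]
        return w + rng.uniform(-0.5, 0.5, size=w.shape) * widths
    # Inference: hard-assign every weight to its bin representative.
    return reps[idx]

# Example: 4-bit (16-level) quantization of a random weight matrix.
w = np.random.randn(256, 128)
w_noisy = kquantile_quantize(w, bits=4, train=True)   # for a training step
w_hard = kquantile_quantize(w, bits=4, train=False)   # 16 distinct values
```

The BOPs metric can be sketched in the same spirit. The rough convention below charges each multiplication of a b_w-bit weight by a b_a-bit activation about b_w * b_a bit operations and ignores the accumulator-width term that a complete accounting, as in the full paper, would add; the function name and signature are ours.

```python
def conv_bops(c_in, c_out, k, h_out, w_out, b_w, b_a):
    """Approximate bit operations for one convolutional layer: every
    output element needs c_in * k * k multiply-accumulates, and each
    b_w-by-b_a-bit multiply costs roughly b_w * b_a bit operations.
    Accumulator additions are omitted here for brevity."""
    macs = c_out * h_out * w_out * c_in * k * k
    return macs * b_w * b_a
```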



• Published in

  ACM Transactions on Computer Systems, Volume 37, Issue 1-4 (November 2019), 177 pages
  ISSN: 0734-2071
  EISSN: 1557-7333
  DOI: 10.1145/3446674

              Copyright © 2021 ACM


              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 26 March 2021
              • Accepted: 1 December 2020
              • Received: 1 June 2020
Published in TOCS Volume 37, Issue 1-4


              Qualifiers

              • research-article
              • Research
              • Refereed
