Research Article | Public Access
DOI: 10.1145/3297858.3304076

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers

Published: 04 April 2019

ABSTRACT

Model compression is an important technique for enabling efficient embedded and hardware implementations of deep neural networks (DNNs), and a number of prior works are dedicated to it. The goal is to simultaneously reduce model storage size and accelerate computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former exploits the redundancy in the number of weights, whereas the latter exploits the redundancy in the bit representation of weights. These two sources of redundancy can be combined, leading to a higher degree of DNN model compression. However, a systematic framework for joint weight pruning and quantization of DNNs is lacking, which limits the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for beyond model size reduction alone; in particular, the hardware performance overhead resulting from weight pruning must be taken into consideration. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework for DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for solving non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression performance than the state of the art. The second part is a set of hardware-aware DNN optimizations to facilitate hardware-level implementations. We perform ADMM-based weight pruning and quantization considering (i) the computation reduction and energy efficiency improvement, and (ii) the hardware performance overhead due to irregular sparsity. The first consideration prioritizes compression of convolutional layers over fully-connected layers, while the second motivates the concept of the break-even pruning ratio, defined as the minimum pruning ratio of a specific layer that results in no hardware performance degradation. Without accuracy loss, ADMM-NN achieves 85× and 24× pruning on the LeNet-5 and AlexNet models, respectively, significantly higher than the state of the art. The improvements become more significant when focusing on computation reduction. Combining weight pruning and quantization, and focusing on data storage, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50. We release code and models at https://github.com/yeshaokai/admm-nn.
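To make the ADMM formulation concrete, below is a minimal NumPy sketch of the weight-pruning iteration described above. It is an illustration under assumptions, not the authors' released implementation (see the repository linked in the abstract for that); the helper names `project_to_sparsity` and `loss_grad`, and all hyperparameter values, are hypothetical.

```python
import numpy as np

def project_to_sparsity(w, keep_ratio):
    """Euclidean projection onto the constraint set of weight tensors with at
    most keep_ratio * w.size nonzero entries: keep the largest magnitudes."""
    k = max(1, int(keep_ratio * w.size))
    threshold = np.sort(np.abs(w), axis=None)[-k]
    return w * (np.abs(w) >= threshold)

def admm_prune_layer(w, loss_grad, keep_ratio, rho=1e-3, lr=1e-2,
                     outer_iters=10, inner_steps=100):
    """Sketch of ADMM-based pruning for one layer's weight array `w`.
    `loss_grad(w)` is assumed to return dL/dW for the network loss."""
    z = project_to_sparsity(w, keep_ratio)  # auxiliary variable in the constraint set
    u = np.zeros_like(w)                    # scaled dual variable
    for _ in range(outer_iters):
        # W-step: minimize loss + (rho/2) * ||W - Z + U||^2 by gradient descent.
        # The quadratic term acts as the "dynamic regularizer": its target
        # (Z - U) is updated in every ADMM iteration.
        for _ in range(inner_steps):
            w = w - lr * (loss_grad(w) + rho * (w - z + u))
        # Z-step: Euclidean projection of (W + U) onto the sparsity constraint.
        z = project_to_sparsity(w + u, keep_ratio)
        # Dual-variable update.
        u = u + w - z
    # Hard-prune to the final sparse solution (followed by retraining in practice).
    return project_to_sparsity(w, keep_ratio)
```

Weight quantization fits the same template, with the projection step replaced by rounding each weight to its nearest allowed quantization level; the hardware-aware part of the framework then decides, per layer, whether a given pruning ratio clears the break-even threshold beyond which irregular sparsity no longer degrades hardware performance.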


Published in

ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems
April 2019, 1126 pages
ISBN: 9781450362405
DOI: 10.1145/3297858
Copyright © 2019 ACM


          Publisher

          Association for Computing Machinery

          New York, NY, United States


Acceptance Rates

ASPLOS '19 Paper Acceptance Rate: 74 of 351 submissions, 21%
Overall Acceptance Rate: 535 of 2,713 submissions, 20%
