ABSTRACT
Model compression is an important technique to facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), and a number of prior works are dedicated to it. The goal is to simultaneously reduce the model storage size and accelerate the computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. These two sources of redundancy can be combined, leading to a higher degree of DNN model compression. However, a systematic framework of joint weight pruning and quantization of DNNs is lacking, which limits the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and the hardware performance overhead resulting from the irregular sparsity of weight pruning need to be taken into account in addition to model size reduction alone. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework for DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for solving non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression performance than the state of the art. The second part is hardware-aware DNN optimizations to facilitate hardware-level implementations. We perform ADMM-based weight pruning and quantization considering (i) the computation reduction and energy efficiency improvement, and (ii) the hardware performance overhead due to irregular sparsity. The first consideration prioritizes compression of convolutional layers over fully-connected layers, while the second motivates the concept of a break-even pruning ratio, defined as the minimum pruning ratio of a specific layer that results in no hardware performance degradation. Without accuracy loss, ADMM-NN achieves 85× and 24× pruning on the LeNet-5 and AlexNet models, respectively, significantly higher than the state of the art. The improvements become more significant when focusing on computation reduction. Combining weight pruning and quantization, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks when focusing on data storage. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50. We release our codes and models at https://github.com/yeshaokai/admm-nn.
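To make the "regularization target dynamically updated in each ADMM iteration" idea concrete, below is a minimal single-layer NumPy sketch of ADMM-based weight pruning. It is an illustration of the general technique, not the paper's implementation: the helper names (`project_sparse`, `admm_prune`), the `loss_grad` callback, and all hyperparameter values are assumptions chosen for readability.

```python
import numpy as np

def project_sparse(W, num_keep):
    # Euclidean projection onto the constraint set {Z : ||Z||_0 <= num_keep}:
    # keep the num_keep largest-magnitude weights, zero out the rest.
    flat = W.ravel()
    keep = np.argpartition(np.abs(flat), -num_keep)[-num_keep:]
    Z = np.zeros_like(flat)
    Z[keep] = flat[keep]
    return Z.reshape(W.shape)

def admm_prune(W, loss_grad, num_keep, rho=1e-3, lr=1e-2,
               admm_iters=10, sgd_steps=100):
    # Z is the auxiliary variable constrained to be sparse; U is the
    # scaled dual variable. loss_grad(W) must return dLoss/dW.
    Z = project_sparse(W, num_keep)
    U = np.zeros_like(W)
    for _ in range(admm_iters):
        # W-update: minimize loss(W) + (rho/2) * ||W - Z + U||^2 by SGD.
        # The quadratic term is the "smart regularizer": its target (Z - U)
        # is recomputed in every ADMM iteration rather than fixed in advance.
        for _ in range(sgd_steps):
            W = W - lr * (loss_grad(W) + rho * (W - Z + U))
        # Z-update: projection of W + U onto the sparsity constraint set.
        Z = project_sparse(W + U, num_keep)
        # Dual-variable update.
        U = U + W - Z
    # Final hard pruning so the returned weights exactly satisfy the constraint.
    return project_sparse(W, num_keep)
```

The same alternating structure extends to quantization by replacing `project_sparse` with a projection onto a discrete set of quantization levels, which is how a joint pruning-and-quantization framework can handle both constraints within one ADMM loop.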