ABSTRACT
Bayesian Neural Networks (BNNs), which provide uncertainty estimates alongside their predictions, have been increasingly adopted in safety-critical AI applications that demand reliable and robust decision making, e.g., self-driving, rescue robots, and medical image diagnosis. Training a probabilistic BNN model involves training an ensemble of sampled DNN models, which induces orders of magnitude more data movement than training a single DNN model. In this paper, we reveal that the root cause of BNN training inefficiency is the massive off-chip data transfer of Gaussian Random Variables (GRVs). To tackle this challenge, we propose a novel design that eliminates all off-chip data transfer of GRVs through the reversed shifting of Linear Feedback Shift Registers (LFSRs), without incurring any training accuracy loss. To efficiently support our LFSR reversion strategy at the hardware level, we explore the design space of current DNN accelerators and identify the computation mapping scheme that best accommodates our strategy. Leveraging this finding, we design and prototype Shift-BNN, the first highly efficient BNN training accelerator, which is both low-cost and scalable. Extensive evaluation on five representative BNN models demonstrates that Shift-BNN achieves an average 4.9× (up to 10.8×) improvement in energy efficiency and a 1.6× (up to 2.8×) speedup over the baseline DNN training accelerator.
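The key idea — that an LFSR's state sequence can be regenerated in reverse during the backward pass instead of being buffered off-chip — can be illustrated with a minimal software sketch. The 8-bit register width, tap positions, and function names below are illustrative assumptions, not the paper's hardware design (which additionally maps uniform LFSR outputs to Gaussian samples):

```python
# Minimal sketch of LFSR "reversed shifting", assuming a hypothetical
# 8-bit Fibonacci LFSR with taps at bits 7, 5, 4, 3 (x^8+x^6+x^5+x^4+1).

def lfsr_forward(state: int) -> int:
    """Advance one step: shift left, feed back the XOR of the tap bits."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | fb) & 0xFF

def lfsr_reverse(state: int) -> int:
    """Undo one step: the feedback bit now sits in the LSB, so the
    MSB that was shifted out of the previous state can be recomputed."""
    fb = state & 1
    low = state >> 1                      # previous state's bits 6..0
    msb = fb ^ (((low >> 5) ^ (low >> 4) ^ (low >> 3)) & 1)
    return (msb << 7) | low

# Forward pass: generate a state sequence (the seeds for GRV sampling)
# without storing the intermediate states off-chip.
seed = 0xA5
states, s = [], seed
for _ in range(8):
    s = lfsr_forward(s)
    states.append(s)

# Backward pass: regenerate the same states in reverse order from the
# final state alone.
t = states[-1]
regenerated = [t]
for _ in range(7):
    t = lfsr_reverse(t)
    regenerated.append(t)
assert regenerated == states[::-1]
```

Because each reverse step recovers the discarded bit from the feedback bit, only the final LFSR state needs to be kept between the forward and backward passes; every intermediate GRV can be recomputed on the fly rather than transferred to and from off-chip memory.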