ABSTRACT
Bayesian Neural Networks (BNNs), which provide uncertainty estimates alongside their predictions, have been increasingly adopted in safety-critical AI applications that demand reliable and robust decision making, e.g., self-driving, rescue robots, and medical image diagnosis. Training a probabilistic BNN model involves training an ensemble of sampled DNN models, which induces orders of magnitude more data movement than training a single DNN model. In this paper, we reveal that the root cause of BNN training inefficiency is the massive off-chip data transfer of Gaussian Random Variables (GRVs). To tackle this challenge, we propose a novel design that eliminates all off-chip data transfer of GRVs through the reversed shifting of Linear Feedback Shift Registers (LFSRs), without incurring any training accuracy loss. To efficiently support our LFSR reversion strategy at the hardware level, we explore the design space of current DNN accelerators and identify the computation mapping scheme that best accommodates our strategy. Leveraging this finding, we design and prototype Shift-BNN, the first highly efficient BNN training accelerator, which is both low-cost and scalable. Extensive evaluation on five representative BNN models demonstrates that Shift-BNN achieves an average 4.9× (up to 10.8×) improvement in energy efficiency and a 1.6× (up to 2.8×) speedup over the baseline DNN training accelerator.
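The key idea — that an LFSR's state sequence can be regenerated in reverse during the backward pass instead of being buffered off-chip — can be illustrated with a minimal software sketch. The 8-bit register width, tap positions, and function names below are illustrative assumptions, not the paper's hardware design (which additionally maps uniform LFSR outputs to Gaussian samples):

```python
# Minimal sketch of LFSR "reversed shifting", assuming a hypothetical
# 8-bit Fibonacci LFSR with taps at bits 7, 5, 4, 3 (x^8+x^6+x^5+x^4+1).

def lfsr_forward(state: int) -> int:
    """Advance one step: shift left, feed back the XOR of the tap bits."""
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | fb) & 0xFF

def lfsr_reverse(state: int) -> int:
    """Undo one step: the feedback bit now sits in the LSB, so the
    MSB that was shifted out of the previous state can be recomputed."""
    fb = state & 1
    low = state >> 1                      # previous state's bits 6..0
    msb = fb ^ (((low >> 5) ^ (low >> 4) ^ (low >> 3)) & 1)
    return (msb << 7) | low

# Forward pass: generate a state sequence (the seeds for GRV sampling)
# without storing the intermediate states off-chip.
seed = 0xA5
states, s = [], seed
for _ in range(8):
    s = lfsr_forward(s)
    states.append(s)

# Backward pass: regenerate the same states in reverse order from the
# final state alone.
t = states[-1]
regenerated = [t]
for _ in range(7):
    t = lfsr_reverse(t)
    regenerated.append(t)
assert regenerated == states[::-1]
```

Because each reverse step recovers the discarded bit from the feedback bit, only the final LFSR state needs to be kept between the forward and backward passes; every intermediate GRV can be recomputed on the fly rather than transferred to and from off-chip memory.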