ABSTRACT
We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing multiplications down to the bit level, the work performed by multiplications during inference can potentially be reduced by at least 40× across a wide selection of neural networks (8b and 16b). This method produces numerically identical results and does not affect overall accuracy. We present Laconic, a hardware accelerator that implements this approach to boost energy efficiency for inference with Deep Learning networks. Laconic judiciously gives up some of the work-reduction potential to yield a low-cost, simple, and energy-efficient design that outperforms other state-of-the-art accelerators: an optimized DaDianNao-like design [13], Eyeriss [15], SCNN [71], Pragmatic [3], and BitFusion [83]. We study 16b, 8b, and 1b/2b fixed-point quantized models.
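The core idea above can be illustrated in software. The following is a minimal sketch (not the Laconic hardware itself, and the function names are our own): a fixed-point multiplication decomposed into single-bit partial products, so the number of shift-add steps is proportional to the count of set ("effectual") bits in each operand rather than the full bit width, while the result remains numerically identical to `a * b`.

```python
def set_bits(x: int):
    """Yield the bit positions of the set bits of a non-negative integer."""
    pos = 0
    while x:
        if x & 1:
            yield pos
        x >>= 1
        pos += 1

def bit_serial_mul(a: int, b: int) -> int:
    """Multiply via shift-and-add over effectual (set) bits only.

    Work done: popcount(a) * popcount(b) single-bit partial products;
    zero bits in either operand contribute no work at all.
    """
    acc = 0
    for i in set_bits(a):
        for j in set_bits(b):
            acc += 1 << (i + j)  # one single-bit partial product
    return acc

# 12 = 0b1100 and 10 = 0b1010 each have only two set bits, so this
# multiplication takes 4 partial products instead of 8x8 = 64 bit pairs.
assert bit_serial_mul(12, 10) == 12 * 10
```

Since quantized activations and weights tend to have few set bits, most of the bit pairs in a conventional array multiplier are ineffectual, which is the work-reduction opportunity the paper targets.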
- Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc J. Van Gool. 2017. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. In Advances in Neural Information Processing Systems 30: Annual Conf. on Neural Information Processing Systems 2017, 4--9 Dec. 2017, Long Beach, CA, USA. 1141--1151.
- Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, and Hadi Esmaeilzadeh. 2018. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks. In Intl' Symp. on Computer Architecture.
- Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic Deep Neural Network Computing. In Intl' Symp. on Microarchitecture.
- Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. CNVLUTIN: Ineffectual-Neuron-Free Deep Neural Network Computing. In Intl' Symp. on Computer Architecture.
- Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder. 2016. Fused-layer CNN accelerators. In Intl' Symp. on Microarchitecture.
- Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence (2017).
- Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2015. Conditional Computation in Neural Networks for faster models. CoRR abs/1511.06297.
- Gabriel J. Brostow, Julien Fauqueur, and Roberto Cipolla. 2008. Semantic Object Classes in Video: A High-Definition Ground Truth Database. Pattern Recognition Letters (2008).
- Cadence. 2019. Encounter RTL Compiler. (2019). https://www.cadence.com
- Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In Intl' Conf. on Learning Representations. https://arxiv.org/pdf/1812.00332.pdf
- Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Viji Srinivasan, and Swagath Venkataramani. 2018. Exploiting approximate computing for deep learning acceleration. In Design, Automation & Test in Europe Conf.
- Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Conf. on Architectural support for programming languages and operating systems.
- Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and O. Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Intl' Symp. on Microarchitecture.
- Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A Spatial Architecture for Energy-efficient Dataflow for Convolutional Neural Networks. In Intl' Symp. on Computer Architecture.
- Yu-Hsin Chen, Tushar Krishna, Joel Emer, and Vivienne Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (Jan 2017).
- Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. 2016. Towards the Limit of Network Quantization. CoRR abs/1612.01543 (2016).
- Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2014. Low precision arithmetic for deep learning. CoRR abs/1412.7024 (2014).
- M. Courbariaux, Y. Bengio, and J.-P. David. 2015. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. CoRR abs/1511.00363.
- Bin Dai, Chen Zhu, and David P. Wipf. 2018. Compressing Neural Networks using the Variational Information Bottleneck. CoRR abs/1802.10399 (2018).
- Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. CoRR abs/1901.02860 (2019). http://arxiv.org/abs/1901.02860
- Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language Modeling with Gated Convolutional Networks. In Intl' Conf. on Machine Learning.
- Alberto Delmas, Patrick Judd, Sayeh Sharify, and Andreas Moshovos. 2017. Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks. CoRR abs/1706.00504 (2017).
- Alberto Delmas, Sayeh Sharify, Patrick Judd, Milos Nikolic, and Andreas Moshovos. 2018. DPRed: Making Typical Activation Values Matter In Deep Learning Computing. CoRR abs/1804.06732 (2018).
- Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, and Andreas Moshovos. 2019. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
- Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, and Guoqi Li. 2018. GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework. Neural Networks 100 (2018), 49--58.
- Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, and Bo Yuan. 2017. CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices. In Intl' Symp. on Microarchitecture.
- Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark Silicon and the End of Multicore Scaling. In Intl' Symp. on Computer Architecture.
- M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. Intl' Journal of Computer Vision 111, 1 (Jan. 2015), 98--136.
- Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical Neural Story Generation. CoRR abs/1805.04833 (2018).
- Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
- Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. Tetris: Scalable and efficient neural network acceleration with 3d memory. ACM SIGOPS Operating Systems Review 51, 2 (2017), 751--764.
- Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. 2016. Deep Joint Demosaicking and Denoising. ACM Trans. on Graphics (2016).
- Yiwen Guo, Anbang Yao, and Yurong Chen. 2016. Dynamic Network Surgery for Efficient DNNs. CoRR abs/1608.04493 (2016).
- Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. In Intl' Conf. on Machine Learning.
- Song Han and William J. Dally. 2018. Bandwidth-efficient Deep Learning. In Design Automation Conf.
- Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William (Bill) J. Dally. 2017. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. In Intl' Symp. on Field-Programmable Gate Arrays.
- Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Intl' Symp. on Computer Architecture.
- Song Han, Huizi Mao, and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. CoRR abs/1510.00149 (2015).
- Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In Intl' Conf. on Neural Information Processing Systems.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015).
- Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In European Conf. on Computer Vision.
- Mark Horowitz. 2014. Computing's energy problem (and what we can do about it). IEEE Intl' Solid-State Circuits Conf. 57 (02 2014), 10--14.
- Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2017. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. Journal of Machine Learning Research 18 (2017), 187:1--187:30.
- Yani Ioannou, Duncan P. Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, and Antonio Criminisi. 2016. Decision Forests, Convolutional Networks and the Models in-Between. CoRR abs/1603.01250 (2016).
- Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. 2017. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. CoRR abs/1712.05877 (2017).
- Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks. In Workshop On Approximate Computing (WAPCO).
- Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, and Andreas Moshovos. 2015. Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets. CoRR abs/1511.05236v4 (2015).
- Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial Deep Neural Network Computing. In Intl' Symp. on Microarchitecture.
- Supriya Kapur, Asit K. Mishra, and Debbie Marr. 2017. Low Precision RNNs: Quantizing RNNs Without Losing Accuracy. CoRR abs/1710.07706 (2017).
- Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2017. A novel zero weight/activation-aware hardware architecture of convolutional neural network. In Design Automation and Test Europe.
- Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2018. ZeNA: Zero-Aware Neural Network Accelerator. IEEE Design Test 35 (2018), 39--46.
- Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. 2016. Neurocube: A Programmable Digital Neuromorphic Architecture with High-density 3D Memory. In Intl' Symp. on Computer Architecture.
- Minje Kim and Paris Smaragdis. 2016. Bitwise Neural Networks. CoRR abs/1601.06071 (2016).
- H.T. Kung, Bradley McDanel, and Sai Qian Zhang. 2019. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
- Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
- Vadim Lebedev and Victor S. Lempitsky. 2016. Fast ConvNets Using Group-Wise Brain Damage. In Computer Vision and Pattern Recognition.
- Dongwoo Lee, Sungbum Kang, and Kiyoung Choi. 2018. ComPEND: Computation Pruning through Early Negative Detection for ReLU in a deep neural network accelerator. In Intl' Conf. on Supercomputing.
- Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, and Tor M. Aamodt. 2018. Analyzing Machine Learning Workloads Using a Detailed GPU Simulator. CoRR abs/1811.08933 (2018). http://arxiv.org/abs/1811.08933
- Dingyi Li and Zengfu Wang. 2017. Video Superresolution via Motion Compensation and Deep Residual Learning. IEEE Trans. on Computational Imaging.
- Fengfu Li and Bin Liu. 2016. Ternary Weight Networks. CoRR abs/1605.04711 (2016).
- Darryl D. Lin, Sachin S. Talathi, and V. Sreekanth Annapureddy. 2016. Fixed Point Quantization of Deep Convolutional Networks. In Intl' Conf. on Machine Learning.
- Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. 2017. Runtime Neural Pruning. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc.
- Zhenhong Liu, Amir Yazdanbakhsh, Taejoon Park, Hadi Esmaeilzadeh, and Nam Kim. 2018. SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning. IEEE Micro 38 (2018), 50--59.
- Christos Louizos, Karen Ullrich, and Max Welling. 2017. Bayesian Compression for Deep Learning. In Conf. on Neural Information Processing Systems.
- Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In Intl' Symp. on High Performance Computer Architecture.
- David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Intl' Conf. on Computer Vision, Vol. 2. 416--423.
- Mayler Martins, Jody Maick Matos, Renato P Ribas, André Reis, Guilherme Schlinker, Lucio Rech, and Jens Michelsen. 2015. Open cell library in 15nm FreePDK technology. In Proceedings of the 2015 Symposium on International Symposium on Physical Design. ACM, 171--178.
- Szymon Migacz. 2017. 8-bit Inference with TensorRT. GPU Technology Conf.
- Asit K. Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. 2017. WRPN: Wide Reduced-Precision Networks. CoRR abs/1709.01134 (2017).
- Naveen Muralimanohar and Rajeev Balasubramonian. 2015. CACTI 6.0: A Tool to Understand Large Caches. (2015).
- Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Intl' Symp. on Computer Architecture (ISCA '17).
- Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo. 2017. Weighted-Entropy-Based Quantization for Deep Neural Networks. In Conf. on Computer Vision and Pattern Recognition.
- Eunhyeok Park, Dongyoung Kim, and Sungjoo Yoo. 2018. Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation. In Intl' Symp. on Computer Architecture.
- Eunhyeok Park, Sungjoo Yoo, and Peter Vajda. 2018. Value-Aware Quantization for Training and Inference of Neural Networks. In European Conf. on Computer Vision.
- Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2017. Faster CNNs with Direct Sparse Convolutions and Guided Pruning. In Intl' Conf. on Learning Representations.
- Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Intl' Symp. on Field-Programmable Gate Arrays.
- Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting Image Annotations Using Amazon's Mechanical Turk. In NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
- Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. CoRR abs/1603.05279 (2016).
- Joseph Redmon and Ali Farhadi. 2016. YOLO9000: Better, Faster, Stronger. CoRR abs/1612.08242 (2016).
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2014. ImageNet Large Scale Visual Recognition Challenge. CoRR abs/1409.0575 (Sept. 2014).
- Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. CoRR abs/1801.04381.
- Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, and Andreas Moshovos. 2018. Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. In Proceedings of the 55th Annual Design Automation Conference. ACM, 20.
- Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh. 2018. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network. In ISCA. IEEE Computer Society, 764--775.
- Sungho Shin, Kyuyeon Hwang, and Wonyong Sung. 2015. Fixed Point Performance Analysis of Recurrent Neural Networks. CoRR abs/1512.01322 (2015).
- Kevin Siu, Dylan Malone Stuart, Mostafa Mahmoud, and Andreas Moshovos. 2018. Memory Requirements for Convolutional Neural Network Hardware Accelerators. In IEEE Intl' Symp. on Workload Characterization.
- Synopsys. 2019. Design Compiler. http://www.synopsys.com/Tools/Implementation/RTLSynthesis/DesignCompiler/Pages. (2019).
- Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In IEEE Conf. on computer vision and pattern recognition.
- Karen Ullrich, Edward Meeds, and Max Welling. 2017. Soft Weight-Sharing for Neural Network Compression. CoRR abs/1702.04008 (2017).
- Cheng Wang, Haojin Yang, Christian Bartz, and Christoph Meinel. 2016. Image captioning with deep bidirectional LSTMs. In ACM Multimedia Conf.
- Pete Warden. 2016. Low-precision matrix multiplication. https://petewarden.com.
- Pete Warden. 2017. How to Quantize Neural Networks with TensorFlow. https://www.tensorflow.org/performance/quantization. (2017).
- Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs. In Annual Design Automation Conf.
- Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning Structured Sparsity in Deep Neural Networks. In Intl' Conf. on Neural Information Processing Systems.
- Neil HE Weste, David Harris, and Ayan Banerjee. 2010. CMOS VLSI design. Pearson India.
- Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, Steven Bell, Jeff Setter, Kaidi Cao, Heonjae Ha, Christos Kozyrakis, and Mark Horowitz. 2018. DNN Dataflow Choice Is Overrated. CoRR abs/1809.04070 (2018).
- Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram, and Mark Horowitz. 2016. A Systematic Approach to Blocking Convolutional Neural Networks. CoRR abs/1606.04209 (2016).
- Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. 2017. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. In IEEE Conf. on Computer Vision and Pattern Recognition.
- Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. Intl' Symp. on Computer Architecture (2017).
- Yu-Hsin Chen, Tushar Krishna, Joel Emer, and Vivienne Sze. 2016. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. In IEEE Intl' Solid-State Circuits Conf. 262--263.
- Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Intl' Symp. on Field-Programmable Gate Arrays.
- Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In Intl' Symp. on Microarchitecture.
- Chuan Zhang Tang and Hon Keung Kwan. 1993. Multilayer Feedforward Neural Networks with Single Powers-of-Two Weights. IEEE Trans. on Signal Processing 41 (09 1993), 2724--2727.
- Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. 2017. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. CoRR abs/1702.03044 (2017).
- Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. 2016. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. CoRR abs/1606.06160 (2016).
- Xuda Zhou, Zidong Du, Qi Guo, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, and Yunji Chen. 2018. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through a Cooperative Software/Hardware Approach. In Intl' Symp. on Microarchitecture.
- Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2016. Trained Ternary Quantization. CoRR abs/1612.01064 (2016).
- Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2017. Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). http://arxiv.org/abs/1707.07012