
Laconic deep learning inference acceleration

Published: 22 June 2019

ABSTRACT

We present a method for transparently identifying ineffectual computations during inference with Deep Learning models. Specifically, by decomposing multiplications down to the bit level, the work that multiplications perform during inference can potentially be reduced by at least 40× across a wide selection of neural networks (8b and 16b). This method produces numerically identical results and does not affect overall accuracy. We present Laconic, a hardware accelerator that implements this approach to boost energy efficiency for inference with Deep Learning networks. Laconic judiciously gives up some of the work-reduction potential to yield a low-cost, simple, and energy-efficient design that outperforms other state-of-the-art accelerators: an optimized DaDianNao-like design [13], Eyeriss [15], SCNN [71], Pragmatic [3], and BitFusion [83]. We study 16b, 8b, and 1b/2b fixed-point quantized models.
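
The sketch below illustrates, in plain Python, the arithmetic identity behind this bit-level view: a fixed-point product a × w expands into one single-bit (power-of-two) partial product per pair of set bits, so only popcount(a) × popcount(w) of the P × P bit pairs that a conventional P-bit bit-parallel multiplier always processes are effectual. This is only an illustrative work-counting model under our own assumptions, not the Laconic hardware or the paper's exact methodology; function names such as `effectual_bit_products` and `work_reduction` are ours.

```python
# Illustrative work-counting sketch (not the Laconic design itself): compare the
# fixed number of bit pairs a bit-parallel multiplier processes with the number
# of non-zero single-bit products actually needed for each multiplication.
import random


def effectual_bit_products(a: int, w: int) -> int:
    """Non-zero power-of-two partial products in a * w: popcount(a) * popcount(w)."""
    return bin(a).count("1") * bin(w).count("1")


def work_reduction(pairs, precision: int = 16) -> float:
    """Ratio of fixed bit-parallel work (precision^2 per product) to effectual work."""
    baseline = precision * precision * len(pairs)
    effectual = sum(effectual_bit_products(a, w) for a, w in pairs)
    return baseline / max(effectual, 1)


if __name__ == "__main__":
    # Uniform random 16b magnitudes, for illustration only; real activations and
    # weights contain far fewer effectual bits, which is where a 40x+ potential
    # reduction such as the one reported in the abstract can come from.
    pairs = [(random.randrange(1 << 16), random.randrange(1 << 16))
             for _ in range(10_000)]
    print(f"potential work reduction: {work_reduction(pairs):.1f}x")
```

Replacing the uniform random values with real 8b or 16b activation and weight traces is what would expose the headroom the paper quantifies.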

References

1. Eirikur Agustsson, Fabian Mentzer, Michael Tschannen, Lukas Cavigelli, Radu Timofte, Luca Benini, and Luc J. Van Gool. 2017. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. In Advances in Neural Information Processing Systems 30: Annual Conf. on Neural Information Processing Systems 2017, 4--9 Dec. 2017, Long Beach, CA, USA. 1141--1151.
2. Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, and Hadi Esmaeilzadeh. 2018. SnaPEA: Predictive Early Activation for Reducing Computation in Deep Convolutional Neural Networks. In Intl' Symp. on Computer Architecture.
3. Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O'Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic Deep Neural Network Computing. In Intl' Symp. on Microarchitecture.
4. Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. CNVLUTIN: Ineffectual-Neuron-Free Deep Neural Network Computing. In Intl' Symp. on Computer Architecture.
5. Manoj Alwani, Han Chen, Michael Ferdman, and Peter Milder. 2016. Fused-layer CNN accelerators. In Intl' Symp. on Microarchitecture.
6. Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence (2017).
7. Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2015. Conditional Computation in Neural Networks for faster models. CoRR abs/1511.06297 (2015).
8. Gabriel J. Brostow, Julien Fauqueur, and Roberto Cipolla. 2008. Semantic Object Classes in Video: A High-Definition Ground Truth Database. Pattern Recognition Letters (2008).
9. Cadence. 2019. Encounter RTL Compiler. (2019). https://www.cadence.com
10. Han Cai, Ligeng Zhu, and Song Han. 2019. ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. In Intl' Conf. on Learning Representations. https://arxiv.org/pdf/1812.00332.pdf
11. Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Viji Srinivasan, and Swagath Venkataramani. 2018. Exploiting approximate computing for deep learning acceleration. In Design, Automation & Test in Europe Conf.
12. Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. In Conf. on Architectural Support for Programming Languages and Operating Systems.
13. Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and O. Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Intl' Symp. on Microarchitecture.
14. Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A Spatial Architecture for Energy-efficient Dataflow for Convolutional Neural Networks. In Intl' Symp. on Computer Architecture.
15. Yu-Hsin Chen, Tushar Krishna, Joel Emer, and Vivienne Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (Jan 2017).
16. Yoojin Choi, Mostafa El-Khamy, and Jungwon Lee. 2016. Towards the Limit of Network Quantization. CoRR abs/1612.01543 (2016).
17. Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2014. Low precision arithmetic for deep learning. CoRR abs/1412.7024 (2014).
18. M. Courbariaux, Y. Bengio, and J.-P. David. 2015. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. CoRR abs/1511.00363 (2015).
19. Bin Dai, Chen Zhu, and David P. Wipf. 2018. Compressing Neural Networks using the Variational Information Bottleneck. CoRR abs/1802.10399 (2018).
20. Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. CoRR abs/1901.02860 (2019). http://arxiv.org/abs/1901.02860
21. Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. 2017. Language Modeling with Gated Convolutional Networks. In Intl' Conf. on Machine Learning.
22. Alberto Delmas, Patrick Judd, Sayeh Sharify, and Andreas Moshovos. 2017. Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks. CoRR abs/1706.00504 (2017).
23. Alberto Delmas, Sayeh Sharify, Patrick Judd, Milos Nikolic, and Andreas Moshovos. 2018. DPRed: Making Typical Activation Values Matter In Deep Learning Computing. CoRR abs/1804.06732 (2018).
24. Alberto Delmas Lascorz, Patrick Judd, Dylan Malone Stuart, Zissis Poulos, Mostafa Mahmoud, Sayeh Sharify, Milos Nikolic, Kevin Siu, and Andreas Moshovos. 2019. Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
25. Lei Deng, Peng Jiao, Jing Pei, Zhenzhi Wu, and Guoqi Li. 2018. GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework. Neural Networks 100 (2018), 49--58.
26. Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, and Bo Yuan. 2017. CirCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices. In Intl' Symp. on Microarchitecture.
27. Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. 2011. Dark Silicon and the End of Multicore Scaling. In Intl' Symp. on Computer Architecture.
28. M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. 2015. The Pascal Visual Object Classes Challenge: A Retrospective. Intl' Journal of Computer Vision 111, 1 (Jan. 2015), 98--136.
29. Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical Neural Story Generation. CoRR abs/1805.04833 (2018).
30. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
31. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory. ACM SIGOPS Operating Systems Review 51, 2 (2017), 751--764.
32. Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. 2016. Deep Joint Demosaicking and Denoising. ACM Trans. on Graphics (2016).
33. Yiwen Guo, Anbang Yao, and Yurong Chen. 2016. Dynamic Network Surgery for Efficient DNNs. CoRR abs/1608.04493 (2016).
34. Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep Learning with Limited Numerical Precision. In Intl' Conf. on Machine Learning.
35. Song Han and William J. Dally. 2018. Bandwidth-efficient Deep Learning. In Design Automation Conf.
36. Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William (Bill) J. Dally. 2017. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. In Intl' Symp. on Field-Programmable Gate Arrays.
37. Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In Intl' Symp. on Computer Architecture.
38. Song Han, Huizi Mao, and William J. Dally. 2015. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. CoRR abs/1510.00149 (2015).
39. Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In Intl' Conf. on Neural Information Processing Systems.
40. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015).
41. Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, and Song Han. 2018. AMC: AutoML for Model Compression and Acceleration on Mobile Devices. In European Conf. on Computer Vision.
42. Mark Horowitz. 2014. Computing's energy problem (and what we can do about it). IEEE Intl' Solid-State Circuits Conf. 57 (02 2014), 10--14.
43. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2017. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. Journal of Machine Learning Research 18 (2017), 187:1--187:30.
44. Yani Ioannou, Duncan P. Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, and Antonio Criminisi. 2016. Decision Forests, Convolutional Networks and the Models in-Between. CoRR abs/1603.01250 (2016).
45. Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, and Dmitry Kalenichenko. 2017. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. CoRR abs/1712.05877 (2017).
46. Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks. In Workshop On Approximate Computing (WAPCO).
47. Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, Raquel Urtasun, and Andreas Moshovos. 2015. Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets. CoRR abs/1511.05236v4 (2015).
48. Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial Deep Neural Network Computing. In Intl' Symp. on Microarchitecture.
49. Supriya Kapur, Asit K. Mishra, and Debbie Marr. 2017. Low Precision RNNs: Quantizing RNNs Without Losing Accuracy. CoRR abs/1710.07706 (2017).
50. Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2017. A novel zero weight/activation-aware hardware architecture of convolutional neural network. In Design, Automation & Test in Europe Conf.
51. Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo. 2018. ZeNA: Zero-Aware Neural Network Accelerator. IEEE Design & Test 35 (2018), 39--46.
52. Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. 2016. Neurocube: A Programmable Digital Neuromorphic Architecture with High-density 3D Memory. In Intl' Symp. on Computer Architecture.
53. Minje Kim and Paris Smaragdis. 2016. Bitwise Neural Networks. CoRR abs/1601.06071 (2016).
54. H. T. Kung, Bradley McDanel, and Sai Qian Zhang. 2019. Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
55. Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. In Intl' Conf. on Architectural Support for Programming Languages and Operating Systems.
56. Vadim Lebedev and Victor S. Lempitsky. 2016. Fast ConvNets Using Group-Wise Brain Damage. In Computer Vision and Pattern Recognition.
57. Dongwoo Lee, Sungbum Kang, and Kiyoung Choi. 2018. ComPEND: Computation Pruning through Early Negative Detection for ReLU in a deep neural network accelerator. In Intl' Conf. on Supercomputing.
58. Jonathan Lew, Deval Shah, Suchita Pati, Shaylin Cattell, Mengchi Zhang, Amruth Sandhupatla, Christopher Ng, Negar Goli, Matthew D. Sinclair, Timothy G. Rogers, and Tor M. Aamodt. 2018. Analyzing Machine Learning Workloads Using a Detailed GPU Simulator. CoRR abs/1811.08933 (2018). http://arxiv.org/abs/1811.08933
59. Dingyi Li and Zengfu Wang. 2017. Video Superresolution via Motion Compensation and Deep Residual Learning. IEEE Trans. on Computational Imaging.
60. Fengfu Li and Bin Liu. 2016. Ternary Weight Networks. CoRR abs/1605.04711 (2016).
61. Darryl D. Lin, Sachin S. Talathi, and V. Sreekanth Annapureddy. 2016. Fixed Point Quantization of Deep Convolutional Networks. In Intl' Conf. on Machine Learning.
62. Ji Lin, Yongming Rao, Jiwen Lu, and Jie Zhou. 2017. Runtime Neural Pruning. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc.
63. Zhenhong Liu, Amir Yazdanbakhsh, Taejoon Park, Hadi Esmaeilzadeh, and Nam Kim. 2018. SiMul: An Algorithm-Driven Approximate Multiplier Design for Machine Learning. IEEE Micro 38 (2018), 50--59.
64. Christos Louizos, Karen Ullrich, and Max Welling. 2017. Bayesian Compression for Deep Learning. In Conf. on Neural Information Processing Systems.
65. Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks. In Intl' Symp. on High Performance Computer Architecture.
66. David Martin, Charless Fowlkes, Doron Tal, and Jitendra Malik. 2001. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Intl' Conf. on Computer Vision, Vol. 2. 416--423.
67. Mayler Martins, Jody Maick Matos, Renato P. Ribas, André Reis, Guilherme Schlinker, Lucio Rech, and Jens Michelsen. 2015. Open cell library in 15nm FreePDK technology. In Proceedings of the 2015 Symposium on International Symposium on Physical Design. ACM, 171--178.
68. Szymon Migacz. 2017. 8-bit Inference with TensorRT. GPU Technology Conf.
69. Asit K. Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. 2017. WRPN: Wide Reduced-Precision Networks. CoRR abs/1709.01134 (2017).
70. Naveen Muralimanohar and Rajeev Balasubramonian. 2015. CACTI 6.0: A Tool to Understand Large Caches. (2015).
71. Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. 2017. SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. In Intl' Symp. on Computer Architecture (ISCA '17).
72. Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo. 2017. Weighted-Entropy-Based Quantization for Deep Neural Networks. In Conf. on Computer Vision and Pattern Recognition.
73. Eunhyeok Park, Dongyoung Kim, and Sungjoo Yoo. 2018. Energy-Efficient Neural Network Accelerator Based on Outlier-Aware Low-Precision Computation. In Intl' Symp. on Computer Architecture.
74. Eunhyeok Park, Sungjoo Yoo, and Peter Vajda. 2018. Value-Aware Quantization for Training and Inference of Neural Networks. In European Conf. on Computer Vision.
75. Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2017. Faster CNNs with Direct Sparse Convolutions and Guided Pruning. In Intl' Conf. on Learning Representations.
76. Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network. In Intl' Symp. on Field-Programmable Gate Arrays.
77. Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting Image Annotations Using Amazon's Mechanical Turk. In NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
78. Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. CoRR abs/1603.05279 (2016).
79. Joseph Redmon and Ali Farhadi. 2016. YOLO9000: Better, Faster, Stronger. CoRR abs/1612.08242 (2016).
80. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2014. ImageNet Large Scale Visual Recognition Challenge. CoRR abs/1409.0575 (Sept. 2014).
81. Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. CoRR abs/1801.04381 (2018).
82. Sayeh Sharify, Alberto Delmas Lascorz, Kevin Siu, Patrick Judd, and Andreas Moshovos. 2018. Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. In Proceedings of the 55th Annual Design Automation Conference. ACM, 20.
83. Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh. 2018. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network. In ISCA. IEEE Computer Society, 764--775.
84. Sungho Shin, Kyuyeon Hwang, and Wonyong Sung. 2015. Fixed Point Performance Analysis of Recurrent Neural Networks. CoRR abs/1512.01322 (2015).
85. Kevin Siu, Dylan Malone Stuart, Mostafa Mahmoud, and Andreas Moshovos. 2018. Memory Requirements for Convolutional Neural Network Hardware Accelerators. In IEEE Intl' Symp. on Workload Characterization.
86. Synopsys. 2019. Design Compiler. (2019). http://www.synopsys.com/Tools/Implementation/RTLSynthesis/DesignCompiler/Pages
87. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. 2016. Rethinking the inception architecture for computer vision. In IEEE Conf. on Computer Vision and Pattern Recognition.
88. Karen Ullrich, Edward Meeds, and Max Welling. 2017. Soft Weight-Sharing for Neural Network Compression. CoRR abs/1702.04008 (2017).
89. Cheng Wang, Haojin Yang, Christian Bartz, and Christoph Meinel. 2016. Image captioning with deep bidirectional LSTMs. In ACM Multimedia Conf.
90. Pete Warden. 2016. Low-precision matrix multiplication. https://petewarden.com
91. Pete Warden. 2017. How to Quantize Neural Networks with TensorFlow. (2017). https://www.tensorflow.org/performance/quantization
92. Xuechao Wei, Cody Hao Yu, Peng Zhang, Youxiang Chen, Yuxin Wang, Han Hu, Yun Liang, and Jason Cong. 2017. Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs. In Annual Design Automation Conf.
93. Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, and Hai Li. 2016. Learning Structured Sparsity in Deep Neural Networks. In Intl' Conf. on Neural Information Processing Systems.
94. Neil H. E. Weste, David Harris, and Ayan Banerjee. 2010. CMOS VLSI Design. Pearson India.
95. Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, Steven Bell, Jeff Setter, Kaidi Cao, Heonjae Ha, Christos Kozyrakis, and Mark Horowitz. 2018. DNN Dataflow Choice Is Overrated. CoRR abs/1809.04070 (2018).
96. Xuan Yang, Jing Pu, Blaine Burton Rister, Nikhil Bhagdikar, Stephen Richardson, Shahar Kvatinsky, Jonathan Ragan-Kelley, Ardavan Pedram, and Mark Horowitz. 2016. A Systematic Approach to Blocking Convolutional Neural Networks. CoRR abs/1606.04209 (2016).
97. Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. 2017. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning. In IEEE Conf. on Computer Vision and Pattern Recognition.
98. Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. In Intl' Symp. on Computer Architecture.
99. Yu-Hsin Chen, Tushar Krishna, Joel Emer, and Vivienne Sze. 2016. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. In IEEE Intl' Solid-State Circuits Conf. 262--263.
100. Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Intl' Symp. on Field-Programmable Gate Arrays.
101. Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. 2016. Cambricon-X: An Accelerator for Sparse Neural Networks. In Intl' Symp. on Microarchitecture.
102. Chuan Zhang Tang and Hon Keung Kwan. 1993. Multilayer Feedforward Neural Networks with Single Powers-of-Two Weights. IEEE Trans. on Signal Processing 41 (09 1993), 2724--2727.
103. Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. 2017. Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. CoRR abs/1702.03044 (2017).
104. Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. 2016. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. CoRR abs/1606.06160 (2016).
105. Xuda Zhou, Zidong Du, Qi Guo, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, and Yunji Chen. 2018. Cambricon-S: Addressing Irregularity in Sparse Neural Networks through a Cooperative Software/Hardware Approach. In Intl' Symp. on Microarchitecture.
106. Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2016. Trained Ternary Quantization. CoRR abs/1612.01064 (2016).
107. Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. 2017. Learning Transferable Architectures for Scalable Image Recognition. CoRR abs/1707.07012 (2017). http://arxiv.org/abs/1707.07012

Published in

ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture, June 2019, 849 pages.
ISBN: 9781450366694. Proceedings DOI: 10.1145/3307650. Article DOI: 10.1145/3307650.3322255.
Copyright © 2019 ACM.
Publisher: Association for Computing Machinery, New York, NY, United States.
