DOI: 10.1145/3410463.3414634
Research Article (Open Access)

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

Published: 30 September 2020

ABSTRACT

Although low-power, mixed-signal circuitry suffers from the significant overhead of Analog-to-Digital (A/D) conversion, a limited range for information encoding, and susceptibility to noise. This paper addresses these challenges by offering and leveraging the following mathematical insight about the vector dot-product, the basic operator in Deep Neural Networks (DNNs): the operator can be reformulated as a wide regrouping of spatially parallel, low-bitwidth calculations that are interleaved across the bit partitions of multiple vector elements. As such, the computational building block of our accelerator becomes a wide, bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). The bit-partitioning permits a lower-resolution ADC, while the wide regrouping alleviates the need for an A/D conversion per operation, amortizing its cost across multiple bit partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and provide larger margins for noise mitigation. We also adopt a switched-capacitor design for this bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of each group in its capacitors over multiple cycles. The capacitive accumulation, combined with the wide bit-partitioned regrouping, reduces the rate of A/D conversions and further improves the overall efficiency of the design.
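
To make the reformulation concrete, the following minimal sketch (Python with NumPy, not part of the paper's artifact) models the arithmetic under the assumption of 8-bit unsigned operands split into 4-bit partitions. Each wide, low-bitwidth dot product stands in for one analog multiply-accumulate group that would share a single low-resolution ADC, and the shift-and-add merge of the partial results happens in the digital domain. The helper names are illustrative, not the paper's interface.

```python
# Minimal numerical sketch of the bit-partitioned, interleaved dot product.
# Assumptions: 8-bit unsigned operands, 4-bit partitions, plain Python in
# place of the switched-capacitor analog hardware.
import numpy as np

BITS, PART = 8, 4                 # operand width and partition width
SHIFTS = range(0, BITS, PART)     # bit offsets of each partition
MASK = (1 << PART) - 1            # low-bitwidth partition mask

def bit_partitions(v):
    """Split each vector element into its low-bitwidth partitions."""
    return [(v >> s) & MASK for s in SHIFTS]

def interleaved_dot(x, w):
    """Dot product as a regrouping of wide, low-bitwidth partial sums.

    Each inner np.dot models one wide analog vector unit: many 4b x 4b
    multiply-accumulates whose accumulated result would be digitized by
    a single, shared low-resolution ADC.
    """
    xp, wp = bit_partitions(x), bit_partitions(w)
    total = 0
    for i, sx in enumerate(SHIFTS):
        for j, sw in enumerate(SHIFTS):
            partial = int(np.dot(xp[i], wp[j]))   # one A/D conversion per group
            total += partial << (sx + sw)         # digital shift-and-add merge
    return total

rng = np.random.default_rng(0)
x = rng.integers(0, 1 << BITS, size=64)
w = rng.integers(0, 1 << BITS, size=64)
assert interleaved_dot(x, w) == int(np.dot(x, w))  # matches the full-precision result
```

Because one conversion serves an entire regrouped partial sum, the number of A/D conversions grows with the number of partition pairs (here 2 x 2 = 4 per dot product) rather than with the vector length.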

With this mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe, that leverages clustering and a hierarchical design to best exploit the power efficiency of the mixed-signal domain and 3D stacking. We also build models for noise, computational non-idealities, and variations. Across ten DNN benchmarks, BiHiwe delivers 5.5x speedup over Tetris, a leading purely digital 3D-stacked accelerator, with less than 0.5% accuracy loss, achieved through careful treatment of noise, computation error, and various forms of variation. Compared to the RTX 2080 Ti with tensor cores and the Titan Xp GPU, both with 8-bit execution, BiHiwe offers 35.4x and 70.1x higher Performance-per-Watt, respectively. Relative to the mixed-signal RedEye, ISAAC, and PipeLayer, BiHiwe offers 5.5x, 3.6x, and 9.6x improvements in Performance-per-Watt, respectively. The results suggest that BiHiwe is an effective first step on a path that combines mathematics, circuits, and architecture.
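
The abstract does not detail the noise and non-ideality models. Purely as an illustrative, functional-level sketch (the Gaussian noise level, ADC resolution, and the helper name noisy_interleaved_dot are assumptions, and it reuses the helpers from the previous sketch, not the paper's calibrated circuit models), one way to study such effects is to perturb and quantize each analog partial sum before the digital merge:

```python
# Illustrative only: inject Gaussian noise and a low-resolution quantizer into
# each analog partial sum of the interleaved dot product. Noise level and ADC
# resolution are assumed values, not the paper's calibrated models.
def noisy_interleaved_dot(x, w, adc_bits=8, noise_std=2.0,
                          rng=np.random.default_rng(1)):
    xp, wp = bit_partitions(x), bit_partitions(w)
    full_scale = len(x) * MASK * MASK            # largest possible group sum
    step = full_scale / (2 ** adc_bits - 1)      # ADC quantization step
    total = 0
    for i, sx in enumerate(SHIFTS):
        for j, sw in enumerate(SHIFTS):
            analog = float(np.dot(xp[i], wp[j])) + rng.normal(0.0, noise_std)
            clamped = min(max(analog, 0.0), full_scale)
            code = int(round(clamped / step))    # digitized ADC output code
            total += int(code * step) << (sx + sw)  # reconstruct, merge digitally
    return total
```

Sweeping such error sources across a network's dot products gives a first-order view of accuracy sensitivity; the sub-0.5% loss reported above relies on the paper's own treatment of noise, computation error, and variation.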

References

  1. N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4):6--15, July--Aug. 2011.
  2. Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: Reducing the energy of mature computations. In ASPLOS, 2010.
  3. Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In ISCA, 2011.
  4. Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In FPGA, 2015.
  5. Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Neural acceleration for general-purpose approximate programs. Commun. ACM, 2013.
  6. Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. DaDianNao: A machine-learning supercomputer. In MICRO, 2014.
  7. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. Tetris: Scalable and efficient neural network acceleration with 3D memory. In ASPLOS, 2017.
  8. Alberto Delmas, Sayeh Sharify, Patrick Judd, and Andreas Moshovos. Tartan: Accelerating fully-connected and convolutional layers in deep learning networks by exploiting numerical precision variability. arXiv, 2017.
  9. Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kim, and Hadi Esmaeilzadeh. TABLA: A unified template-based framework for accelerating statistical machine learning. In HPCA, 2016.
  10. Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. Cambricon-X: An accelerator for sparse neural networks. In MICRO, 2016.
  11. Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In ISCA, 2016.
  12. Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, and Andreas Moshovos. Stripes: Bit-serial deep neural network computing. In MICRO, 2016.
  13. Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kim, Chenkai Shao, Asit Misra, and Hadi Esmaeilzadeh. From high-level deep neural models to FPGAs. In MICRO, 2016.
  14. Eric Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Adrian Caulfield, Todd Massengil, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Christian Boehn, Oren Firestein, Alessandro Forin, Kang Su Gatlin, Mahdi Ghandi, Stephen Heil, Kyle Holohan, Tamas Juhasz, Ratna Kumar Kovvuri, Sitaram Lanka, Friedel van Megen, Dima Mukhortov, Prerak Patel, Steve Reinhardt, Adam Sapek, Raja Seera, Balaji Sridharan, Lisa Woods, Phillip Yi-Xiao, Ritchie Zhao, and Doug Burger. Accelerating persistent neural networks at datacenter scale. In HotChips, 2017.
  15. Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. SCNN: An accelerator for compressed-sparse convolutional neural networks. In ISCA, 2017.
  16. Renzo Andri, Lukas Cavigelli, Davide Rossi, and Luca Benini. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. arXiv, 2016.
  17. Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. EIE: Efficient inference engine on compressed deep neural network. In ISCA, 2016.
  18. Yu-Hsin Chen, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In ISCA, 2016.
  19. Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. JSSC, 2017.
  20. Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory. In ISCA, 2016.
  21. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In ISCA, 2017.
  22. Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ASPLOS, 2014.
  23. Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh. Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural networks. In ISCA, 2018.
  24. Vahide Aklaghi, Amir Yazdanbakhsh, Kambiz Samadi, Hadi Esmaeilzadeh, and Rajesh K. Gupta. SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks. In ISCA, 2018.
  25. Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, and Christopher W. Fletcher. UCNN: Exploiting computational reuse in deep neural networks via weight repetition. In ISCA, 2018.
  26. Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, and Hoi-Jun Yoo. UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision. In ISSCC, 2018.
  27. Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In ISCA, 2016.
  28. Prakalp Srivastava, Mingu Kang, Sujan K. Gonugondla, Sungmin Lim, Jungwook Choi, Vikram Adve, Nam Sung Kim, and Naresh Shanbhag. PROMISE: An end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms. In ISCA, 2018.
  29. Y. P. Tsividis and D. Anastassiou. Switched-capacitor neural networks. Electronics Letters, 23(18):958--959, 1987.
  30. Robert LiKamWa, Yunhui Hou, Julian Gao, Mia Polansky, and Lin Zhong. RedEye: Analog ConvNet image sensor architecture for continuous mobile vision. In ISCA, 2016.
  31. Daniel Bankman and Boris Murmann. Passive charge redistribution digital-to-analogue multiplier. Electronics Letters, 51(5):386--388, 2015.
  32. E. H. Lee and S. S. Wong. Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing. IEEE Journal of Solid-State Circuits, 52(1):261--271, Jan. 2017.
  33. Daniel Bankman, Lita Yang, Bert Moons, Marian Verhelst, and Boris Murmann. An always-on 3.8 μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS. In ISSCC, 2018.
  34. Fred N. Buhler, Peter Brown, Jiabo Li, Thomas Chen, Zhengya Zhang, and Michael P. Flynn. A 3.43 TOPS/W 48.9 pJ/pixel 50.1 nJ/classification 512 analog neuron sparse coding neural network with on-chip learning and classification in 40nm CMOS. In Symposium on VLSI Circuits, 2017.
  35. Renée St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, and Doug Burger. General-purpose code acceleration with limited-precision analog computation. In ISCA, 2014.
  36. Jintao Zhang, Zhuo Wang, and Naveen Verma. A matrix-multiplying ADC implementing a machine-learning classifier directly with data conversion. In ISSCC, 2015.
  37. Edward H. Lee and S. Simon Wong. Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing. IEEE Journal of Solid-State Circuits, 52(1):261--271, 2017.
  38. Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In ISCA, 2016.
  39. Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In HPCA, 2017.
  40. Sayeh Sharify, Alberto Delmas Lascorz, Patrick Judd, and Andreas Moshovos. Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. arXiv, 2017.
  41. Paul R. Gray, Paul Hurst, Robert G. Meyer, and Stephen Lewis. Analysis and Design of Analog Integrated Circuits. Wiley, 2001.
  42. Pieter Harpe. A 0.0013 mm2 10b 10 MS/s SAR ADC with a 0.0048 mm2 42 dB-rejection passive FIR filter. In CICC, 2018.
  43. Facebook AI Research. Caffe2. https://caffe2.ai/.
  44. Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, and C. Thomas Gray. Analog/mixed-signal hardware error modeling for deep learning inference. In DAC, 2019.
  45. Mohammed Ismail and Terri Fiez. Analog VLSI: Signal and Information Processing, volume 166. McGraw-Hill, 1994.
  46. Vaibhav Tripathi and Boris Murmann. Mismatch characterization of small metal fringe capacitors. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(8):2236--2242, 2014.
  47. Yasuko Eckert, Nuwan Jayasena, and Gabriel H. Loh. Thermal feasibility of die-stacked processing in memory. 2014.
  48. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  49. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. URL http://image-net.org/.
  50. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv, 2014.
  51. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv, 2016.
  52. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  53. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  54. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  55. Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
  56. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 1993.
  57. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
  58. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. Tetris: Scalable and efficient neural network acceleration with 3D memory (simulator). https://github.com/stanford-mast/nn_dataflow, 2017.
  59. Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv, 2016.
  60. Asit K. Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. WRPN: Wide reduced-precision networks. arXiv, 2017.
  61. Fengfu Li, Bo Zhang, and Bin Liu. Ternary weight networks. arXiv, 2016.
  62. Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, and Gang Hua. LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. arXiv preprint arXiv:1807.10029, 2018.
  63. Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In HPCA, 2017.
  64. NVIDIA TensorRT 5.1. https://developer.nvidia.com/tensorrt.
  65. NCSU. FreePDK45, 2018. URL https://www.eda.ncsu.edu/wiki/FreePDK45.
  66. B. Murmann. ADC Performance Survey 1997--2016. [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html.
  67. S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. In ICCAD, 2011.
  68. Hybrid Memory Cube Consortium et al. Hybrid Memory Cube specification 1.0. Last revision January 2013.
  69. Joe Jeddeloh and Brent Keeth. Hybrid Memory Cube new DRAM architecture increases density and performance. In Symposium on VLSI Technology, 2012.
  70. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  71. Neta Zmora, Guy Jacob, and Gal Novik. Neural Network Distiller, June 2018. URL https://doi.org/10.5281/zenodo.1297430.
  72. Yun Long, Taesik Na, and Saibal Mukhopadhyay. ReRAM-based processing-in-memory architecture for recurrent neural network acceleration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (99):1--14, 2018.
  73. Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, John Paul Strachan, Kaushik Roy, and Dejan S. Milojicic. PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference. arXiv preprint arXiv:1901.10351, 2019.
  74. Amir Yazdanbakhsh, Michael Brzozowski, Behnam Khaleghi, Soroush Ghodrati, Kambiz Samadi, Nam Sung Kim, and Hadi Esmaeilzadeh. FlexiGAN: An end-to-end solution for FPGA acceleration of generative adversarial networks. In FCCM, 2018.
  75. Mohammad Samragh, Mojan Javaheripi, and Farinaz Koushanfar. EncoDeep: Realizing bit-flexible encoding for deep neural networks. ACM Transactions on Embedded Computing Systems (TECS).
  76. Bita Darvish Rouhani, Mohammad Samragh, Mojan Javaheripi, Tara Javidi, and Farinaz Koushanfar. DeepFense: Online accelerated defense against adversarial deep learning. In ICCAD, 2018.
  77. Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, and Farinaz Koushanfar. FastWave: Accelerating autoregressive convolutional neural networks on FPGA. arXiv preprint arXiv:2002.04971, 2020.
  78. Sungju Ryu, Hyungjun Kim, Wooseok Yi, and Jae-Joon Kim. BitBlade: Area and energy-efficient precision-scalable neural network accelerator with bitwise summation. In DAC, 2019.
  79. Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, and Hadi Esmaeilzadeh. Bit-parallel vector composability for neural acceleration. In DAC, 2020.
  80. Jan Crols and Michel Steyaert. Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages. IEEE Journal of Solid-State Circuits, 29(8):936--942, 1994.
  81. John K. Fiorenza, Todd Sepke, Peter Holloway, Charles G. Sodini, and Hae-Seung Lee. Comparator-based switched-capacitor circuits for scaled CMOS technologies. IEEE Journal of Solid-State Circuits, 41(12):2658--2668, 2006.
  82. Robert W. Brodersen, Paul R. Gray, and David A. Hodges. MOS switched-capacitor filters. Proceedings of the IEEE, 67(1):61--75, 1979.
  83. Daniel Bankman and Boris Murmann. An 8-bit, 16-input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS. In A-SSCC, 2016.
  84. Daisuke Miyashita, Shouhei Kousai, Tomoya Suzuki, and Jun Deguchi. A neuromorphic chip optimized for deep learning and CMOS technology with time-domain analog and digital mixed-signal processing. IEEE Journal of Solid-State Circuits, 52(10):2679--2689, 2017.
  85. Ximing Qiao, Xiong Cao, Huanrui Yang, Linghao Song, and Hai Li. AtomLayer: A universal ReRAM-based CNN accelerator with atomic layer computation. In DAC, 2018.
  86. Houxiang Ji, Linghao Song, Li Jiang, Hai Helen Li, and Yiran Chen. ReCom: An efficient resistive accelerator for compressed deep neural networks. In DATE, 2018.
  87. Bing Li, Linghao Song, Fan Chen, Xuehai Qian, Yiran Chen, and Hai Helen Li. ReRAM-based accelerator for deep learning. In DATE, 2018.
  88. Lerong Chen, Jiawen Li, Yiran Chen, Qiuping Deng, Jiyuan Shen, Xiaoyao Liang, and Li Jiang. Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar. In DATE, 2017.
  89. Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, and Yuan Xie. FPSA: A full system stack solution for reconfigurable ReRAM-based NN accelerator architecture. arXiv preprint arXiv:1901.09904, 2019.
  90. Tzu-Hsien Yang, Hsiang-Yun Cheng, Chia-Lin Yang, I Tseng, Han-Wen Hu, Hung-Sheng Chang, Hsiang-Pang Li, et al. Sparse ReRAM engine: Joint exploration of activation and weight sparsity in compressed neural networks. In ISCA, 2019.
