DOI: 10.1145/3410463.3414634
Research Article (Open Access)

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic

Published: 30 September 2020

ABSTRACT

Although low-power, mixed-signal circuitry suffers from the significant overhead of Analog-to-Digital (A/D) conversion, a limited range for information encoding, and susceptibility to noise. This paper addresses these challenges by offering and leveraging the following mathematical insight about the vector dot-product, the basic operator in Deep Neural Networks (DNNs): the operator can be reformulated as a wide regrouping of spatially parallel, low-bitwidth calculations that are interleaved across the bit partitions of multiple vector elements. As such, the computational building block of our accelerator becomes a wide, bit-interleaved analog vector unit comprising a collection of low-bitwidth multiply-accumulate modules that operate in the analog domain and share a single A/D converter (ADC). The bit-partitioning permits a lower-resolution ADC, while the wide regrouping alleviates the need for an A/D conversion per operation, amortizing its cost across multiple bit partitions of the vector elements. Moreover, the low-bitwidth modules require a smaller encoding range and provide larger margins for noise mitigation. We also adopt a switched-capacitor design for this bit-level reformulation of DNN operations. The proposed switched-capacitor circuitry performs the regrouped multiplications in the charge domain and accumulates the results of each group in its capacitors over multiple cycles. The capacitive accumulation, combined with the wide bit-partitioned regrouping, reduces the rate of A/D conversions and further improves the overall efficiency of the design.
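
To make the reformulation concrete, the following minimal sketch (Python with NumPy, not part of the paper's artifact) models the arithmetic under the assumption of 8-bit unsigned operands split into 4-bit partitions. Each wide, low-bitwidth dot product stands in for one analog multiply-accumulate group that would share a single low-resolution ADC, and the shift-and-add merge of the partial results happens in the digital domain. The helper names are illustrative, not the paper's interface.

```python
# Minimal numerical sketch of the bit-partitioned, interleaved dot product.
# Assumptions: 8-bit unsigned operands, 4-bit partitions, plain Python in
# place of the switched-capacitor analog hardware.
import numpy as np

BITS, PART = 8, 4                 # operand width and partition width
SHIFTS = range(0, BITS, PART)     # bit offsets of each partition
MASK = (1 << PART) - 1            # low-bitwidth partition mask

def bit_partitions(v):
    """Split each vector element into its low-bitwidth partitions."""
    return [(v >> s) & MASK for s in SHIFTS]

def interleaved_dot(x, w):
    """Dot product as a regrouping of wide, low-bitwidth partial sums.

    Each inner np.dot models one wide analog vector unit: many 4b x 4b
    multiply-accumulates whose accumulated result would be digitized by
    a single, shared low-resolution ADC.
    """
    xp, wp = bit_partitions(x), bit_partitions(w)
    total = 0
    for i, sx in enumerate(SHIFTS):
        for j, sw in enumerate(SHIFTS):
            partial = int(np.dot(xp[i], wp[j]))   # one A/D conversion per group
            total += partial << (sx + sw)         # digital shift-and-add merge
    return total

rng = np.random.default_rng(0)
x = rng.integers(0, 1 << BITS, size=64)
w = rng.integers(0, 1 << BITS, size=64)
assert interleaved_dot(x, w) == int(np.dot(x, w))  # matches the full-precision result
```

Because one conversion serves an entire regrouped partial sum, the number of A/D conversions grows with the number of partition pairs (here 2 x 2 = 4 per dot product) rather than with the vector length.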

With this mathematical reformulation and its switched-capacitor implementation, we define one possible 3D-stacked microarchitecture, dubbed BiHiwe, that leverages clustering and a hierarchical design to best exploit the power efficiency of the mixed-signal domain and 3D stacking. We also build models for noise, computational non-idealities, and variations. Across ten DNN benchmarks, BiHiwe delivers 5.5x speedup over Tetris, a leading purely digital 3D-stacked accelerator, with less than 0.5% accuracy loss, achieved through careful treatment of noise, computation error, and various forms of variation. Compared to the RTX 2080 Ti with tensor cores and the Titan Xp GPU, both with 8-bit execution, BiHiwe offers 35.4x and 70.1x higher Performance-per-Watt, respectively. Relative to the mixed-signal RedEye, ISAAC, and PipeLayer, BiHiwe offers 5.5x, 3.6x, and 9.6x improvements in Performance-per-Watt, respectively. The results suggest that BiHiwe is an effective first step on a path that combines mathematics, circuits, and architecture.
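
The abstract does not detail the noise and non-ideality models. Purely as an illustrative, functional-level sketch (the Gaussian noise level, ADC resolution, and the helper name noisy_interleaved_dot are assumptions, and it reuses the helpers from the previous sketch, not the paper's calibrated circuit models), one way to study such effects is to perturb and quantize each analog partial sum before the digital merge:

```python
# Illustrative only: inject Gaussian noise and a low-resolution quantizer into
# each analog partial sum of the interleaved dot product. Noise level and ADC
# resolution are assumed values, not the paper's calibrated models.
def noisy_interleaved_dot(x, w, adc_bits=8, noise_std=2.0,
                          rng=np.random.default_rng(1)):
    xp, wp = bit_partitions(x), bit_partitions(w)
    full_scale = len(x) * MASK * MASK            # largest possible group sum
    step = full_scale / (2 ** adc_bits - 1)      # ADC quantization step
    total = 0
    for i, sx in enumerate(SHIFTS):
        for j, sw in enumerate(SHIFTS):
            analog = float(np.dot(xp[i], wp[j])) + rng.normal(0.0, noise_std)
            clamped = min(max(analog, 0.0), full_scale)
            code = int(round(clamped / step))    # digitized ADC output code
            total += int(code * step) << (sx + sw)  # reconstruct, merge digitally
    return total
```

Sweeping such error sources across a network's dot products gives a first-order view of accuracy sensitivity; the sub-0.5% loss reported above relies on the paper's own treatment of noise, computation error, and variation.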

References

  1. N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. Toward dark silicon in servers. IEEE Micro, 31(4):6--15, July--Aug. 2011.
  2. Ganesh Venkatesh, Jack Sampson, Nathan Goulding, Saturnino Garcia, Vladyslav Bryksin, Jose Lugo-Martinez, Steven Swanson, and Michael Bedford Taylor. Conservation cores: Reducing the energy of mature computations. In ASPLOS, 2010.
  3. Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In ISCA, 2011.
  4. Chen Zhang, Peng Li, Guangyu Sun, Yijin Guan, Bingjun Xiao, and Jason Cong. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In FPGA, 2015.
  5. Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Neural acceleration for general-purpose approximate programs. Commun. ACM, 2013.
  6. Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. DaDianNao: A machine-learning supercomputer. In MICRO, 2014.
  7. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. Tetris: Scalable and efficient neural network acceleration with 3D memory. In ASPLOS, 2017.
  8. Alberto Delmas, Sayeh Sharify, Patrick Judd, and Andreas Moshovos. Tartan: Accelerating fully-connected and convolutional layers in deep learning networks by exploiting numerical precision variability. arXiv, 2017.
  9. Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kim, and Hadi Esmaeilzadeh. TABLA: A unified template-based framework for accelerating statistical machine learning. In HPCA, 2016.
  10. Shijin Zhang, Zidong Du, Lei Zhang, Huiying Lan, Shaoli Liu, Ling Li, Qi Guo, Tianshi Chen, and Yunji Chen. Cambricon-X: An accelerator for sparse neural networks. In MICRO, 2016.
  11. Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In ISCA, 2016.
  12. Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M. Aamodt, and Andreas Moshovos. Stripes: Bit-serial deep neural network computing. In MICRO, 2016.
  13. Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kim, Chenkai Shao, Asit Misra, and Hadi Esmaeilzadeh. From high-level deep neural models to FPGAs. In MICRO, 2016.
  14. Eric Chung, Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Adrian Caulfield, Todd Massengil, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Christian Boehn, Oren Firestein, Alessandro Forin, Kang Su Gatlin, Mahdi Ghandi, Stephen Heil, Kyle Holohan, Tamas Juhasz, Ratna Kumar Kovvuri, Sitaram Lanka, Friedel van Megen, Dima Mukhortov, Prerak Patel, Steve Reinhardt, Adam Sapek, Raja Seera, Balaji Sridharan, Lisa Woods, Phillip Yi-Xiao, Ritchie Zhao, and Doug Burger. Accelerating persistent neural networks at datacenter scale. In HotChips, 2017.
  15. Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally. SCNN: An accelerator for compressed-sparse convolutional neural networks. In ISCA, 2017.
  16. Renzo Andri, Lukas Cavigelli, Davide Rossi, and Luca Benini. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights. arXiv, 2016.
  17. Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. EIE: Efficient inference engine on compressed deep neural network. In ISCA, 2016.
  18. Yu-Hsin Chen, Joel Emer, and Vivienne Sze. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In ISCA, 2016.
  19. Yu-Hsin Chen, Tushar Krishna, Joel S. Emer, and Vivienne Sze. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. JSSC, 2017.
  20. Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory. In ISCA, 2016.
  21. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In-datacenter performance analysis of a tensor processing unit. In ISCA, 2017.
  22. Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In ASPLOS, 2014.
  23. Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Vikas Chandra, and Hadi Esmaeilzadeh. Bit Fusion: Bit-level dynamically composable architecture for accelerating deep neural networks. In ISCA, 2018.
  24. Vahide Aklaghi, Amir Yazdanbakhsh, Kambiz Samadi, Hadi Esmaeilzadeh, and Rajesh K. Gupta. SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks. In ISCA, 2018.
  25. Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, and Christopher W. Fletcher. UCNN: Exploiting computational reuse in deep neural networks via weight repetition. In ISCA, 2018.
  26. Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, and Hoi-Jun Yoo. UNPU: A 50.6 TOPS/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision. In ISSCC, 2018.
  27. Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, and Vivek Srikumar. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In ISCA, 2016.
  28. Prakalp Srivastava, Mingu Kang, Sujan K. Gonugondla, Sungmin Lim, Jungwook Choi, Vikram Adve, Nam Sung Kim, and Naresh Shanbhag. PROMISE: An end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms. In ISCA, 2018.
  29. Y. P. Tsividis and D. Anastassiou. Switched-capacitor neural networks. Electronics Letters, 23(18):958--959, 1987.
  30. Robert LiKamWa, Yunhui Hou, Julian Gao, Mia Polansky, and Lin Zhong. RedEye: Analog ConvNet image sensor architecture for continuous mobile vision. In ISCA, 2016.
  31. Daniel Bankman and Boris Murmann. Passive charge redistribution digital-to-analogue multiplier. Electronics Letters, 51(5):386--388, 2015.
  32. E. H. Lee and S. S. Wong. Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing. IEEE Journal of Solid-State Circuits, 52(1):261--271, Jan. 2017.
  33. Daniel Bankman, Lita Yang, Bert Moons, Marian Verhelst, and Boris Murmann. An always-on 3.8 μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS. In ISSCC, 2018.
  34. Fred N. Buhler, Peter Brown, Jiabo Li, Thomas Chen, Zhengya Zhang, and Michael P. Flynn. A 3.43 TOPS/W 48.9 pJ/pixel 50.1 nJ/classification 512 analog neuron sparse coding neural network with on-chip learning and classification in 40nm CMOS. In Symposium on VLSI Circuits, 2017.
  35. Renée St. Amant, Amir Yazdanbakhsh, Jongse Park, Bradley Thwaites, Hadi Esmaeilzadeh, Arjang Hassibi, Luis Ceze, and Doug Burger. General-purpose code acceleration with limited-precision analog computation. In ISCA, 2014.
  36. Jintao Zhang, Zhuo Wang, and Naveen Verma. A matrix-multiplying ADC implementing a machine-learning classifier directly with data conversion. In ISSCC, 2015.
  37. Edward H. Lee and S. Simon Wong. Analysis and design of a passive switched-capacitor matrix multiplier for approximate computing. IEEE Journal of Solid-State Circuits, 52(1):261--271, 2017.
  38. Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In ISCA, 2016.
  39. Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In HPCA, 2017.
  40. Sayeh Sharify, Alberto Delmas Lascorz, Patrick Judd, and Andreas Moshovos. Loom: Exploiting weight and activation precisions to accelerate convolutional neural networks. arXiv, 2017.
  41. Paul R. Gray, Paul Hurst, Robert G. Meyer, and Stephen Lewis. Analysis and Design of Analog Integrated Circuits. Wiley, 2001.
  42. Pieter Harpe. A 0.0013 mm2 10b 10 MS/s SAR ADC with a 0.0048 mm2 42 dB-rejection passive FIR filter. In CICC, 2018.
  43. Facebook AI Research. Caffe2. https://caffe2.ai/.
  44. Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, and C. Thomas Gray. Analog/mixed-signal hardware error modeling for deep learning inference. In DAC, 2019.
  45. Mohammed Ismail and Terri Fiez. Analog VLSI: Signal and Information Processing, volume 166. McGraw-Hill, 1994.
  46. Vaibhav Tripathi and Boris Murmann. Mismatch characterization of small metal fringe capacitors. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(8):2236--2242, 2014.
  47. Yasuko Eckert, Nuwan Jayasena, and Gabriel H. Loh. Thermal feasibility of die-stacked processing in memory. 2014.
  48. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
  49. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009. URL http://image-net.org/.
  50. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv, 2014.
  51. Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv, 2016.
  52. Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  53. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
  54. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  55. Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
  56. Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 1993.
  57. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
  58. Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. Tetris: Scalable and efficient neural network acceleration with 3D memory (simulator). https://github.com/stanford-mast/nn_dataflow, 2017.
  59. Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv, 2016.
  60. Asit K. Mishra, Eriko Nurvitadhi, Jeffrey J. Cook, and Debbie Marr. WRPN: Wide reduced-precision networks. arXiv, 2017.
  61. Fengfu Li, Bo Zhang, and Bin Liu. Ternary weight networks. arXiv, 2016.
  62. Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, and Gang Hua. LQ-Nets: Learned quantization for highly accurate and compact deep neural networks. arXiv preprint arXiv:1807.10029, 2018.
  63. Linghao Song, Xuehai Qian, Hai Li, and Yiran Chen. PipeLayer: A pipelined ReRAM-based accelerator for deep learning. In HPCA, 2017.
  64. NVIDIA TensorRT 5.1. https://developer.nvidia.com/tensorrt.
  65. NCSU. FreePDK45, 2018. URL https://www.eda.ncsu.edu/wiki/FreePDK45.
  66. B. Murmann. ADC Performance Survey 1997--2016. [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html.
  67. S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. In ICCAD, 2011.
  68. Hybrid Memory Cube Consortium et al. Hybrid Memory Cube specification 1.0. Last revision January 2013.
  69. Joe Jeddeloh and Brent Keeth. Hybrid Memory Cube new DRAM architecture increases density and performance. In Symposium on VLSI Technology, 2012.
  70. Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  71. Neta Zmora, Guy Jacob, and Gal Novik. Neural Network Distiller, June 2018. URL https://doi.org/10.5281/zenodo.1297430.
  72. Yun Long, Taesik Na, and Saibal Mukhopadhyay. ReRAM-based processing-in-memory architecture for recurrent neural network acceleration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, (99):1--14, 2018.
  73. Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, John Paul Strachan, Kaushik Roy, and Dejan S. Milojicic. PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference. arXiv preprint arXiv:1901.10351, 2019.
  74. Amir Yazdanbakhsh, Michael Brzozowski, Behnam Khaleghi, Soroush Ghodrati, Kambiz Samadi, Nam Sung Kim, and Hadi Esmaeilzadeh. FlexiGAN: An end-to-end solution for FPGA acceleration of generative adversarial networks. In FCCM, 2018.
  75. Mohammad Samragh, Mojan Javaheripi, and Farinaz Koushanfar. EncoDeep: Realizing bit-flexible encoding for deep neural networks. ACM Transactions on Embedded Computing Systems (TECS).
  76. Bita Darvish Rouhani, Mohammad Samragh, Mojan Javaheripi, Tara Javidi, and Farinaz Koushanfar. DeepFense: Online accelerated defense against adversarial deep learning. In ICCAD, 2018.
  77. Shehzeen Hussain, Mojan Javaheripi, Paarth Neekhara, Ryan Kastner, and Farinaz Koushanfar. FastWave: Accelerating autoregressive convolutional neural networks on FPGA. arXiv preprint arXiv:2002.04971, 2020.
  78. Sungju Ryu, Hyungjun Kim, Wooseok Yi, and Jae-Joon Kim. BitBlade: Area and energy-efficient precision-scalable neural network accelerator with bitwise summation. In DAC, 2019.
  79. Soroush Ghodrati, Hardik Sharma, Cliff Young, Nam Sung Kim, and Hadi Esmaeilzadeh. Bit-parallel vector composability for neural acceleration. In DAC, 2020.
  80. Jan Crols and Michel Steyaert. Switched-opamp: An approach to realize full CMOS switched-capacitor circuits at very low power supply voltages. IEEE Journal of Solid-State Circuits, 29(8):936--942, 1994.
  81. John K. Fiorenza, Todd Sepke, Peter Holloway, Charles G. Sodini, and Hae-Seung Lee. Comparator-based switched-capacitor circuits for scaled CMOS technologies. IEEE Journal of Solid-State Circuits, 41(12):2658--2668, 2006.
  82. Robert W. Brodersen, Paul R. Gray, and David A. Hodges. MOS switched-capacitor filters. Proceedings of the IEEE, 67(1):61--75, 1979.
  83. Daniel Bankman and Boris Murmann. An 8-bit, 16-input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS. In A-SSCC, 2016.
  84. Daisuke Miyashita, Shouhei Kousai, Tomoya Suzuki, and Jun Deguchi. A neuromorphic chip optimized for deep learning and CMOS technology with time-domain analog and digital mixed-signal processing. IEEE Journal of Solid-State Circuits, 52(10):2679--2689, 2017.
  85. Ximing Qiao, Xiong Cao, Huanrui Yang, Linghao Song, and Hai Li. AtomLayer: A universal ReRAM-based CNN accelerator with atomic layer computation. In DAC, 2018.
  86. Houxiang Ji, Linghao Song, Li Jiang, Hai Helen Li, and Yiran Chen. ReCom: An efficient resistive accelerator for compressed deep neural networks. In DATE, 2018.
  87. Bing Li, Linghao Song, Fan Chen, Xuehai Qian, Yiran Chen, and Hai Helen Li. ReRAM-based accelerator for deep learning. In DATE, 2018.
  88. Lerong Chen, Jiawen Li, Yiran Chen, Qiuping Deng, Jiyuan Shen, Xiaoyao Liang, and Li Jiang. Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar. In DATE, 2017.
  89. Yu Ji, Youyang Zhang, Xinfeng Xie, Shuangchen Li, Peiqi Wang, Xing Hu, Youhui Zhang, and Yuan Xie. FPSA: A full system stack solution for reconfigurable ReRAM-based NN accelerator architecture. arXiv preprint arXiv:1901.09904, 2019.
  90. Tzu-Hsien Yang, Hsiang-Yun Cheng, Chia-Lin Yang, I Tseng, Han-Wen Hu, Hung-Sheng Chang, Hsiang-Pang Li, et al. Sparse ReRAM engine: Joint exploration of activation and weight sparsity in compressed neural networks. In ISCA, 2019.
