ABSTRACT
Deep neural networks (DNNs) are increasingly accelerated on application-specific hardware such as the Google TPU, which is designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration for timing speculation requires detailed gate-level timing simulations, which can be time-consuming for large DNNs that execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and accurate timing simulation of DNN accelerators like the Google TPU. FATE introduces two novel ideas: (i) DelayNet, a DNN-based timing model for MAC units; and (ii) a statistical sampling methodology that reduces the number of MAC operations for which timing simulations are performed. We show that FATE achieves an 8×–58× speed-up in timing simulations while introducing less than 2% error in classification accuracy estimates. We demonstrate the use of FATE by comparing a conventional DNN accelerator that uses 2's complement (2C) arithmetic with one that uses signed magnitude representation (SMR). We show that the SMR implementation provides 18% more energy savings than 2C for the same classification accuracy, a result that may be of independent interest.
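The statistical sampling idea above can be illustrated with a minimal sketch: instead of running the expensive gate-level timing simulation for every MAC operation, run it only on a random sample and extrapolate the error rate to the full workload. This is an illustrative simplification, not the paper's actual implementation; the function names, the `slow_timing_sim` predicate, and the sampling fraction are all assumptions introduced here for clarity.

```python
import random


def estimate_error_rate(mac_ops, slow_timing_sim, sample_frac=0.01, seed=0):
    """Estimate the fraction of MAC operations that suffer timing errors.

    mac_ops:         the full list of MAC operations in the workload
    slow_timing_sim: expensive predicate (e.g., gate-level simulation)
                     that returns True if the operation has a timing error
    sample_frac:     fraction of MACs actually simulated
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    n = max(1, int(len(mac_ops) * sample_frac))
    sample = rng.sample(mac_ops, n)
    # Simulate only the sample; extrapolate the error rate to the workload.
    errors = sum(1 for op in sample if slow_timing_sim(op))
    return errors / n
```

With a 1% sampling fraction, the expensive simulator is invoked roughly 100× less often, which matches the spirit (though not the exact mechanics) of the speed-ups the abstract reports.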
Index Terms
- FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design