ABSTRACT
Deep neural networks (DNNs) are increasingly accelerated on application-specific hardware such as the Google TPU, which is designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration for timing speculation requires detailed gate-level timing simulations, which can be time-consuming for large DNNs that execute millions of multiply-and-accumulate (MAC) operations. In this paper we propose FATE, a new methodology for fast and accurate timing simulation of DNN accelerators like the Google TPU. FATE introduces two novel ideas: (i) DelayNet, a DNN-based timing model for MAC units; and (ii) a statistical sampling methodology that reduces the number of MAC operations for which timing simulations are performed. We show that FATE achieves an 8×–58× speed-up in timing simulations while introducing less than 2% error in classification accuracy estimates. We demonstrate the use of FATE by comparing a conventional DNN accelerator that uses 2's complement (2C) arithmetic with one that uses signed magnitude representation (SMR). We show that the SMR implementation provides 18% more energy savings than 2C for the same classification accuracy, a result that may be of independent interest.
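The statistical sampling idea above can be illustrated with a minimal sketch: instead of running the expensive gate-level timing simulation for every MAC operation, run it only on a random sample and extrapolate the error rate to the full workload. This is an illustrative simplification, not the paper's actual implementation; the function names, the `slow_timing_sim` predicate, and the sampling fraction are all assumptions introduced here for clarity.

```python
import random


def estimate_error_rate(mac_ops, slow_timing_sim, sample_frac=0.01, seed=0):
    """Estimate the fraction of MAC operations that suffer timing errors.

    mac_ops:         the full list of MAC operations in the workload
    slow_timing_sim: expensive predicate (e.g., gate-level simulation)
                     that returns True if the operation has a timing error
    sample_frac:     fraction of MACs actually simulated
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    n = max(1, int(len(mac_ops) * sample_frac))
    sample = rng.sample(mac_ops, n)
    # Simulate only the sample; extrapolate the error rate to the workload.
    errors = sum(1 for op in sample if slow_timing_sim(op))
    return errors / n
```

With a 1% sampling fraction, the expensive simulator is invoked roughly 100× less often, which matches the spirit (though not the exact mechanics) of the speed-ups the abstract reports.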
Index Terms
- FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design