DOI: 10.1145/3240765.3240809

FATE: Fast and Accurate Timing Error Prediction Framework for Low Power DNN Accelerator Design

Published: 05 November 2018

ABSTRACT

Deep neural networks (DNNs) are increasingly being accelerated on application-specific hardware such as the Google TPU, which is designed especially for deep learning. Timing speculation is a promising approach to further increase the energy efficiency of DNN accelerators. Architectural exploration for timing speculation requires detailed gate-level timing simulations, which can be time-consuming for large DNNs that execute millions of multiply-and-accumulate (MAC) operations. In this paper, we propose FATE, a new methodology for fast and accurate timing simulation of DNN accelerators like the Google TPU. FATE introduces two novel ideas: (i) DelayNet, a DNN-based timing model for MAC units; and (ii) a statistical sampling methodology that reduces the number of MAC operations for which timing simulations are performed. We show that FATE yields an 8×-58× speed-up in timing simulations while introducing less than 2% error in classification accuracy estimates. We demonstrate the use of FATE by comparing a conventional DNN accelerator that uses 2's complement (2C) arithmetic with one that uses a signed magnitude representation (SMR). We show that the SMR implementation provides 18% greater energy savings than 2C for the same classification accuracy, a result that may be of independent interest.
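
To make idea (ii) concrete, the following minimal sketch (in Python) illustrates one way such statistical sampling could work: instead of running a detailed gate-level timing simulation for every MAC operation in the workload, a random subset is simulated and the observed timing-error rate is extrapolated to the whole. The names here (mac_ops, simulate_mac_timing) and the 2% sample fraction are illustrative assumptions, not the paper's actual interface.

    import random
    from typing import Callable, Sequence

    def estimate_timing_error_rate(
        mac_ops: Sequence,              # all MAC operations in the DNN workload
        simulate_mac_timing: Callable,  # hypothetical: returns True if the op has a timing error
        sample_fraction: float = 0.02,  # assumed fraction; not from the paper
        seed: int = 0,
    ) -> float:
        """Estimate the fraction of MAC operations that violate timing
        by gate-level-simulating only a random sample of them."""
        rng = random.Random(seed)
        n = max(1, int(len(mac_ops) * sample_fraction))
        sample = rng.sample(list(mac_ops), n)
        errors = sum(1 for op in sample if simulate_mac_timing(op))
        return errors / n

The speed-up of such a scheme is roughly 1/sample_fraction, consistent in spirit with the 8×-58× range the abstract reports, though the paper's actual sampling strategy may be more sophisticated than uniform random sampling.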

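For the 2C-versus-SMR comparison in the final sentence, the helpers below show the two number encodings being contrasted (a minimal sketch; the 8-bit width and function names are assumptions, not taken from the paper):

    def twos_complement(x: int, bits: int = 8) -> str:
        """Encode a signed integer in 2's complement (2C)."""
        assert -(1 << (bits - 1)) <= x < (1 << (bits - 1))
        return format(x & ((1 << bits) - 1), f"0{bits}b")

    def signed_magnitude(x: int, bits: int = 8) -> str:
        """Encode a signed integer as a sign bit plus magnitude (SMR)."""
        assert abs(x) < (1 << (bits - 1))
        return ("1" if x < 0 else "0") + format(abs(x), f"0{bits - 1}b")

    print(twos_complement(-3))   # 11111101
    print(signed_magnitude(-3))  # 10000011

Because trained DNN weights tend to cluster near zero, a small negative value like -3 carries many 1 bits in 2C but few in SMR; lower switching activity along these lines is one commonly cited intuition for SMR's energy advantage, while the 18% figure itself is the paper's empirical result.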

    • Published in

      2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
      Nov 2018
      939 pages

      Copyright © 2018

      Publisher

      IEEE Press

      Publication History

      • Published: 5 November 2018


      Qualifiers

      • research-article