DOI: 10.1145/2463209.2488859
research-article

Quantitative evaluation of soft error injection techniques for robust system design

Published: 29 May 2013

ABSTRACT

Choosing the correct error injection technique is of primary importance in the simulation-based design and evaluation of robust systems that are resilient to soft errors. Many low-level (e.g., flip-flop-level) error injection techniques are generally limited to small systems because of their long execution times and significant memory requirements. High-level error injection at the architecture or memory level is generally fast but can be inaccurate. Unfortunately, there is very little research literature on the quantitative analysis of the inaccuracies associated with high-level error injection techniques. In this paper, we use simulation and emulation results to understand the accuracy trade-offs associated with a variety of high-level error injection techniques. A detailed analysis of error propagation explains the causes of the high degrees of inaccuracy associated with error injection techniques at higher levels of abstraction.
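The abstract contrasts slow low-level (flip-flop) injection with faster injection at the architecture level. As a toy illustration only, and not the paper's methodology or tooling, the Python sketch below performs architecture-level single-bit-flip injection into a hypothetical four-register machine, compares each faulty run against a fault-free golden run, and classifies the outcome as masked or SDC (silent data corruption). The workload, the register layout, the 32-bit width, and the names `run`, `classify`, and `campaign` are all illustrative assumptions.

```python
import random

WIDTH = 32                 # assumed register width in bits (illustrative)
MASK = (1 << WIDTH) - 1
NUM_REGS = 4               # hypothetical register file: r0..r3


def run(inputs, inject=None):
    """Hypothetical toy workload: accumulate 3*x over the inputs in r0.

    inject is an optional (step, reg, bit) tuple: the given bit of the
    given register is flipped just before that loop step executes.
    """
    regs = [0] * NUM_REGS  # r0 = accumulator, r1 = loaded value, r2 = scratch, r3 = unused
    for step, x in enumerate(inputs):
        if inject is not None and inject[0] == step:
            _, reg, bit = inject
            regs[reg] ^= 1 << bit          # soft error modeled as a single bit flip
        regs[1] = x & MASK                 # overwrites r1, so a flip in r1 is masked
        regs[2] = (regs[1] * 3) & MASK     # overwrites r2 before it is read
        regs[0] = (regs[0] + regs[2]) & MASK
    return regs[0]                         # program output


def classify(inputs, inject, golden=None):
    """Compare one faulty run against the golden (fault-free) run."""
    if golden is None:
        golden = run(inputs)
    return "masked" if run(inputs, inject) == golden else "SDC"


def campaign(inputs, n, seed=0):
    """Inject n random single-bit flips, uniform over step, register, and bit."""
    rng = random.Random(seed)
    golden = run(inputs)                   # golden run computed once per campaign
    counts = {"masked": 0, "SDC": 0}
    for _ in range(n):
        inj = (rng.randrange(len(inputs)), rng.randrange(NUM_REGS), rng.randrange(WIDTH))
        counts[classify(inputs, inj, golden)] += 1
    return counts


if __name__ == "__main__":
    data = [7, 1, 42, 9, 13]
    print(campaign(data, n=1000))
```

In this toy, r1 and r2 are always overwritten before being read and r3 is never read, so flips there are masked, while a flip in r0 always reaches the output; with registers chosen uniformly, roughly a quarter of injections end as SDC. Such a simplified state model is exactly where high-level injection can diverge from flip-flop-level behavior, since real hardware state that the model omits may mask or propagate errors differently.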

Published in

DAC '13: Proceedings of the 50th Annual Design Automation Conference
May 2013
1285 pages
ISBN: 9781450320719
DOI: 10.1145/2463209

Copyright © 2013 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 1,770 of 5,499 submissions (32%)