DOI: 10.1145/1250662.1250726

Dynamic prediction of architectural vulnerability from microarchitectural state

Published: 09 June 2007

ABSTRACT

Transient faults due to particle strikes are a key challenge in microprocessor design. Driven by exponentially increasing transistor counts, per-chip faults are a growing burden. To protect against soft errors, redundancy techniques such as redundant multithreading (RMT) are often used. However, these techniques assume that the probability that a structural fault will result in a soft error (i.e., the Architectural Vulnerability Factor (AVF)) is 100 percent, unnecessarily draining processor resources. Due to the high cost of redundancy, there have been efforts to throttle RMT at runtime. To date, these methods have not incorporated an AVF model and therefore tend to be ad hoc. Unfortunately, computing the AVF of complex microprocessor structures (e.g., the ISQ) can be quite involved.

To provide probabilistic guarantees about fault tolerance, we have created a rigorous characterization of AVF behavior that can be easily implemented in hardware. We experimentally demonstrate AVF variability within and across the SPEC2000 benchmarks and identify strong correlations between structural AVF values and a small set of processor metrics. Using these simple indicators as predictors, we create a proof-of-concept RMT implementation that demonstrates that AVF prediction can be used to maintain a low fault tolerance level without significant performance impact.
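As a rough illustration of the idea in the paragraph above, the sketch below fits a one-variable least-squares line from a processor metric to observed AVF, then uses the predicted AVF to decide whether redundancy should be enabled for an interval. It is not the paper's predictor, which the abstract does not specify. The metric (queue occupancy), the training values, the threshold, and all function names are assumptions made up for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_avf(metric, slope, intercept):
    """Predicted AVF, clamped to the valid probability range [0, 1]."""
    return min(1.0, max(0.0, slope * metric + intercept))


def enable_rmt(metric, slope, intercept, threshold):
    """Enable redundancy only when predicted vulnerability exceeds the threshold."""
    return predict_avf(metric, slope, intercept) > threshold


# Hypothetical training data: (average queue occupancy, measured AVF) per interval.
occupancy = [0.10, 0.30, 0.50, 0.70, 0.90]
measured_avf = [0.05, 0.15, 0.25, 0.35, 0.45]

slope, intercept = fit_line(occupancy, measured_avf)

# Decide per interval whether to run the redundant thread (threshold is assumed).
decisions = [enable_rmt(x, slope, intercept, threshold=0.3) for x in (0.2, 0.8)]
```

A hardware implementation would presumably replace the offline fit with fixed coefficients and simple counters. The sketch only shows the shape of the predict-then-throttle loop.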


      Published in

      ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture
      June 2007
      542 pages
      ISBN: 9781595937063
      DOI: 10.1145/1250662
      General Chair: Dean Tullsen
      Program Chair: Brad Calder

      ACM SIGARCH Computer Architecture News, Volume 35, Issue 2
      May 2007
      527 pages
      ISSN: 0163-5964
      DOI: 10.1145/1273440

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 543 of 3,203 submissions, 17%
