DOI: 10.1145/1555754.1555771
research-article

Memory mapped ECC: low-cost error protection for last level caches

Published: 20 June 2009

ABSTRACT

This paper presents a novel technique, Memory Mapped ECC, which reduces the cost of providing error correction for SRAM caches. It is important to limit such overheads as processor resources become constrained and error propensity increases. The continuing decrease in SRAM cell size and the growing capacity of caches increase the likelihood of errors in SRAM arrays. To address this, redundant information can be used to correct a value after an error occurs. Information redundancy is typically provided through error-correcting codes (ECC), which append bits to every SRAM row and increase the array's area and energy consumption. We make three observations regarding error protection and utilize them in our architecture: (1) much of the data in a cache is replicated throughout the hierarchy and is inherently redundant; (2) error detection is necessary for every cache access and is cheaper than error correction, which is very infrequent; (3) redundant information for correction need not be stored in high-cost SRAM. Our architecture dedicates SRAM only to error detection, while the ECC bits are stored within the memory hierarchy as data. We associate a physical memory address with each cache line for ECC storage and rely on locality to minimize the impact. The cache is dynamically and transparently partitioned between data and ECC, with the fraction of ECC growing with the number of dirty cache lines. We show that this has little impact on both performance (1.3% on average and below 4% in all cases) and memory traffic (3%) across a range of memory-intensive applications.
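The three observations in the abstract can be illustrated with a toy model. The following Python sketch is not the paper's design: the class `ToyCache`, the constant `ECC_BASE`, the function `ecc_address`, the 64-byte line size, and the use of one parity bit for detection with a CRC-32 standing in for the correction code are all assumptions made for illustration. It shows a clean line being recovered by refetching it from the next level, and a dirty line being repaired with redundancy that is stored at a memory address tied to the physical cache line rather than in dedicated SRAM. For simplicity the ECC store is a separate dictionary, whereas the abstract describes ECC sharing the cache with data.

```python
import zlib

LINE_BYTES = 64            # assumed cache line size
ECC_BASE = 0x8000_0000     # hypothetical base address of the ECC region


def parity(data) -> int:
    """One even-parity bit per line: the cheap detection code kept in SRAM."""
    return bin(int.from_bytes(data, "big")).count("1") & 1


def ecc_address(set_index: int, way: int, ways: int) -> int:
    """Hypothetical fixed mapping from a physical cache slot to its ECC address."""
    return ECC_BASE + 4 * (set_index * ways + way)


class ToyCache:
    def __init__(self, memory: dict, ways: int = 4):
        self.memory = memory   # next level: line address -> bytes
        self.ways = ways
        self.lines = {}        # (set, way) -> [addr, data, parity bit, dirty]
        self.ecc_store = {}    # ECC address -> CRC-32 (stand-in for ECC held as data)

    def fill(self, slot, addr):
        """Bring a clean line in; only the detection bit is computed."""
        data = self.memory[addr]
        self.lines[slot] = [addr, bytearray(data), parity(data), False]

    def write(self, slot, data: bytes):
        """A write makes the line dirty, so correction info must now exist."""
        line = self.lines[slot]
        line[1], line[2], line[3] = bytearray(data), parity(data), True
        self.ecc_store[ecc_address(*slot, self.ways)] = zlib.crc32(data)

    def read(self, slot) -> bytes:
        addr, data, stored_parity, dirty = self.lines[slot]
        if parity(data) == stored_parity:
            return bytes(data)                 # common case: detection only
        if not dirty:
            self.fill(slot, addr)              # clean line: refetch the replica
            return bytes(self.lines[slot][1])
        # Dirty line: fetch the stored code and search for the flipped bit.
        want = self.ecc_store[ecc_address(*slot, self.ways)]
        for bit in range(len(data) * 8):
            data[bit // 8] ^= 1 << (bit % 8)   # try flipping this bit
            if zlib.crc32(data) == want:
                return bytes(data)             # line repaired in place
            data[bit // 8] ^= 1 << (bit % 8)   # undo and keep searching
        raise RuntimeError("uncorrectable error")
```

In this sketch the correction path runs only when parity fails on a dirty line, which mirrors the abstract's point that correction is rare and can afford to be slow. A single parity bit misses even numbers of flipped bits, so a real design would need a stronger detection code.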

          Published in

          • ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
            June 2009
            510 pages
            ISBN: 9781605585260
            DOI: 10.1145/1555754
          • ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
            June 2009
            495 pages
            ISSN: 0163-5964
            DOI: 10.1145/1555815

          Copyright © 2009 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall Acceptance Rate: 543 of 3,203 submissions, 17%
