ABSTRACT
This paper presents a novel technique, Memory Mapped ECC, which reduces the cost of providing error correction for SRAM caches. It is important to limit such overheads as processor resources become constrained and error propensity increases. The continuing decrease in SRAM cell size and the growing capacity of caches increases the likelihood of errors in SRAM arrays. To address this, redundant information can be used to correct a value after an error occurs. Information redundancy is typically provided through error-correcting codes (ECC), which append bits to every SRAM row and increase the array's area and energy consumption. We make three observations regarding error protection and utilize them in our architecture: (1) much of the data in a cache is replicated throughout the hierarchy and is inherently redundant; (2) error-detection is necessary for every cache access and is cheaper than error correction, which is very infrequent; (3) redundant information for correction need not be stored in high-cost SRAM. Our unique architecture only dedicates SRAM for error detection while the ECC bits are stored within the memory hierarchy as data. We associate a physical memory address with each cache line for ECC storage and rely on locality to minimize the impact. The cache is dynamically and transparently partitioned between data and ECC with the fraction of ECC growing with the number of dirty cache lines. We show that this has little impact on both performance (1.3% average and < 4%) and memory traffic (3%) across a range of memory-intensive applications.
- H. Ando, K. Seki, S. Sakashita, M. Aihara, R. Kan, K. Imada, M. Itoh, M. Nagai, Y. Tosaka, K. Takahisa, and K. Hatanaka. Accelerated Testing of a 90nm SPARC64 V Microprocessor for Neutron SER. In Proceedings of IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007.Google Scholar
- C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. Technical Report TR-811-08, Princeton University, January 2008.Google ScholarDigital Library
- B. H. Calhoun and A. P. Chandrakasan. A 256kb Sub-threshold SRAM in 65nm CMOS. In Proceedings of the International Solid-State Circuits Conference (ISSCC), February 2006.Google ScholarCross Ref
- L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K. Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini, and W. Haensch. Stable SRAM Cell Design for the 32nm Node and Beyond. In Digest of Technical Papers of Symposium on VLSI Technology, June 2005.Google Scholar
- C. L. Chen and M. Y. Hsiao. Error-correcting Ccodes for Semiconductor Memory Applications: A State-of-the-art Review. IBM Journal of Research and Development, 28(2):124--134, March 1984.Google ScholarDigital Library
- N. Derhacobian, V. A. Vardanian, and Y. Zorian. Embedded Memory Reliability: The SER Challenge. In Proceedings of the Records of the 2004 International Workshop on Memory Technology, Design, and Testing, August 2004. Google ScholarDigital Library
- Digital Equipment Corp. Alpha 21264 Microprocessor Hardware Reference Manual, July 1999.Google Scholar
- G. Hamerly, E. Perelman, J. Lau, and B. Calder. SimPoint 3.0: Faster and More Flexible Program Analysis. In Proceedings of Workshop on Modeling, Benchmarking and Simulation, June 2005.Google Scholar
- R. W. Hamming. Error Correcting and Error Detecting Codes. Bell System Technical Journal, 29:147--160, April 1950.Google Scholar
- M. Y. Hsiao. A Class of Optimal Minimum Odd-weight-column SEC-DED codes. IBM Journal of Reserach and Development, 14:395--301, 1970.Google ScholarDigital Library
- J. Huynh. White Paper: The AMD Athlon MP Processor with 512KB L2 Cache, May 2003.Google Scholar
- C. N. Keltcher, K. J. McGrath, A. Ahmed, and P. Conway. The AMD Opteron processor for multiprocessor servers. IEEE Micro, 23(2):66--76, March-April 2003. Google ScholarDigital Library
- J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. C. Hoe. Multi-bit Error Tolerant Caches Using Two-Dimensional Error Coding. In Proceedings of the 40th IEEE/ACM International Symposium on Microarchitecture (MICRO), December 2007. Google ScholarDigital Library
- S. Kim. Area-Efficient Error Protection for Caches. In Proceedings of the Conference on Design Automation and Test in Europe (DATE), March 2006. Google ScholarDigital Library
- S. Kim and A. K. Somani. Area Efficient Architectures for Information Integrity in Cache Memories. In Proceedings of the 26th Annual International Symposium on Computer Architecture (ISCA), May 1999. Google ScholarDigital Library
- J. P. Kulkarni, K. Kim, and K. roy. A 160mV Robust Schmitt Trigger Based Subthreshold SRAM. IEEE Journal of Solid-State Circuits, 42(10):2303--2313, October 2007.Google ScholarCross Ref
- H.-H. S. Lee, G. S. Tyson, and M. K. Farrens. Eager Writeback - A Technique for Improving Bandwidth Utilization. In Proceedings of the 33rd annual IEEE/ACM international Symposium on Microarchitecture (MICRO), November/December 2000. Google ScholarDigital Library
- L. Li, V. S. Degalahal, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin. Soft Error and Energy Consumption Interactions: A Data Cache Perspective. In Proceedings of International Symposium on Low Power Electronics and Design (ISLPED), August 2004. Google ScholarDigital Library
- S. Lin and D. J. C. Jr. Error Control Coding: Fundamentals and Applications. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1983.Google Scholar
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2005. Google ScholarDigital Library
- P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. SIMICS: A Full System Simulation Platform. IEEE Computer, 35:50--58, February 2002. Google ScholarDigital Library
- J. Maiz, S. Hareland, K. Zhang, and P. Armstrong. Characterization of Multi-Bit Soft Error Events in Advanced SRAMs. In Technical Digest of IEEE International Electron Devices Meeting (IEDM), December 2003.Google Scholar
- M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News (CAN), 33:92--99, November 2005. Google ScholarDigital Library
- K. Osada, K. Yamaguchi, and Y. Saitoh. SRAM Immunity to Cosmic-Ray-Induced Multierrors based on Analysis of an Induced Parasitic Bipolar Effect. IEEE Journal of Solid-State Circuits, 39:827--833, May 2004.Google ScholarCross Ref
- A. M. Patel and M. Y. Hsiao. An Adaptive Error Correction Scheme for Computer Memory System. In Proceedings of the fall joint computer conference, part I, December 1972. Google ScholarDigital Library
- I. S. Reed and G. Solomon. Polynomial Codes Over Certain Finite Fields. Journal of Society for Industrial and Applied Mathematics, 8:300--304, June 1960.Google Scholar
- N. N. Sadler and D. J. Sorin. Choosing an Error Protection Scheme for a Microprocessor's L1 Data Cache. In Proceedings of International Conference on Computer Design (ICCD), October 2006.Google ScholarCross Ref
- N. Seifert, V. Zia, and B. Gill. Assessing the Impact of Scaling on the Efficacy of Spatial Redundancy based Mitigation Schemes for Terrestrial Applications. In Proceedings of IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE), April 2007.Google Scholar
- C. Slayman. Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. IEEE Transactions on Device and Materials Reliability, 5:397--404, September 2005.Google ScholarCross Ref
- Standard Performance Evaluation Corporation. SPEC CPU 2006. http://www.spec.org/cpu2006/, 2006.Google Scholar
- J. Standards. JESD89 Measurement and Reporting of Alpha Particles and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices, JESD89-1 System Soft Error Rate (SSER) Method and JESD89-2 Test Method for Alpha Source Accelerated Soft Error Rate, 2001.Google Scholar
- Sun Microsystems Inc. OpenSPARC T2 System-On-Chip (SOC) Microarchitecture Specification, May 2008.Google Scholar
- J. M. Tendler, J. S. Dodson, J. S. F. Jr., H. Le, and B. Sinharoy. POWER4 System Microarchitecture. IBM Journal of Research and Development, 46(1):5--25, January 2002. Google ScholarDigital Library
- S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi. CACTI 5.1. Technical report, HP Laboratories, April 2008.Google Scholar
- D. Wang, B. Ganesh, N. Tuaycharoen, K. Baynes, A. Jaleel, and B. Jacob. DRAMsim: A memory-system simulator. SIGARCH Computer Architecture News (CAN), 33:100--107, September 2005. Google ScholarDigital Library
- C. Wilkerson, H. Gao, A. R. Alameldeen, Z. Chishti, M. Khellah, and S.-L. Lu. Trading Off Cache Capacity for Reliability to Enable Low Voltage Operation. In Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google ScholarDigital Library
- S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta. The SPLASH-2 Programs: Characterization and Methodological Considerations. In Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), June 1995. Google ScholarDigital Library
- J. Wuu, D. Weiss, C. Morganti, and M. Dreesen. The asynchronous 24MB on-chip level-3 cache for a dual-core Itanium-family processor. In Proceedings of the International Solid-State Circuits Conference (ISSCC), February 2005.Google ScholarCross Ref
- W. Zhang. Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability. IEEE Transactions on Computer, 54(12):1547--1555, December 2005. Google ScholarDigital Library
- W. Zhang, S. Gurumurthi, M. Kandemir, and A. Sivasubramaniam. ICR: In-Cache Replication for Enhancing Data Cache Reliability. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), June 2003.Google ScholarCross Ref
Index Terms
- Memory mapped ECC: low-cost error protection for last level caches
Recommendations
Virtualized and flexible ECC for main memory
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systemsWe present a general scheme for virtualizing main memory error-correction mechanisms, which map redundant information needed to correct errors into the memory namespace itself. We rely on this basic idea, which increases flexibility to increase error ...
Memory mapped ECC: low-cost error protection for last level caches
This paper presents a novel technique, Memory Mapped ECC, which reduces the cost of providing error correction for SRAM caches. It is important to limit such overheads as processor resources become constrained and error propensity increases. The ...
Flexible cache error protection using an ECC FIFO
SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and AnalysisWe present ECC FIFO, a mechanism enabling two-tiered last-level cache error protection using an arbitrarily strong tier-2 code without increasing on-chip storage. Instead of adding redundant ECC information to each cache line, our ECC FIFO mechanism off-...
Comments