Abstract
Emerging nonvolatile memories (NVMs) suffer from low write endurance, resulting in early cell failures (hard errors) that reduce memory lifetime. It was recognized early on that conventional error-correcting codes (ECCs), which are designed for soft errors, are a poor choice for addressing hard errors in NVMs. This led to the development of hard error correction schemes such as dynamically replicated memory (DRM), error-correcting pointers (ECPs), SAFER, FREE-p, PAYG, and Zombie memory to improve NVM lifetime. Although these approaches have made significant inroads in addressing hard errors and low memory lifetime in NVMs, two challenges remain areas of active research and development: underutilization of error-correcting resources and implementation overhead (e.g., codec latency, hardware support).
This article proposes error-correcting strings (ECSs) as a high-utilization, low-latency solution for hard error correction in single-/multi-/triple-level cell (SLC/MLC/TLC) NVMs. At its core, ECS adopts a base-offset approach to store pointers to the failed memory cells: the base is the address of the first failed cell in a memory block, and the offsets are the distances between successive failed cells in that memory block. Unlike ECP, which uses fixed-length pointers, ECS uses variable-length offsets to point to the failed cells, thereby realizing more pointers and tolerating more hard errors per memory block. Further, this article proposes eXtended-ECS (XECS), a page-level error correction architecture that employs dynamic on-demand ECS allocation and opportunistic pattern-based data compression to improve NVM lifetime by 2× over ECP-6 at comparable overhead and with negligible impact on system performance. Finally, this article demonstrates that ECS is a drop-in replacement for ECP that extends the lifetime of state-of-the-art ECP-based techniques like PAYG and Zombie memory. ECS is also compatible with MLC/TLC NVMs, where it complements drift-induced soft error reduction techniques like ECC and incomplete data mapping to simultaneously extend NVM lifetime.
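The base-offset idea can be illustrated with a minimal sketch. The abstract does not specify how ECS encodes its variable-length offsets, so the sketch below substitutes the Elias-gamma code purely as a stand-in; the function names, the 512-bit block size, and the failure positions are all hypothetical, and the per-pointer replacement bits that a real scheme would also store are not counted.

```python
def to_base_offsets(failed_cells):
    """Split failed-cell indices into (base, offsets).

    base is the index of the first failed cell; each offset is the
    distance between two successive failed cells.
    """
    cells = sorted(set(failed_cells))
    if not cells:
        raise ValueError("need at least one failed cell")
    offsets = [b - a for a, b in zip(cells, cells[1:])]
    return cells[0], offsets


def from_base_offsets(base, offsets):
    """Recover the failed-cell indices from (base, offsets)."""
    cells = [base]
    for off in offsets:
        cells.append(cells[-1] + off)
    return cells


def elias_gamma(n):
    """Elias-gamma code of a positive integer, as a bit string.

    ASSUMPTION: used here only as an example of a self-delimiting
    variable-length code; it is not taken from the article.
    """
    if n < 1:
        raise ValueError("Elias-gamma encodes positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def pointer_width(block_bits):
    """Bits needed to address any cell in a block of block_bits cells."""
    return (block_bits - 1).bit_length()


def fixed_pointer_bits(num_failures, block_bits):
    """Address bits used when every failure gets a full-width pointer."""
    return num_failures * pointer_width(block_bits)


def base_offset_bits(offsets, block_bits):
    """Address bits used by one full-width base plus gamma-coded offsets."""
    return pointer_width(block_bits) + sum(len(elias_gamma(o)) for o in offsets)


if __name__ == "__main__":
    block_bits = 512                      # hypothetical block size
    failed = [17, 20, 21, 40, 300, 301]   # hypothetical failure positions
    base, offsets = to_base_offsets(failed)
    assert from_base_offsets(base, offsets) == failed
    print(base, offsets)                           # 17 [3, 1, 19, 260, 1]
    print(fixed_pointer_bits(len(failed), block_bits))  # 54
    print(base_offset_bits(offsets, block_bits))        # 40
```

In this example, clustered failures yield short offsets, so the same six failures take fewer address bits than six fixed-length pointers. With this particular stand-in code, a large offset (260 here) costs more than a fixed-length pointer, so the saving depends on how the failures are distributed.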
Index Terms
- ECS: Error-Correcting Strings for Lifetime Improvements in Nonvolatile Memories