ECS: Error-Correcting Strings for Lifetime Improvements in Nonvolatile Memories

Published: 13 December 2017

Abstract

Emerging nonvolatile memories (NVMs) suffer from low write endurance, resulting in early cell failures (hard errors) that reduce memory lifetime. It was recognized early on that conventional error-correcting codes (ECCs), which are designed for soft errors, are a poor choice for addressing hard errors in NVMs. This led to the evolution of hard error correction schemes like dynamically replicated memory (DRM), error-correcting pointers (ECPs), SAFER, FREE-p, PAYG, and Zombie memory to improve NVM lifetime. Although these approaches made significant inroads in addressing hard errors and low memory lifetime in NVMs, underutilization of error-correcting resources and/or implementation overhead (e.g., codec latency, hardware support) remain open challenges and areas of active research and development.

This article proposes error-correcting strings (ECSs) as a high-utilization, low-latency solution for hard error correction in single-/multi-/triple-level cell (SLC/MLC/TLC) NVMs. At its core, ECS adopts a base-offset approach to store pointers to the failed memory cells: the base is the address of the first failed cell in a memory block, and the offsets are the distances between successive failed cells in that memory block. Unlike ECP, which uses fixed-length pointers, ECS uses variable-length offsets to point to the failed cells, thereby realizing more pointers to tolerate more hard errors per memory block. Further, this article proposes eXtended-ECS (XECS), a page-level error correction architecture that employs dynamic on-demand ECS allocation and opportunistic pattern-based data compression to improve NVM lifetime by 2× over ECP-6 for comparable overhead and negligible impact on system performance. Finally, this article demonstrates that ECS is a drop-in replacement for ECP to extend the lifetime of state-of-the-art ECP-based techniques like PAYG and Zombie memory. ECS is also compatible with MLC/TLC NVMs, where it complements drift-induced soft error reduction techniques like ECC and incomplete data mapping to simultaneously extend NVM lifetime.
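
As an illustration of the base-offset idea, the following Python sketch converts a list of failed-cell positions into a base and successive distances, then compares the pointer storage against fixed-length pointers. The abstract does not say how ECS encodes its variable-length offsets, so the Elias-gamma code used here is purely an assumption, as are the function names, the 512-bit block size, and the sample failure positions. Replacement bits and other per-entry metadata are ignored.

```python
from typing import List, Tuple


def to_base_offsets(failed: List[int]) -> Tuple[int, List[int]]:
    """Return (base, offsets): the first failed cell and the distances
    between successive failed cells in the block."""
    cells = sorted(set(failed))
    if not cells:
        raise ValueError("need at least one failed cell")
    offsets = [b - a for a, b in zip(cells, cells[1:])]
    return cells[0], offsets


def from_base_offsets(base: int, offsets: List[int]) -> List[int]:
    """Rebuild the failed-cell positions from a base and its offsets."""
    cells = [base]
    for distance in offsets:
        cells.append(cells[-1] + distance)
    return cells


def elias_gamma(n: int) -> str:
    """Elias-gamma code for n >= 1.
    ASSUMPTION: stands in for the article's unspecified variable-length code."""
    if n < 1:
        raise ValueError("Elias-gamma is defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def fixed_pointer_bits(num_failed: int, block_bits: int) -> int:
    """Pointer bits when every failed cell gets a full-width address."""
    return num_failed * (block_bits - 1).bit_length()


def variable_pointer_bits(failed: List[int], block_bits: int) -> int:
    """Pointer bits for a full-width base plus gamma-coded offsets."""
    _, offsets = to_base_offsets(failed)
    base_bits = (block_bits - 1).bit_length()
    return base_bits + sum(len(elias_gamma(d)) for d in offsets)


if __name__ == "__main__":
    failed_cells = [17, 20, 45, 46, 300, 310]  # hypothetical failure positions
    print(to_base_offsets(failed_cells))
    print(fixed_pointer_bits(len(failed_cells), 512))
    print(variable_pointer_bits(failed_cells, 512))
```

For the sample failures in a 512-bit block, this sketch needs 44 pointer bits against 54 for six fixed 9-bit pointers. With the assumed code, a large offset such as 254 costs more than a fixed pointer, so the saving depends on how the failures are distributed within the block.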

    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 4
      December 2017
      600 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/3154814

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 December 2017
      • Revised: 1 September 2017
      • Accepted: 1 September 2017
      • Received: 1 March 2017
      Qualifiers

      • research-article
      • Research
      • Refereed