research-article

Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories

Published: 07 October 2019

Abstract

Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, leading to lower system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank and exploits this parallelism through the memory controller's access scheduling decisions. PALP has three components. First, we introduce new PCM commands that enable parallelism across a bank's partitions to resolve read-write bank conflicts, with no changes needed to the PCM logic or its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers can resolve read-read bank conflicts, working jointly with the sense amplifiers. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing requests that exploit partition-level parallelism over other requests, including long-outstanding ones. While doing so, the memory controller still guarantees starvation freedom and keeps the PCM within its running average power limit (RAPL).
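
The scheduling idea in the third component can be illustrated with a minimal Python sketch. It is not the authors' scheduler: the `Request` fields, the `can_overlap` rule, the power costs `READ_POWER` and `WRITE_POWER`, and the `power_budget` and `starvation_limit` parameters are all hypothetical. The sketch also simplifies RAPL to a per-issue power cap rather than a running average, and it treats write-write pairs as non-overlappable on the assumption that both would need the shared write drivers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical per-operation power costs in arbitrary units (assumption, not from the paper).
READ_POWER = 1
WRITE_POWER = 3


@dataclass(frozen=True)
class Request:
    req_id: int
    bank: int
    partition: int
    is_write: bool
    arrival: int  # cycle at which the request entered the queue


def power(req: Request) -> int:
    """Return the assumed power cost of serving one request."""
    return WRITE_POWER if req.is_write else READ_POWER


def can_overlap(a: Request, b: Request) -> bool:
    """True if a and b hit the same bank, different partitions, and are not both writes.

    Read-write pairs use the sense amplifiers and the write drivers separately.
    Read-read pairs assume the write drivers' extra mode, working with the sense
    amplifiers, serves the second read. Write-write pairs would both need the
    shared write drivers, so this sketch never overlaps them (assumption).
    """
    if a.bank != b.bank or a.partition == b.partition:
        return False
    return not (a.is_write and b.is_write)


def schedule(queue: List[Request], now: int, power_budget: int,
             starvation_limit: int) -> Tuple[Request, ...]:
    """Pick the next request(s) to issue from one controller queue (hypothetical policy)."""
    if not queue:
        return ()
    oldest = min(queue, key=lambda r: r.arrival)
    # Starvation guard: a request that has waited too long is issued first, alone.
    if now - oldest.arrival >= starvation_limit:
        return (oldest,)
    # Otherwise prefer a same-bank, different-partition pair that fits the power
    # cap, even over an older request that cannot be paired; among such pairs,
    # pick the one containing the earliest-arriving request.
    best: Tuple[Request, ...] = ()
    best_age: Optional[int] = None
    for i, a in enumerate(queue):
        for b in queue[i + 1:]:
            if can_overlap(a, b) and power(a) + power(b) <= power_budget:
                age = min(a.arrival, b.arrival)
                if best_age is None or age < best_age:
                    best, best_age = (a, b), age
    # Fall back to the oldest request when no overlappable pair exists.
    return best if best else (oldest,)
```

For example, with the queue `Request(0, 0, 0, True, 0)`, `Request(1, 0, 1, False, 5)`, and `Request(2, 0, 0, False, 1)`, calling `schedule(q, now=10, power_budget=3, starvation_limit=100)` issues reads 1 and 2 together and defers the older write 0, whose pairing with read 1 would exceed the assumed cap. Once write 0 has waited `starvation_limit` cycles, the guard issues it alone.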

We evaluate PALP with workloads from the MiBench and SPEC CPU2017 benchmark suites. Our results show that PALP reduces average PCM access latency by 23% and improves average system performance by 28% compared to state-of-the-art approaches.


      • Published in

        ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s
        Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
        October 2019
        1423 pages
        ISSN:1539-9087
        EISSN:1558-3465
        DOI:10.1145/3365919

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 7 October 2019
        • Accepted: 1 July 2019
        • Revised: 1 June 2019
        • Received: 1 April 2019


        Qualifiers

        • Research
        • Refereed
