Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories

Authors:
Shihao Song
Drexel University, Philadelphia, Pennsylvania

Anup Das
Drexel University, Philadelphia, Pennsylvania
ORCID: 0000-0002-5673-2636

Onur Mutlu
ETH Zürich, Zürich, Switzerland

Nagarajan Kandasamy
Drexel University, Philadelphia, Pennsylvania

ACM Transactions on Embedded Computing Systems, Volume 18, Issue 5s, Article No. 53, pp. 1–25. https://doi.org/10.1145/3358180

Published: 07 October 2019

Abstract

Phase-change memory (PCM) devices have multiple banks to serve memory requests in parallel. Unfortunately, if two requests go to the same bank, they have to be served one after another, which lowers system performance. We observe that a modern PCM bank is implemented as a collection of partitions that operate mostly independently while sharing a few global peripheral structures, which include the sense amplifiers (to read) and the write drivers (to write). Based on this observation, we propose PALP, a new mechanism that enables partition-level parallelism within each PCM bank and exploits this parallelism through the memory controller’s access scheduling decisions. PALP makes three new contributions. First, we introduce new PCM commands that enable parallelism across a bank’s partitions in order to resolve read-write bank conflicts, with no changes needed to PCM logic or its interface. Second, we propose simple circuit modifications that introduce a new operating mode for the write drivers, in addition to their default mode of serving write requests. When configured in this new mode, the write drivers work jointly with the sense amplifiers to resolve read-read bank conflicts. Finally, we propose a new access scheduling mechanism in PCM that improves performance by prioritizing requests that exploit partition-level parallelism over other requests, including long-outstanding ones. While doing so, the memory controller also guarantees starvation freedom and keeps the PCM within its running-average-power-limit (RAPL).

We evaluate PALP with workloads from the MiBench and SPEC CPU2017 Benchmark suites. Our results show that PALP reduces average PCM access latency by 23%, and improves average system performance by 28% compared to the state-of-the-art approaches.
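The scheduling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `Request` fields, the `can_pair` rule (pair only requests to different partitions where at least one is a read), the `starvation_limit` threshold, and the `power_ok` callback standing in for a RAPL check are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    req_id: int
    partition: int   # partition index within one PCM bank
    is_write: bool
    arrival: int     # cycle at which the request entered the queue


def can_pair(a: Request, b: Request) -> bool:
    """Assumed rule: two requests may overlap only if they target different
    partitions and at least one is a read (read-write or read-read)."""
    return a.partition != b.partition and not (a.is_write and b.is_write)


def schedule(
    queue: List[Request],
    now: int,
    starvation_limit: int = 100,
    power_ok: Callable[[List[Request]], bool] = lambda group: True,
) -> List[Request]:
    """Pick the requests to issue next for a single bank (one or two)."""
    if not queue:
        return []
    by_age = sorted(queue, key=lambda r: r.arrival)
    oldest = by_age[0]
    # Starvation freedom (hypothetical threshold): once the oldest request
    # has waited too long, it must be part of whatever is issued next.
    starving = now - oldest.arrival >= starvation_limit
    candidates = [oldest] if starving else by_age
    # Otherwise prefer any pair that can run in parallel, even if that
    # bypasses the oldest request, as long as the power check allows it.
    for first in candidates:
        for second in by_age:
            if second is not first and can_pair(first, second) and power_ok([first, second]):
                return [first, second]
    # No usable pair: fall back to serving the oldest request alone
    # (assumed to always fit within the power budget).
    return [oldest]
```

With a queue holding an old write to partition 0, a newer read to partition 0, and a newer write to partition 1, this sketch issues the read and the second write together while the old write waits. Once the old write exceeds `starvation_limit`, it is served first.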

Index Terms

1. Computer systems organization
  1. Architectures
2. Hardware
  1. Emerging technologies
    1. Memory and dense storage

Published in
ACM Transactions on Embedded Computing Systems Volume 18, Issue 5s
Special Issue ESWEEK 2019, CASES 2019, CODES+ISSS 2019 and EMSOFT 2019
October 2019
1423 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3365919
Editor:
Sandeep K. Shukla
Indian Institute of Technology, India
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 7 October 2019
- Accepted: 1 July 2019
- Revised: 1 June 2019
- Received: 1 April 2019
Author Tags
Phase-change memories (PCM)
sense amplifiers
write drivers
Qualifiers
- research-article
- Research
- Refereed