HAP: Hybrid-Memory-Aware Partition in Shared Last-Level Cache

Authors:
Wei Wei

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

Dejun Jiang

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

Jin Xiong

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

Mingyu Chen

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 3, Article No. 24, pp. 1–25. https://doi.org/10.1145/3106340

Published: 06 September 2017

Abstract

Data-center servers benefit from large-capacity memory systems that let them run many processes simultaneously. Hybrid DRAM-NVM memory is an attractive way to increase memory capacity because it exploits the scalability of Non-Volatile Memory (NVM). However, current last-level cache (LLC) policies are unaware of hybrid memory. Cache misses to NVM are costly because of the long NVM latency, and evicting dirty NVM data suffers from long write latency. We propose hybrid-memory-aware cache partitioning, which dynamically adjusts cache space and gives dirty NVM data more chances to reside in the LLC. Experimental results show that Hybrid-memory-Aware Partition (HAP) improves performance by 46.7% and reduces energy consumption by 21.9% on average compared with LRU management. Moreover, on average HAP improves performance by 9.3% and reduces energy consumption by 6.4% compared with a state-of-the-art cache mechanism.
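
The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of the underlying intuition: when choosing a victim in one cache set, prefer lines that are cheap to evict and keep dirty NVM lines longer. It is a simple cost-aware victim selection, not the HAP mechanism itself. The names `Line`, `EVICTION_COST`, `choose_victim`, and `window`, as well as the cost values, are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical relative eviction costs (illustrative only, not from the paper).
# Dirty NVM lines are the most expensive to evict because they trigger a slow NVM write.
EVICTION_COST = {
    ("DRAM", False): 1,
    ("NVM", False): 2,
    ("DRAM", True): 3,
    ("NVM", True): 4,
}


@dataclass
class Line:
    tag: int
    memory: str          # "DRAM" or "NVM": where the line's data lives in main memory
    dirty: bool = False  # True if the line must be written back on eviction


def choose_victim(lru_ordered, window=2):
    """Return the cheapest line among the `window` least-recently-used lines.

    lru_ordered[0] is the least recently used line. On a cost tie the older
    line wins, and window=1 degenerates to plain LRU.
    """
    candidates = lru_ordered[:window]
    return min(candidates, key=lambda line: EVICTION_COST[(line.memory, line.dirty)])


cache_set = [
    Line(tag=1, memory="NVM", dirty=True),    # LRU position, but costly to evict
    Line(tag=2, memory="DRAM", dirty=False),  # slightly newer, cheap to evict
    Line(tag=3, memory="NVM", dirty=False),
]
victim = choose_victim(cache_set, window=2)
print(victim.tag)
```

With a window of 2, the clean DRAM line is chosen over the older dirty NVM line; a larger window trades more recency information for more protection of expensive lines.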

Published in
ACM Transactions on Architecture and Code Optimization Volume 14, Issue 3
September 2017
278 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3132652
Editor: Koen De Bosschere, Ghent University
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 6 September 2017
- Accepted: 1 June 2017
- Revised: 1 May 2017
- Received: 1 December 2016
Author Tags
CPU cache
Hybrid memory
Qualifiers
- research-article
- Research
- Refereed