Abstract
Data-center servers benefit from large-capacity memory systems to run multiple processes simultaneously. Hybrid DRAM-NVM memory is an attractive way to increase memory capacity because it exploits the scalability of Non-Volatile Memory (NVM). However, current last-level cache (LLC) policies are unaware of hybrid memory: cache misses to NVM are costly because of long NVM latency, and evicting dirty NVM data suffers from long write latency. We propose hybrid-memory-aware cache partitioning, which dynamically adjusts cache space and gives dirty NVM data more chances to reside in the LLC. Experimental results show that Hybrid-memory-Aware Partition (HAP) improves performance by 46.7% and reduces energy consumption by 21.9% on average compared with LRU management. Moreover, HAP improves performance by 9.3% and reduces energy consumption by 6.4% on average compared with a state-of-the-art cache mechanism.
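To make the idea of hybrid-memory-aware partitioning more concrete, the following is a minimal sketch of victim selection within one cache set. It is not the HAP algorithm itself: the four line types, the per-type `quota` table, the tie-breaking order, and all names (`Line`, `pick_victim`) are assumptions introduced for illustration. The sketch only shows how a per-type space budget can keep dirty NVM lines resident where plain LRU would evict them.

```python
from dataclasses import dataclass

# Hypothetical line types, ordered from (assumed) cheapest to most
# expensive to evict. This ordering is an illustrative assumption.
TYPES = ("dram_clean", "dram_dirty", "nvm_clean", "nvm_dirty")


@dataclass
class Line:
    tag: int
    from_nvm: bool   # True if the line's backing memory is NVM
    dirty: bool      # True if eviction requires a writeback
    last_use: int    # larger value = more recently used

    @property
    def kind(self) -> str:
        source = "nvm" if self.from_nvm else "dram"
        return source + ("_dirty" if self.dirty else "_clean")


def pick_victim(cache_set, quota):
    """Pick the line to evict from one non-empty cache set.

    `quota` maps each line type to a target number of ways. It is
    assumed to be produced elsewhere by a partitioning policy.
    """
    counts = {kind: 0 for kind in TYPES}
    for line in cache_set:
        counts[line.kind] += 1

    # Choose the type that exceeds its quota by the most. On ties,
    # max() keeps the first candidate, i.e. the cheaper type in TYPES.
    present = [kind for kind in TYPES if counts[kind] > 0]
    worst = max(present, key=lambda kind: counts[kind] - quota[kind])

    # Within that type, fall back to LRU order.
    candidates = [line for line in cache_set if line.kind == worst]
    return min(candidates, key=lambda line: line.last_use)


# A 4-way set in which the globally least recently used line is dirty NVM data.
example_set = [
    Line(tag=1, from_nvm=True, dirty=True, last_use=0),
    Line(tag=2, from_nvm=False, dirty=False, last_use=5),
    Line(tag=3, from_nvm=False, dirty=False, last_use=3),
    Line(tag=4, from_nvm=True, dirty=True, last_use=1),
]
# Assumed quota that reserves most of the set for dirty NVM lines.
example_quota = {"dram_clean": 1, "dram_dirty": 0, "nvm_clean": 0, "nvm_dirty": 3}
victim = pick_victim(example_set, example_quota)
```

In this example plain LRU would evict the line with tag 1 and pay an NVM write, whereas the quota-based choice evicts a clean DRAM line instead. How the quotas should be adjusted at run time is the part the sketch deliberately leaves out.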
Index Terms
- HAP: Hybrid-Memory-Aware Partition in Shared Last-Level Cache