HAP: Hybrid-Memory-Aware Partition in Shared Last-Level Cache

Published: 06 September 2017

Abstract

Data-center servers benefit from large-capacity memory systems to run multiple processes simultaneously. Hybrid DRAM-NVM memory is attractive for increasing memory capacity because it exploits the scalability of Non-Volatile Memory (NVM). However, current last-level cache (LLC) policies are unaware of hybrid memory. Cache misses to NVM are costly because of the long NVM latency. Moreover, evicting dirty NVM data suffers from long write latency. We propose hybrid-memory-aware cache partitioning, which dynamically adjusts cache space and gives dirty NVM data more chances to reside in the LLC. Experimental results show that Hybrid-memory-Aware Partition (HAP) improves performance by 46.7% and reduces energy consumption by 21.9% on average compared with LRU management. Moreover, HAP improves performance by 9.3% and reduces energy consumption by 6.4% on average compared with a state-of-the-art cache mechanism.
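
To make the idea in the abstract concrete, the following is a minimal sketch of partition-style victim selection within one cache set. It is not the algorithm from the paper: the four line classes, the `Line` and `pick_victim` names, and the fixed `quota` table are all assumptions made for illustration, and the dynamic quota adjustment that the abstract mentions is left out. The sketch only shows how per-class quotas can keep a dirty NVM line resident when plain LRU would evict it.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical line classes (memory type x dirtiness); not taken from the paper.
CLASSES = ("dram_clean", "dram_dirty", "nvm_clean", "nvm_dirty")


@dataclass
class Line:
    tag: int
    from_nvm: bool   # True if the line is backed by NVM, False if by DRAM
    dirty: bool      # True if the line must be written back on eviction
    last_use: int    # larger value means more recently used

    @property
    def cls(self) -> str:
        memory = "nvm" if self.from_nvm else "dram"
        state = "dirty" if self.dirty else "clean"
        return f"{memory}_{state}"


def pick_victim(cache_set: List[Line], quota: Dict[str, int]) -> Line:
    """Evict from the class that most exceeds its way quota, LRU within that class.

    `quota` maps each class to an assumed number of ways; here it is fixed,
    whereas a real partitioning scheme would adjust it at run time.
    """
    counts = {c: 0 for c in CLASSES}
    for line in cache_set:
        counts[line.cls] += 1
    # Only classes that currently hold at least one line can supply a victim.
    occupied = [c for c in CLASSES if counts[c] > 0]
    over = max(occupied, key=lambda c: counts[c] - quota[c])
    return min((l for l in cache_set if l.cls == over), key=lambda l: l.last_use)


# Example: a 4-way set in which the least recently used line is a dirty NVM line.
example_set = [
    Line(tag=0xA, from_nvm=False, dirty=False, last_use=5),
    Line(tag=0xB, from_nvm=False, dirty=False, last_use=7),
    Line(tag=0xC, from_nvm=True, dirty=True, last_use=1),
    Line(tag=0xD, from_nvm=True, dirty=False, last_use=3),
]
example_quota = {"dram_clean": 1, "dram_dirty": 0, "nvm_clean": 1, "nvm_dirty": 2}
victim = pick_victim(example_set, example_quota)
print(hex(victim.tag), victim.cls)
```

In this example plain LRU would evict the dirty NVM line (tag 0xC) and pay the long NVM write latency, while the quota-based choice evicts the older clean DRAM line (tag 0xA) instead, because clean DRAM lines are the only class over their quota.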

          • Published in

          ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 3
          September 2017
          278 pages
          ISSN: 1544-3566
          EISSN: 1544-3973
          DOI: 10.1145/3132652

          Copyright © 2017 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 6 September 2017
          • Accepted: 1 June 2017
          • Revised: 1 May 2017
          • Received: 1 December 2016

          Qualifiers

          • research-article
          • Research
          • Refereed