DOI: 10.1145/3357526.3357543

Page migration support for disaggregated non-volatile memories

Published: 30 September 2019

ABSTRACT

As the demands of memory-intensive applications continue to grow, the memory capacity of each computing node is expected to grow at a similar pace. In high-performance computing (HPC) systems, the memory capacity per compute node is determined by the most demanding application likely to run on the system, and hence the average capacity per node in future HPC systems is expected to grow significantly. However, because HPC systems run many applications with different capacity demands, and because each memory module is effectively private to its corresponding computing node, a large percentage of the overall memory capacity will likely be underutilized. Thus, as HPC systems move towards the exascale era, better memory utilization is strongly desired. Moreover, upgrading a memory system requires significant effort. Fortunately, disaggregated memory systems promise better utilization by defining regions of global memory, typically referred to as memory blades, which can be accessed by all computing nodes in the system.

Disaggregated memory systems are expected to be built using dense, power-efficient memory technologies. Thus, emerging non-volatile memories (NVMs) are positioning themselves as the main building blocks for such systems. However, NVMs are slower than DRAM. Therefore, it is expected that each computing node will have a small local memory based on either HBM or DRAM, whereas a large shared NVM will be accessible by all nodes. Managing such a system, with both global and local memory, requires a novel hardware/software co-design that initiates page migration between global and local memory to maximize performance while enabling access to a huge shared memory. In this paper, we provide support for migrating pages and investigate these memory management aspects, along with the major system-level aspects that can affect design decisions in disaggregated NVM systems.


    • Published in

      MEMSYS '19: Proceedings of the International Symposium on Memory Systems
      September 2019
      517 pages
      ISBN: 9781450372060
      DOI: 10.1145/3357526

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
