ABSTRACT
As demand for memory-intensive applications continues to grow, the memory capacity of each computing node is expected to grow at a similar pace. In high-performance computing (HPC) systems, the memory capacity per compute node is determined by the most demanding application likely to run on the system, and hence the average capacity per node in future HPC systems is expected to grow significantly. However, HPC systems run many applications with different capacity demands, and each memory module acts as private memory for its corresponding computing node, so a large percentage of the overall memory capacity will likely be underutilized. Thus, as HPC systems move towards the exascale era, better utilization of memory is strongly desired. Moreover, upgrading the memory system requires significant effort. Fortunately, disaggregated memory systems promise much better utilization by defining regions of global memory, typically referred to as memory blades, that can be accessed by all computing nodes in the system.
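To make the utilization argument concrete, the following Python sketch compares the capacity provisioned under private per-node memory with that of a small local memory plus a shared pool. The per-node demands, the 32 GiB local size, and the function names are hypothetical illustrations, not figures from the paper. The pooled total also assumes the shared pool is sized exactly to the aggregate demand, ignoring fragmentation and headroom.

```python
# Illustrative comparison of provisioned capacity: private per-node memory
# versus a small local memory plus a shared (disaggregated) pool.
# All demand values are hypothetical, not measurements.


def private_capacity(demands_gib):
    """Every node is sized for the most demanding application."""
    return max(demands_gib) * len(demands_gib)


def pooled_capacity(demands_gib, local_gib):
    """Each node keeps local_gib locally; any excess is served from the shared pool."""
    pool = sum(max(d - local_gib, 0) for d in demands_gib)
    return local_gib * len(demands_gib) + pool


def utilization(demands_gib, capacity_gib):
    """Fraction of provisioned capacity actually demanded."""
    return sum(demands_gib) / capacity_gib


if __name__ == "__main__":
    demands = [32, 64, 96, 256]  # hypothetical concurrent per-node demands (GiB)
    priv = private_capacity(demands)
    pooled = pooled_capacity(demands, local_gib=32)
    print(f"private: {priv} GiB, utilization {utilization(demands, priv):.0%}")
    print(f"pooled:  {pooled} GiB, utilization {utilization(demands, pooled):.0%}")
```

With these hypothetical numbers, private provisioning reserves 1024 GiB to serve 448 GiB of demand (about 44% utilization), while the pooled layout needs only 448 GiB.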
Disaggregated memory systems are expected to be built using dense, power-efficient memory technologies. Thus, emerging nonvolatile memories (NVMs) are positioning themselves as the main building blocks for such systems. However, NVMs are slower than DRAM. Therefore, each computing node is expected to have a small local memory based on either HBM or DRAM, whereas a large shared NVM memory would be accessible by all nodes. Managing such a system with global and local memory requires a novel hardware/software co-design that initiates page migration between global and local memory to maximize performance while enabling access to a huge shared memory. In this paper, we provide support for migrating pages and investigate such memory management aspects, as well as the major system-level aspects that can affect design decisions in disaggregated NVM systems.
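One way to picture the local/global split is a toy page-migration policy. The sketch below is a generic threshold-and-LRU scheme written for illustration, not the mechanism proposed in the paper: pages are served from shared NVM until their access count reaches a threshold, then migrated into a fixed-size local memory, evicting the least recently used local page when it is full. The class name, its parameters, and the policy itself are assumptions, and the sketch omits the page-table updates and TLB invalidations that a real migration requires.

```python
from collections import OrderedDict, defaultdict


class MigrationManager:
    """Hypothetical two-tier manager: small local memory in front of shared NVM.

    A remote (NVM) page is promoted to local memory once its access count
    reaches `threshold`; if local memory is full, the least recently used
    local page is demoted back to the shared pool.
    """

    def __init__(self, local_pages, threshold):
        self.local_pages = local_pages   # local memory capacity, in pages
        self.threshold = threshold       # remote accesses before promotion
        self.local = OrderedDict()       # local pages, least recently used first
        self.counts = defaultdict(int)   # access counts for remote pages
        self.migrations = 0              # number of promotions performed

    def access(self, page):
        """Record one access and return where it was served: 'local' or 'remote'."""
        if page in self.local:
            self.local.move_to_end(page)  # mark as most recently used
            return "local"
        self.counts[page] += 1
        if self.counts[page] >= self.threshold:
            # This access is still served remotely; the page moves afterwards.
            self._promote(page)
        return "remote"

    def _promote(self, page):
        if len(self.local) >= self.local_pages:
            victim, _ = self.local.popitem(last=False)  # demote LRU page to NVM
            self.counts[victim] = 0                     # victim starts cold again
        self.local[page] = None
        del self.counts[page]
        self.migrations += 1


if __name__ == "__main__":
    mm = MigrationManager(local_pages=2, threshold=3)
    trace = [1, 1, 1, 2, 1, 2, 2, 3, 3, 3, 1]
    served = [mm.access(p) for p in trace]
    print(served, mm.migrations, list(mm.local))
```

In the example trace, pages 1 and 2 are promoted, and page 3's promotion then demotes page 1, the least recently used local page. The threshold is the main design knob: it filters out pages touched only a few times, for which the cost of copying the page would not be recovered by faster subsequent accesses.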
Index Terms
- Page migration support for disaggregated non-volatile memories