Abstract
An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice becomes increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now.
This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, whereas High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that analyzing the memory footprints of production HPC applications is complex: it requires an understanding of application scalability and of the target usage category, i.e., whether users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also identify applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases whose memory footprints could be served by 3D-stacked memory chiplets, taking a first step toward the adoption of this novel technology in the HPC domain.
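To illustrate why HPL is capacity-sensitive, the following minimal sketch (not taken from the paper) estimates the HPL problem size N that fits in a given per-node memory capacity, assuming the standard HPL memory model in which the dominant allocation is the dense double-precision N x N matrix (8 * N^2 bytes). The node capacities, fill fraction, and core count below are illustrative assumptions, not measurements from the study.

    # Minimal sketch: largest HPL problem size N whose matrix fits in a
    # fraction of node memory, and the resulting per-core matrix footprint.
    # All node parameters are hypothetical examples.
    import math

    def hpl_problem_size(mem_bytes, fill_fraction=0.8):
        """Largest N such that the 8*N^2-byte matrix uses at most fill_fraction of memory."""
        return int(math.sqrt(fill_fraction * mem_bytes / 8))

    for label, mem_gib, cores in [("3D-stacked node (assumed 16 GiB)", 16, 64),
                                  ("DIMM node (assumed 64 GiB)", 64, 64)]:
        mem_bytes = mem_gib * 2**30
        n = hpl_problem_size(mem_bytes)
        per_core_mib = 8 * n * n / cores / 2**20
        print(f"{label}: N ~ {n}, ~{per_core_mib:.0f} MiB/core for the HPL matrix")

Since HPL is typically run with N chosen to fill most of node memory, smaller 3D-stacked capacities directly cap the attainable problem size, consistent with the observation above that HPL is likely to be constrained by 3D memory capacity.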