ABSTRACT
The cost of moving data between the memory/storage units and the compute units is a major contributor to the execution time and energy consumption of modern workloads. A promising paradigm for alleviating this data movement bottleneck is near-memory computing (NMC), which places compute units close to the memory/storage units. Substantial research effort proposes NMC architectures and identifies workloads that can benefit from NMC. System architects typically use simulation techniques to evaluate the performance and energy consumption of their designs. However, simulation is extremely slow, imposing long times for design space exploration. To enable fast early-stage design space exploration of NMC architectures, we need high-level performance and energy models.
We present NAPEL, a high-level performance and energy estimation framework for NMC architectures. NAPEL leverages ensemble learning to develop a model that is based on microarchitectural parameters and application characteristics. NAPEL training uses a statistical technique, called design of experiments, to collect representative training data efficiently. NAPEL provides early design space exploration 220× faster than a state-of-the-art NMC simulator, on average, with error rates of 8.5% and 11.6% for performance and energy estimations, respectively, compared to the NMC simulator. NAPEL is also capable of making accurate predictions for previously-unseen applications.
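The overall approach — train an ensemble model on a small, well-chosen sample of simulated configurations, then predict performance for the rest of the design space — can be illustrated with a minimal sketch. This is not NAPEL's implementation: the feature names, the design space, and the synthetic "simulator" below are invented for illustration; only the model family (a random forest, which the paper builds on) comes from the source.

```python
# Hypothetical sketch of ensemble-learning-based performance prediction.
# All feature names and data here are invented; only the random-forest
# model family reflects the paper's approach.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Invented design space: each row is one NMC configuration plus an
# application characteristic.
# Columns: [num_cores, core_freq_GHz, memory_bw_GBps, ops_per_byte]
X = rng.uniform([1, 0.5, 50, 0.1], [16, 2.5, 320, 8.0], size=(200, 4))

# Synthetic stand-in for slow simulation: a roofline-like runtime
# (compute-bound vs. bandwidth-bound) with measurement noise.
compute_time = X[:, 3] / (X[:, 0] * X[:, 1])
memory_time = 1.0 / X[:, 2]
y = np.maximum(compute_time, memory_time) * (1 + 0.05 * rng.standard_normal(200))

# Train on a small subset (standing in for a design-of-experiments
# sample) and predict the held-out configurations.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
mape = float(np.mean(np.abs(pred - y[150:]) / y[150:]) * 100)
print(f"held-out MAPE: {mape:.1f}%")
```

In the real framework, the training labels would come from the NMC simulator rather than a closed-form model, and the training points would be chosen by a design-of-experiments technique rather than taken in order, so that few simulation runs cover the design space representatively.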