ABSTRACT
While GPU acceleration can speed up computations by orders of magnitude, memory management remains a bottleneck, often making it a challenge to achieve the desired performance. Various memory optimizations are therefore applied to use memory more effectively. We propose an approach that automates memory management using partial evaluation, a program transformation technique that allows data accesses to be pre-computed, optimized, and embedded into the code, saving memory transactions. An empirical evaluation of our approach shows that the transformed program can be up to 8 times as fast as the original in the case of a naïve string pattern matching algorithm implemented in CUDA C.
Index Terms
- Optimizing GPU programs by partial evaluation