ABSTRACT
Energy consumption has become a core concern in computing systems. In this context, power capping is an approach that aims to ensure that the power consumption of a system does not exceed a predefined threshold. Although various power capping techniques exist in the literature, they are not well suited to multi-threaded workloads with shared data accesses and non-minimal thread-level concurrency. For these workloads, scalability may be limited by thread contention on hardware resources and/or data, to the point that performance may even decrease as thread-level parallelism increases, indicating a limited ability to exploit the actual computing power available in highly parallel hardware. In this paper, we consider the problem of maximizing the performance of multi-threaded applications under a power cap by jointly tuning the thread-level parallelism and the power state of the CPU cores. Based on experimental observations, we design a technique that adaptively identifies, in linear time within a bi-dimensional space, the optimal combination of parallelism and power state. We evaluated the proposed technique with different benchmark applications and different methods for synchronizing threads that access shared data, and we compared it with other state-of-the-art power capping techniques.
Supplemental Material
EPADS: Exploration-based Power-capping for Applications with Diverse Scalability. The software in this repository implements a power capping solution that maximizes application performance while operating within power consumption constraints. The solution builds on a preliminary analysis showing that the throughput curve, obtained by varying the number of cores assigned to a multithreaded application, preserves the same shape, and the same position of its maximum, across different performance states (P-states). Based on this result, the software performs an online exploration of the configurations of P-state and assigned cores/threads, adaptively allocating the power budget (power cap) to maximize application performance. Please check readme.txt in the zip file for further information.
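The observation above is what makes a linear-time exploration possible: because the throughput-vs-cores curve peaks at the same core count at every P-state, the two dimensions can be searched independently rather than jointly. The following sketch illustrates this idea in Python; the two-phase `explore` function, and the synthetic throughput/power models it probes, are illustrative assumptions for exposition, not the implementation shipped in the repository.

```python
# Hypothetical per-core speeds, P-states ordered fastest first (index 0 ~ P0),
# and a per-core power figure. These numbers are made up for illustration.
SPEED = [3.0, 2.5, 2.0, 1.5]
PER_CORE_WATTS = 10.0

def measure_throughput(cores, pstate):
    # Synthetic model: throughput scales up to 8 cores, then contention makes
    # it decay. The curve shape, and the peak at 8 cores, is identical at
    # every P-state, mirroring the preliminary analysis described above.
    scalability = min(cores, 8) * (0.9 ** max(0, cores - 8))
    return SPEED[pstate] * scalability

def measure_power(cores, pstate):
    # Synthetic model: power grows with core count and clock speed.
    return 20.0 + cores * PER_CORE_WATTS * SPEED[pstate] / SPEED[0]

def explore(max_cores, num_pstates, power_cap):
    """Two-phase linear search: O(max_cores + num_pstates) probes instead of
    O(max_cores * num_pstates) for an exhaustive sweep of the 2-D space."""
    # Phase 1: sweep core counts at the slowest (lowest-power) P-state to
    # locate the concurrency level that maximizes throughput.
    probe_pstate = num_pstates - 1
    best_cores = max(range(1, max_cores + 1),
                     key=lambda c: measure_throughput(c, probe_pstate))
    # Phase 2: with the core count fixed, pick the fastest P-state whose
    # power draw still fits under the cap.
    for p in range(num_pstates):          # fastest first
        if measure_power(best_cores, p) <= power_cap:
            return best_cores, p
    return best_cores, num_pstates - 1    # fall back to the slowest P-state

print(explore(max_cores=16, num_pstates=4, power_cap=120.0))
```

Under this synthetic model, phase 1 settles on 8 cores regardless of the P-state used for probing, and phase 2 then spends the remaining power budget on the fastest feasible clock. A real deployment would of course measure throughput and power online (e.g., via hardware counters and RAPL-style interfaces) rather than from a closed-form model.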
Index Terms
- Adaptive Performance Optimization under Power Constraint in Multi-thread Applications with Diverse Scalability