Static and Dynamic Frequency Scaling on Multicore CPUs

Authors:
Wenlei Bao, The Ohio State University, Columbus, Ohio
Changwan Hong, The Ohio State University, Columbus, Ohio
Sudheer Chunduri, IBM Research India, S. Cass Avenue Lemont, IL
Sriram Krishnamoorthy, Pacific Northwest National Laboratory, Richland, WA
Louis-Noël Pouchet, Colorado State University, Fort Collins, CO
Fabrice Rastello, University Grenoble Alpes, Grenoble, France
P. Sadayappan, The Ohio State University, Columbus, Ohio

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4, Article No. 51, pp. 1–26. https://doi.org/10.1145/3011017

Published: 28 December 2016


Abstract

Dynamic Voltage and Frequency Scaling (DVFS) typically adapts CPU power consumption by modifying a processor's operating frequency (and the associated voltage). Typical DVFS approaches include default strategies such as running at the lowest or the highest frequency, or reacting to the CPU's runtime load by reducing or increasing the frequency based on CPU usage. In this article, we argue that a compile-time approach to CPU frequency selection is achievable for affine program regions and can significantly outperform runtime-based approaches. We first propose a lightweight runtime approach that exploits the properties of the power profile specific to a processor, outperforming classical Linux governors such as powersave and ondemand for computational kernels. We then demonstrate that, for the affine kernels in an application, a purely compile-time approach to selecting both the CPU frequency and the core count is achievable and provides significant additional benefits over the runtime approach. Our framework relies on a one-time profiling of the target CPU, along with a compile-time categorization of the loop-based code segments in the application. These are combined to determine, at compile time, the frequency and the number of cores used to execute each affine region so as to optimize either energy or the energy-delay product. An extensive evaluation on 60 benchmarks and 5 multicore CPUs shows that our approach systematically outperforms the powersave Linux governor while also improving overall performance.
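The abstract describes combining a one-time CPU profile with a compile-time categorization of each affine region to choose the frequency and core count that minimize energy or the energy-delay product (EDP). The Python sketch below illustrates only that final selection step, under stated assumptions: the two-way compute/memory split by arithmetic intensity, its threshold, the names `categorize`, `select_config` and `PROFILES`, and every number in the profile table are hypothetical placeholders, not the paper's categories or measurements. Energy is taken as average power times execution time, and EDP as energy times execution time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Config:
    """A candidate operating point: core frequency (GHz) and active core count."""
    freq_ghz: float
    cores: int


@dataclass(frozen=True)
class Measurement:
    """Profiled behaviour of one kernel category at one operating point."""
    power_w: float  # average power drawn while running (watts)
    time_s: float   # execution time (seconds)

    @property
    def energy_j(self) -> float:
        # Energy = average power x execution time.
        return self.power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy-delay product = energy x execution time.
        return self.energy_j * self.time_s


def categorize(flops: float, bytes_moved: float, threshold: float = 1.0) -> str:
    """Hypothetical two-way split by arithmetic intensity (flops per byte)."""
    return "compute" if flops / bytes_moved >= threshold else "memory"


def select_config(profile: Dict[Config, Measurement], objective: str = "energy") -> Config:
    """Return the profiled operating point that minimizes energy or EDP."""
    if objective == "energy":
        key = lambda cfg: profile[cfg].energy_j
    elif objective == "edp":
        key = lambda cfg: profile[cfg].edp
    else:
        raise ValueError(f"unknown objective: {objective}")
    return min(profile, key=key)


# Placeholder numbers for illustration only (hypothetical, not measured);
# a real table would come from a one-time profiling run on the target CPU.
PROFILES: Dict[str, Dict[Config, Measurement]] = {
    "memory": {
        Config(1.2, 4): Measurement(power_w=30.0, time_s=2.2),
        Config(2.4, 4): Measurement(power_w=45.0, time_s=2.0),
    },
    "compute": {
        Config(1.2, 4): Measurement(power_w=30.0, time_s=4.0),
        Config(2.4, 4): Measurement(power_w=50.0, time_s=2.0),
    },
}

if __name__ == "__main__":
    # A region with 2e9 flops over 8e9 bytes has intensity 0.25 -> "memory".
    category = categorize(flops=2e9, bytes_moved=8e9)
    for objective in ("energy", "edp"):
        print(category, objective, select_config(PROFILES[category], objective))
```

With these placeholder numbers, the memory-bound category favors the lower frequency (its run time barely grows while power drops), whereas the compute-bound category favors the higher one. A realistic table would cover every available frequency step and core count of the target CPU.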

Index Terms

1. Hardware
  1. Power and energy
    1. Power estimation and optimization
      1. Chip-level power issues
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Power management

Published in
ACM Transactions on Architecture and Code Optimization Volume 13, Issue 4
December 2016
648 pages
ISSN: 1544-3566
EISSN: 1544-3973
DOI: 10.1145/3012405
Editor:
Koen De Bosschere
Ghent University
Copyright © 2016 ACM
© 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 28 December 2016
- Accepted: 1 October 2016
- Revised: 1 September 2016
- Received: 1 June 2016
Author Tags
Affine Programs
CPU Energy
Static Analysis
Voltage and Frequency Scaling
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 31
- Total Downloads: 1,216
- Downloads (Last 12 months): 211
- Downloads (Last 6 weeks): 32