Abstract
Dynamic Voltage and Frequency Scaling (DVFS) adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical DVFS approaches either apply a default strategy, such as always running at the lowest or the highest frequency, or react to the CPU’s runtime load, lowering or raising the frequency based on CPU usage. In this article, we argue that a compile-time approach to CPU frequency selection is achievable for affine program regions and can significantly outperform runtime-based approaches. We first propose a lightweight runtime approach that exploits the properties of the power profile specific to a processor, outperforming classical Linux governors such as powersave or ondemand for computational kernels. We then demonstrate that, for affine kernels in the application, a purely compile-time approach to CPU frequency and core count selection is achievable, providing significant additional benefits over the runtime approach. Our framework relies on a one-time profiling of the target CPU, along with a compile-time categorization of loop-based code segments in the application. These are combined to determine, at compile time, the frequency and the number of cores to use for each affine region in order to optimize energy or energy-delay product. Extensive evaluation on 60 benchmarks and 5 multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor while also improving overall performance.
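The selection step described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the category names, the `Config` type, the `PROFILE` table, and every number in it are hypothetical placeholders standing in for the one-time CPU profile and the compile-time categorization. The sketch only shows how a (frequency, core count) pair could be picked per region once such data exists, using the standard definitions of energy (power × time) and energy-delay product (energy × time).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A candidate operating point: CPU frequency and number of cores."""
    freq_ghz: float
    cores: int


# Hypothetical one-time profile of the target CPU. For each (assumed) region
# category it maps a configuration to (execution time in s, average power in W).
# All values are invented for illustration.
PROFILE = {
    "compute_bound": {
        Config(1.2, 4): (10.0, 20.0),
        Config(2.4, 2): (9.0, 28.0),
        Config(2.4, 4): (5.0, 45.0),
        Config(3.2, 4): (4.0, 80.0),
    },
    "memory_bound": {
        Config(1.2, 2): (8.5, 12.0),
        Config(1.2, 4): (8.0, 18.0),
        Config(2.4, 4): (7.5, 40.0),
        Config(3.2, 4): (7.4, 70.0),
    },
}


def energy(time_s: float, power_w: float) -> float:
    """Energy in joules: average power multiplied by execution time."""
    return power_w * time_s


def edp(time_s: float, power_w: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy(time_s, power_w) * time_s


def select_config(category: str, objective: str = "energy") -> Config:
    """Return the profiled configuration minimizing the chosen objective."""
    metrics = {"energy": energy, "edp": edp}
    metric = metrics[objective]
    candidates = PROFILE[category]
    return min(candidates, key=lambda cfg: metric(*candidates[cfg]))


if __name__ == "__main__":
    for category in PROFILE:
        for objective in ("energy", "edp"):
            print(category, objective, select_config(category, objective))
```

With these invented numbers, the compute-bound category is assigned the lowest frequency when minimizing energy but a mid-range frequency when minimizing energy-delay product, while the memory-bound category is assigned the low-frequency, two-core point under both objectives. In a compile-time setting, the lookup would happen once per affine region and the chosen configuration would be fixed in the generated code.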
Index Terms
- Static and Dynamic Frequency Scaling on Multicore CPUs