research-article

Static and Dynamic Frequency Scaling on Multicore CPUs

Published: 28 December 2016

Abstract

Dynamic Voltage and Frequency Scaling (DVFS) typically adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical DVFS approaches include using default strategies such as running at the lowest or the highest frequency or reacting to the CPU’s runtime load to reduce or increase frequency based on the CPU usage. In this article, we argue that a compile-time approach to CPU frequency selection is achievable for affine program regions and can significantly outperform runtime-based approaches. We first propose a lightweight runtime approach that can exploit the properties of the power profile specific to a processor, outperforming classical Linux governors such as powersave or on-demand for computational kernels. We then demonstrate that, for affine kernels in the application, a purely compile-time approach to CPU frequency and core count selection is achievable, providing significant additional benefits over the runtime approach. Our framework relies on a one-time profiling of the target CPU, along with a compile-time categorization of loop-based code segments in the application. These are combined to determine at compile-time the frequency and the number of cores to use to execute each affine region to optimize energy or energy-delay product. Extensive evaluation on 60 benchmarks and 5 multicore CPUs shows that our approach systematically outperforms the powersave Linux governor while also improving overall performance.
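
As a rough illustration of the selection step described in the abstract, the sketch below picks a frequency and core count for one category of code region by minimizing either energy or energy-delay product over a profile table. It is not the authors' framework: the `Config` type, the `PROFILE` table, and every power and time value in it are hypothetical placeholders standing in for what a one-time profiling of a real CPU might produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate operating point: CPU frequency and number of cores."""
    freq_ghz: float
    cores: int


# Hypothetical profile for ONE category of loop region:
# config -> (average power in watts, execution time in seconds).
# All numbers are made up for illustration, not measurements.
PROFILE = {
    Config(1.2, 1): (10.0, 8.0),
    Config(1.2, 4): (20.0, 2.5),
    Config(2.4, 1): (18.0, 4.0),
    Config(2.4, 4): (45.0, 1.25),
}


def energy(power_w: float, time_s: float) -> float:
    """Energy in joules: power multiplied by time."""
    return power_w * time_s


def edp(power_w: float, time_s: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy(power_w, time_s) * time_s


def select(profile, metric):
    """Return the configuration that minimizes the given metric."""
    return min(profile, key=lambda cfg: metric(*profile[cfg]))


if __name__ == "__main__":
    print("min energy:", select(PROFILE, energy))
    print("min EDP:   ", select(PROFILE, edp))
```

With these placeholder numbers the two objectives disagree: the lowest-energy point is the low frequency on four cores, while the lowest energy-delay product is the high frequency on four cores, which is why the optimization target has to be chosen per use case.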

      • Published in

        ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
        December 2016
        648 pages
        ISSN:1544-3566
        EISSN:1544-3973
        DOI:10.1145/3012405
        Copyright © 2016 ACM

        © 2016 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 28 December 2016
        • Accepted: 1 October 2016
        • Revised: 1 September 2016
        • Received: 1 June 2016

        Qualifiers

        • research-article
        • Research
        • Refereed
