
Compiler Optimization for Reducing Leakage Power in Multithread BSP Programs

Published: 18 November 2014

Abstract

Multithread programming is widely adopted in novel embedded system applications due to its high performance and flexibility. This article addresses compiler optimization for reducing the power consumption of multithread programs. Traditional compiler-based energy management techniques analyze component usage in control-flow graphs and focus on single-thread programs. In that setting, leakage power can be controlled by inserting power-on and power-off instructions based on component usage information generated by dataflow equations. However, these methods cannot be directly extended to a multithread environment because of concurrent execution issues.
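The single-thread idea described above can be illustrated with a minimal sketch. It is not the analysis used in this article: it assumes a hypothetical control-flow graph given as successor lists, hypothetical per-block component usage sets, and a simple backward "needed ahead" dataflow equation. It also assumes every component is powered on at program entry and ignores power-gating overheads such as break-even time. The function names `needed_ahead` and `power_off_edges` are illustrative only.

```python
from typing import Dict, List, Set, Tuple


def needed_ahead(succ: Dict[str, List[str]],
                 use: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Backward may-analysis: components that may still be used from each
    block's entry onward. need_in[b] = use[b] | union(need_in[s] for s in succ[b])."""
    need_in: Dict[str, Set[str]] = {b: set() for b in succ}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for b in succ:
            new_in = set(use[b])
            for s in succ[b]:
                new_in |= need_in[s]
            if new_in != need_in[b]:
                need_in[b] = new_in
                changed = True
    return need_in


def power_off_edges(succ: Dict[str, List[str]],
                    use: Dict[str, Set[str]]) -> Dict[Tuple[str, str], List[str]]:
    """For each CFG edge b -> s, list components needed at b's entry that
    can never be used again once control reaches s (candidates for 'off')."""
    need_in = needed_ahead(succ, use)
    result: Dict[Tuple[str, str], List[str]] = {}
    for b in succ:
        for s in succ[b]:
            dead = need_in[b] - need_in[s]
            if dead:
                result[(b, s)] = sorted(dead)
    return result


# Hypothetical CFG: a floating-point loop followed by integer-only code.
succ = {"entry": ["fp_loop"], "fp_loop": ["fp_loop", "int_tail"],
        "int_tail": ["exit"], "exit": []}
use = {"entry": {"alu"}, "fp_loop": {"alu", "fpu"},
       "int_tail": {"alu"}, "exit": set()}
print(power_off_edges(succ, use))
```

In this example the floating-point unit becomes a power-off candidate on the edge leaving the loop, because no path from `int_tail` uses it again. A practical scheme would also weigh the cost of switching a component off and on against the leakage saved.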

This article presents a multithread power-gating framework, composed of multithread power-gating analysis (MTPGA) and predicated power-gating (PPG) energy management mechanisms, for reducing leakage power when executing multithread programs on simultaneous multithreading (SMT) machines. Our multithread programming model is based on hierarchical bulk-synchronous parallel (BSP) models. Based on a multithread component analysis with dataflow equations, our MTPGA framework estimates the energy usage of multithread programs and inserts PPG operations as power controls for energy management. We performed experiments by incorporating our power optimization framework into SUIF compiler tools and by simulating the energy consumption with a post-estimated SMT simulator based on Wattch toolkits. The experimental results show that, relative to a system without a power-gating mechanism, a system with PPG support and our power optimization method reduces total energy consumption for BSP programs by an average of 10.09% when the leakage contribution is set to 30%, and by an average of 4.27% when the leakage contribution is set to 10%. The results demonstrate that our mechanisms are effective in reducing the leakage energy of BSP multithread programs.
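To make the concurrency constraint concrete, the following sketch shows why per-thread usage information has to be combined in a BSP setting. It is an illustration under stated assumptions, not the MTPGA analysis or the PPG mechanism themselves: it assumes threads on one SMT core share the listed components, that each superstep is delimited by barriers, and that per-thread usage sets are already known. The function `gateable_per_superstep` and the component names are hypothetical.

```python
from typing import Dict, List, Set


def gateable_per_superstep(supersteps: List[Dict[str, Set[str]]],
                           components: Set[str]) -> List[Set[str]]:
    """For each superstep, return the shared components that no thread may
    use, i.e. the only ones that are safe to keep off for the whole superstep.

    supersteps[i] maps a thread name to the components that thread may use
    between barrier i and barrier i + 1 (assumed input).
    """
    result: List[Set[str]] = []
    for threads in supersteps:
        # A component is busy if ANY concurrently running thread may use it.
        used: Set[str] = set().union(*threads.values())
        result.append(components - used)
    return result


# Hypothetical two-thread program with three supersteps.
components = {"alu", "fpu", "mul"}
supersteps = [
    {"t0": {"alu", "fpu"}, "t1": {"alu"}},
    {"t0": {"alu"}, "t1": {"alu", "mul"}},
    {"t0": {"alu"}, "t1": {"alu"}},
]
for i, idle in enumerate(gateable_per_superstep(supersteps, components)):
    print(i, sorted(idle))
```

A single-thread analysis of `t1` alone would conclude that the floating-point unit is idle in the first superstep, but `t0` may be using it at the same time. Taking the union over all threads of a superstep is the conservative choice in this sketch.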



Published in

ACM Transactions on Design Automation of Electronic Systems, Volume 20, Issue 1
November 2014
377 pages
ISSN: 1084-4309
EISSN: 1557-7309
DOI: 10.1145/2690851

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 18 November 2014
• Accepted: 1 September 2014
• Revised: 1 August 2014
• Received: 1 October 2013


Qualifiers

• research-article
• Research
• Refereed
