Abstract
Multithread programming is widely adopted in novel embedded system applications because of its high performance and flexibility. This article addresses compiler optimization for reducing the power consumption of multithread programs. Traditional compiler-based energy management techniques analyze component usage in control-flow graphs and focus on single-thread programs. In that setting, leakage power can be controlled by inserting power-on and power-off instructions based on component usage information generated by dataflow equations. However, these methods cannot be directly extended to a multithread environment because of concurrent execution issues.
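To make the single-thread approach concrete, the following is a minimal sketch of component-usage dataflow analysis on a control-flow graph, followed by placement of power-gating operations. It is not the article's algorithm: the function names, the CFG encoding (a dict from block name to successor list), and the `("on" | "off", component, position, block)` tuples are all assumptions made for illustration. The sketch favors safety over optimality: it never turns a unit off while a later use is reachable, but it may leave a unit powered on some paths and it ignores the break-even cost of gating.

```python
def predecessors(cfg):
    """Invert the successor map of a CFG given as {block: [successors]}."""
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    return preds


def live_components(cfg, uses):
    """Backward may-analysis: a component is live at a point if some path
    from that point reaches a block that uses it."""
    live_in = {b: set() for b in cfg}
    live_out = {b: set() for b in cfg}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            out = set().union(*(live_in[s] for s in cfg[b]))
            inn = uses.get(b, set()) | out
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


def place_power_gating(cfg, uses):
    """Forward must-analysis of the 'known to be on' state, then placement
    of hypothetical power-on / power-off operations around blocks."""
    _, live_out = live_components(cfg, uses)
    preds = predecessors(cfg)
    on_in = {b: set() for b in cfg}
    on_out = {b: set() for b in cfg}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            ins = [on_out[p] for p in preds[b]]
            # A unit is known to be on only if it is on along every incoming edge.
            new_in = set.intersection(*ins) if ins else set()
            # It stays on past this block only while a later use is reachable.
            new_out = (new_in | uses.get(b, set())) & live_out[b]
            if new_in != on_in[b] or new_out != on_out[b]:
                on_in[b], on_out[b] = new_in, new_out
                changed = True
    ops = []
    for b in cfg:
        used = uses.get(b, set())
        for c in sorted(used - on_in[b]):
            ops.append(("on", c, "before", b))
        for c in sorted((on_in[b] | used) - live_out[b]):
            ops.append(("off", c, "after", b))
    return ops


# Straight-line example: block A uses the multiplier, block C uses the
# multiplier and the floating-point unit.
cfg = {"entry": ["A"], "A": ["B"], "B": ["C"], "C": ["exit"], "exit": []}
uses = {"A": {"mul"}, "C": {"mul", "fpu"}}
ops = place_power_gating(cfg, uses)
```

In this example the multiplier is switched on before `A` and kept on across `B` because `C` still needs it, while the floating-point unit is switched on only before `C`; both are switched off after `C`.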
This article presents a multithread power-gating framework composed of multithread power-gating analysis (MTPGA) and predicated power-gating (PPG) energy management mechanisms for reducing the leakage power when executing multithread programs on simultaneous multithreading (SMT) machines. Our multithread programming model is based on hierarchical bulk-synchronous parallel (BSP) models. Based on a multithread component analysis with dataflow equations, our MTPGA framework estimates the energy usage of multithread programs and inserts PPG operations as power controls for energy management. We performed experiments by incorporating our power optimization framework into SUIF compiler tools and by simulating the energy consumption with a post-estimated SMT simulator based on Wattch toolkits. The experimental results show that, relative to a system without a power-gating mechanism, the total energy consumption of a system with PPG support and our power optimization method is reduced by an average of 10.09% for BSP programs when the leakage contribution is set to 30%, and by an average of 4.27% when the leakage contribution is set to 10%. The results demonstrate that our mechanisms are effective in reducing the leakage energy of BSP multithread programs.
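The abstract does not spell out how a predicated power-gating operation behaves, so the following is only an illustrative sketch of the general idea under one assumption: that a predicate records which threads still need a shared component, and that a power-off request takes effect only when no thread remains. The class `GatedComponent` and its methods `ppg_on` and `ppg_off` are hypothetical names, not the article's instruction set. The sketch shows why an unconditional power-off inserted for one thread would be unsafe when another thread on the same SMT core is still using the unit.

```python
class GatedComponent:
    """Toy model of a shared functional unit guarded by a predicate.

    Assumption: the predicate is modeled as the set of thread ids that
    currently need the unit; real hardware would use a far cheaper encoding.
    """

    def __init__(self, name):
        self.name = name
        self.powered = False
        self.users = set()

    def ppg_on(self, tid):
        """Register thread `tid` as a user; return True if power was switched on."""
        self.users.add(tid)
        if not self.powered:
            self.powered = True
            return True
        return False

    def ppg_off(self, tid):
        """Release thread `tid`; return True only if the unit was really switched off."""
        self.users.discard(tid)
        if self.powered and not self.users:
            self.powered = False
            return True
        return False


# Two threads of one superstep share a multiplier.
mul = GatedComponent("mul")
events = [
    mul.ppg_on("t0"),   # first user: the unit is powered on
    mul.ppg_on("t1"),   # already on: nothing to do
    mul.ppg_off("t0"),  # t1 still needs the unit: the request is suppressed
    mul.ppg_off("t1"),  # last user leaves: the unit is powered off
]
```

Here `events` evaluates to `[True, False, False, True]`: only the first power-on and the last power-off change the physical state of the unit.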
Index Terms
Compiler Optimization for Reducing Leakage Power in Multithread BSP Programs