ABSTRACT
This paper presents an automated performance tuning solution that partitions a program into a number of tuning sections and finds the best combination of compiler options for each section. Our solution builds on prior work on feedback-driven optimization, which tuned the whole program rather than each section individually. Our key contribution is a novel algorithm that partitions a program into appropriate tuning sections. We also present the architecture of a system that automates the tuning process; it includes several pre-tuning steps that partition and instrument the program, followed by the actual tuning and the post-tuning assembly of the individually optimized parts. Our system, called PEAK, tunes quickly because it measures a small number of invocations of each code section instead of the whole-program execution time, as common solutions do. Compared to these solutions, PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average while achieving similar program performance. PEAK improves the performance of SPEC CPU2000 FP benchmarks by 12% on average over GCC -O3, the highest optimization level, on a Pentium IV machine.
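The per-section idea can be illustrated with a minimal sketch. It is not PEAK's partitioning or search algorithm, which the abstract does not detail. It assumes the program has already been split into sections, searches exhaustively over a tiny flag list, and replaces real compilation and timing with made-up cost models. The names `make_fake_section`, `rate_section`, `tune_sections`, the section names, and all timing factors are hypothetical; only the GCC flag names are real.

```python
from itertools import combinations
from statistics import median

# Real GCC flag names; their effects below are invented for illustration.
FLAGS = ["-funroll-loops", "-fstrict-aliasing", "-fschedule-insns"]


def make_fake_section(base_time, effects):
    """Return a stand-in for 'compile this section with `options`, run one
    invocation, and time it'. Purely a simulated cost model (assumption)."""
    def measure(options, invocation):
        t = base_time
        for flag in options:
            t *= effects[flag]
        # Small deterministic jitter to mimic run-to-run variation.
        return t * (1.0 + 0.01 * (invocation % 3))
    return measure


def rate_section(measure, options, invocations=5):
    """Rate one option set from a few invocations of a single section,
    rather than from a whole-program run."""
    return median(measure(options, i) for i in range(invocations))


def tune_sections(sections, flags, invocations=5):
    """Pick, for each section independently, the flag subset with the
    lowest rating. Exhaustive search: feasible only for very few flags."""
    candidates = [frozenset(c)
                  for r in range(len(flags) + 1)
                  for c in combinations(flags, r)]
    best = {}
    for name, measure in sections.items():
        best[name] = min(
            candidates,
            key=lambda opts: rate_section(measure, opts, invocations),
        )
    return best


# Two hypothetical sections that react differently to the same flags.
SECTIONS = {
    "loop_kernel": make_fake_section(
        10.0, {"-funroll-loops": 0.80, "-fstrict-aliasing": 0.95,
               "-fschedule-insns": 1.10}),
    "pointer_chase": make_fake_section(
        4.0, {"-funroll-loops": 1.20, "-fstrict-aliasing": 0.90,
              "-fschedule-insns": 0.97}),
}

if __name__ == "__main__":
    for section, options in tune_sections(SECTIONS, FLAGS).items():
        print(section, sorted(options))
```

In this toy setup the two sections end up with different flag sets, which is the case where tuning each section separately can beat a single whole-program choice. A real tuner would replace `measure` with instrumented runs of separately compiled code and would need a search strategy that scales beyond a handful of flags.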