Abstract
This paper presents the results of our investigation of code positioning techniques using execution profile data as input into the compilation process. The primary objective of the positioning is to reduce the overhead of the instruction memory hierarchy.

After initial investigation in the literature, we decided to implement two prototypes for the Hewlett-Packard Precision Architecture (PA-RISC). The first, built on top of the linker, positions code based on whole procedures. This prototype has the ability to move procedures into an order that is determined by a "closest is best" strategy.

The second prototype, built on top of an existing optimizer package, positions code based on basic blocks within procedures. Groups of basic blocks that would be better as straight-line sequences are identified as chains. These chains are then ordered according to branch heuristics. Code that is never executed during the data collection runs can be physically separated from the primary code of a procedure by a technique we devised called procedure splitting.

The algorithms we implemented are described through examples in this paper. The performance improvements from our work are also summarized in various tables and charts.
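The abstract names the "closest is best" strategy for procedure ordering but does not spell out its steps. The following is a minimal sketch of one common greedy reading of that idea, not the paper's implementation: procedures that call each other often are merged into chains, heaviest pair first, so that they end up adjacent in memory. The function name `order_procedures`, the `call_counts` input format, and the rule for choosing which chain ends to join are all assumptions made for illustration.

```python
def order_procedures(call_counts):
    """Return a procedure layout order from profiled call counts.

    call_counts maps (caller, callee) pairs to how many times the call
    was observed. This input shape is hypothetical, not the paper's.
    """
    # Treat the call graph as undirected: a call in either direction
    # is a reason to place the two procedures near each other.
    weight = {}
    for (a, b), n in call_counts.items():
        if a != b:
            key = frozenset((a, b))
            weight[key] = weight.get(key, 0) + n

    def edge(p, q):
        return weight.get(frozenset((p, q)), 0)

    # Every procedure starts as its own single-element chain.
    chains = [[p] for p in sorted({p for key in weight for p in key})]

    def traffic(c1, c2):
        # Total call count between two chains.
        return sum(edge(p, q) for p in c1 for q in c2)

    while len(chains) > 1:
        # Pick the pair of chains with the heaviest traffic between them.
        w, i, j = max(
            ((traffic(c1, c2), i, j)
             for i, c1 in enumerate(chains)
             for j, c2 in enumerate(chains) if i < j),
            key=lambda t: t[0],
        )
        if w == 0:
            break  # the remaining chains never call each other

        a, b = chains[i], chains[j]
        # Assumed tie-in to "closest is best": of the four ways to join
        # the chains, keep the one whose two touching procedures have
        # the largest call count between them.
        options = [(x, y) for x in (a, a[::-1]) for y in (b, b[::-1])]
        x, y = max(options, key=lambda o: edge(o[0][-1], o[1][0]))

        chains = [c for k, c in enumerate(chains) if k not in (i, j)]
        chains.append(x + y)

    return [p for chain in chains for p in chain]


# Hypothetical profile: parse calls lex far more often than anything else.
example_profile = {
    ("main", "parse"): 10,
    ("parse", "lex"): 90,
    ("main", "emit"): 40,
    ("emit", "lex"): 5,
}
print(order_procedures(example_profile))
```

In this sketch the hottest pair (`parse` and `lex`) is joined first, and later merges only decide how the resulting chains are oriented relative to each other. A linker-based tool would then emit the procedures in the returned order.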
Profile guided code positioning. PLDI '90: Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation.