Profile guided code positioning

Authors:
Karl Pettis, Hewlett-Packard Company, Cupertino, CA
Robert C. Hansen, Hewlett-Packard Company, Cupertino, CA
Jack W. Davidson, University of Virginia, Charlottesville, VA

ACM SIGPLAN Notices, Volume 39, Issue 4, April 2004, pp. 398–411. https://doi.org/10.1145/989393.989433

Published: 01 April 2004

Abstract

This paper presents the results of our investigation of code positioning techniques using execution profile data as input into the compilation process. The primary objective of the positioning is to reduce the overhead of the instruction memory hierarchy.

After initial investigation in the literature, we decided to implement two prototypes for the Hewlett-Packard Precision Architecture (PA-RISC). The first, built on top of the linker, positions code based on whole procedures. This prototype has the ability to move procedures into an order that is determined by a "closest is best" strategy.

The second prototype, built on top of an existing optimizer package, positions code based on basic blocks within procedures. Groups of basic blocks that would be better as straight-line sequences are identified as chains. These chains are then ordered according to branch heuristics. Code that is never executed during the data collection runs can be physically separated from the primary code of a procedure by a technique we devised called procedure splitting.

The algorithms we implemented are described through examples in this paper. The performance improvements from our work are also summarized in various tables and charts.
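The abstract names the "closest is best" strategy for ordering whole procedures but does not spell out its steps. As an illustration only, here is a minimal Python sketch of one greedy reading of that idea: procedures that call each other most often are merged into the same chain first, so they end up near each other in the final layout. The function name `order_procedures`, the `call_counts` input format, and the tie-breaking rules are assumptions made for this sketch, not details taken from the paper.

```python
from itertools import combinations


def order_procedures(call_counts):
    """Greedy layout sketch (hypothetical helper, not the authors' code).

    call_counts maps (caller, callee) pairs to profiled call counts.
    Returns a single list of procedure names in layout order.
    """
    # Fold directed call counts into undirected edge weights.
    weight = {}
    procs = []
    for (caller, callee), count in call_counts.items():
        for p in (caller, callee):
            if p not in procs:
                procs.append(p)
        if caller != callee:
            key = frozenset((caller, callee))
            weight[key] = weight.get(key, 0) + count

    def w(a, b):
        return weight.get(frozenset((a, b)), 0)

    # Every procedure starts out as its own chain.
    chains = [[p] for p in procs]
    while len(chains) > 1:
        # Pick the pair of chains joined by the largest total call count.
        i, j = max(
            combinations(range(len(chains)), 2),
            key=lambda ij: sum(
                w(a, b) for a in chains[ij[0]] for b in chains[ij[1]]
            ),
        )
        x, y = chains[i], chains[j]
        # Of the four end-to-end orientations, keep the one whose two
        # procedures at the junction call each other most often.
        options = [x + y, x + y[::-1], x[::-1] + y, x[::-1] + y[::-1]]
        merged = max(options, key=lambda c: w(c[len(x) - 1], c[len(x)]))
        chains = [c for k, c in enumerate(chains) if k not in (i, j)]
        chains.append(merged)
    return chains[0] if chains else []


# Made-up profile: a and b call each other heavily, so they are
# placed next to each other before anything else is decided.
profile = {("main", "a"): 10, ("a", "b"): 100, ("main", "c"): 1, ("b", "c"): 5}
print(order_procedures(profile))
```

The sketch recomputes chain-to-chain weights on every pass, which is quadratic or worse; that keeps the logic short and is acceptable only for small call graphs.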


Originally published in PLDI '90: Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation.
Published in
ACM SIGPLAN Notices, Volume 39, Issue 4, April 2004
20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979-1999: A Selection
673 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/989393
Editor: Kathryn S. McKinley, The University of Texas at Austin, USA
Copyright © 2004 Authors
Publisher: Association for Computing Machinery, New York, NY, United States