ABSTRACT
In this paper, we present a new framework for selecting, duplicating and sequencing instructions so as to decrease register pressure. The motivation for this work is to target current and future high-performance processors where reductions in register pressure in the compiled programs can lead to improved performance.
For instruction selection and duplication, a unique feature of our approach is the ability to perform these transformations on intermediate-language instructions in a general dependence graph that contains both true and non-true dependences, unlike past work that restricted its attention to a single expression tree or a single expression DAG. For instruction sequencing, we present a new algorithm for reducing register pressure that is based on backwards scheduling.
We present preliminary performance results to validate our approach. Our results show that register-sensitive instruction duplication can deliver significant speedups (up to 1.22x) for the SPECint95 benchmarks on an IA-32 processor. We also show that register-sensitive sequencing delivers smaller speedups (up to 1.12x) for the SPECjvm and Java Grande benchmarks on a PowerPC processor (when utilizing two-thirds of its registers). We expect to see more significant speedups due to register-sensitive sequencing on processors with fewer registers than the PowerPC (such as the IA-32).
Index Terms
- Register-sensitive selection, duplication, and sequencing of instructions