Abstract
As part of an effort to develop an optimizing compiler for a pipelined architecture, a code reorganization algorithm has been developed that significantly reduces the number of runtime pipeline interlocks. In a pass after code generation, the algorithm uses a dag representation to heuristically schedule the instructions in each basic block.
Previous algorithms for reducing pipeline interlocks have had worst-case runtimes of at least O(n^4). By using a dag representation which prevents scheduling deadlocks and a selection method that requires no lookahead, the resulting algorithm reorganizes instructions almost as effectively in practice, while having an O(n^2) worst-case runtime.
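To make the idea concrete, here is a minimal sketch of dag-based scheduling within one basic block. It is an illustration under stated assumptions, not the paper's algorithm: the `Instr` record, the one-slot load-delay hazard model in `interlocks`, and the tie-breaking rule (most successors first) are all hypothetical, and memory dependences are not modeled. It builds the dag with a pairwise O(n^2) test, then repeatedly picks a ready instruction that does not stall after the one just issued, without lookahead.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str                    # label used when printing
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read
    is_load: bool = False        # assumption: a load's result is not ready in the next slot


def build_dag(block):
    """Pairwise O(n^2) dependence test; returns successor sets by position."""
    succs = [set() for _ in block]
    for j, b in enumerate(block):
        for i in range(j):
            a = block[i]
            raw = a.dest is not None and a.dest in b.srcs   # read after write
            war = b.dest is not None and b.dest in a.srcs   # write after read
            waw = a.dest is not None and a.dest == b.dest   # write after write
            if raw or war or waw:
                succs[i].add(j)
    return succs


def interlocks(prev, cur):
    """Assumed hazard: using a loaded register in the very next slot stalls."""
    return prev is not None and prev.is_load and prev.dest in cur.srcs


def schedule(block):
    """Greedy selection from the ready set, with no lookahead."""
    succs = build_dag(block)
    npreds = [0] * len(block)
    for s in succs:
        for j in s:
            npreds[j] += 1
    ready = [i for i, count in enumerate(npreds) if count == 0]
    order, prev = [], None
    while ready:
        # Prefer a candidate that does not stall after `prev`; break ties by
        # number of successors, then by original position.
        best = min(ready, key=lambda i: (interlocks(prev, block[i]), -len(succs[i]), i))
        ready.remove(best)
        order.append(best)
        prev = block[best]
        for j in succs[best]:
            npreds[j] -= 1
            if npreds[j] == 0:
                ready.append(j)
    return [block[i] for i in order]


def count_interlocks(seq):
    return sum(interlocks(a, b) for a, b in zip(seq, seq[1:]))


block = [
    Instr("ld1", dest="r1", is_load=True),
    Instr("add1", dest="r2", srcs=("r1", "r5")),
    Instr("ld2", dest="r3", is_load=True),
    Instr("add2", dest="r4", srcs=("r3", "r5")),
]
print(count_interlocks(block))                 # 2
print([ins.name for ins in schedule(block)])   # ['ld1', 'ld2', 'add1', 'add2']
```

Only instructions whose predecessors have all been issued are ever candidates, so the ready set stays non-empty until the block is finished. Each selection scans at most n candidates, which keeps this sketch within O(n^2) overall.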
Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86: Proceedings of the 1986 SIGPLAN Symposium on Compiler Construction