Abstract
A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to utilize the parallel hardware effectively. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimization and scheduling are shown to be useful when a variety of architectural features are taken into account.
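The abstract does not say how a superblock is built. As the term is generally used, a superblock is a trace of basic blocks that control can enter only at the top but may leave through several exits. Side entrances into a profiled trace are removed by duplicating the tail of the trace. The Python below is a minimal sketch of those two steps (trace selection, then tail duplication) on a profiled control-flow graph. It assumes a CFG stored as `block -> {successor: edge count}`, string block names, and a simple greedy trace-growing rule. The function names `select_trace` and `tail_duplicate` and the trailing `'` used to name copies are hypothetical illustrations, not the IMPACT-I implementation.

```python
from collections import defaultdict


def select_trace(succ, weight, visited):
    """Greedily grow one trace from the hottest block not yet in a trace.

    succ:    block -> {successor: profiled edge count}
    weight:  block -> profiled execution count
    visited: blocks already placed in earlier traces (updated in place)
    Call only while some block in `weight` is still unvisited.
    """
    # Predecessor edge counts, used to check that the trace follows the
    # most frequent edge both out of the current block and into the next one.
    pred = defaultdict(dict)
    for b, outs in succ.items():
        for s, count in outs.items():
            pred[s][b] = count

    seed = max((b for b in weight if b not in visited), key=lambda b: weight[b])
    trace, cur = [seed], seed
    visited.add(seed)
    while succ.get(cur):
        nxt = max(succ[cur], key=succ[cur].get)
        # Stop at blocks already in a trace (including loop back edges) or
        # when cur is not the most frequent predecessor of nxt.
        if nxt in visited or max(pred[nxt], key=pred[nxt].get) != cur:
            break
        trace.append(nxt)
        visited.add(nxt)
        cur = nxt
    return trace


def tail_duplicate(succ, trace):
    """Return a copy of the CFG in which `trace` has a single entry (its head).

    Every block from the first side entrance to the end of the trace is
    duplicated (copies are named with a trailing "'"), and side-entrance
    edges are redirected to the copies. Edge counts are copied, not rescaled.
    """
    cfg = {b: dict(outs) for b, outs in succ.items()}
    pos = {b: i for i, b in enumerate(trace)}

    def is_side_entrance(p, s):
        # An edge into a non-head trace block that does not come from the
        # preceding trace block enters the trace from the side.
        return s in pos and pos[s] > 0 and p != trace[pos[s] - 1]

    side = [pos[s] for p, outs in cfg.items() for s in outs if is_side_entrance(p, s)]
    if not side:
        return cfg  # the trace is already a superblock
    tail = trace[min(side):]
    dup = {b: b + "'" for b in tail}

    originals = list(cfg)
    # Each copy keeps its successors, except that edges into the tail go to
    # the copies, so control entering the copied tail stays in the copies.
    for b in tail:
        cfg[dup[b]] = {dup.get(s, s): c for s, c in cfg.get(b, {}).items()}
    # Redirect side entrances in the original blocks to the copies.
    for p in originals:
        outs = cfg[p]
        for s in list(outs):
            if s in dup and is_side_entrance(p, s):
                outs[dup[s]] = outs.pop(s)
    return cfg
```

On a diamond-shaped CFG where A branches to B (hot) or C (cold), both rejoin at D, and D falls through to E, the selected trace is A, B, D, E. Tail duplication then redirects the side entrance C to D into a copy D' that falls through to E', leaving the trace with a single entry at A and a side exit from A to C.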
Cite this article
Hwu, W.M.W., Mahlke, S.A., Chen, W.Y. et al. The superblock: An effective technique for VLIW and superscalar compilation. J Supercomput 7, 229–248 (1993). https://doi.org/10.1007/BF01205185