ABSTRACT
This paper presents techniques for compiling loops with complex, indirect array accesses into loops whose array references have at most one level of indirection. The transformation allows prefetching of array indices for more efficient structuring of communication on distributed-memory machines. It can also improve performance on other architectures by enabling prefetching of data between levels of the memory hierarchy or exploitation of hardware support for vectorized gather/scatter. Our techniques are implemented in a compiler for Fortran D, and we report execution-speed improvements on multiprocessor and vector machines.
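To illustrate what "at most one level of indirection" means, the following is a minimal sketch, not the paper's compiler algorithm. It assumes a hypothetical loop that reads `x[ia[ib[i]]]` and rewrites it by hand so that the inner subscripts are first collected into a temporary index array. The names `x`, `ia`, `ib`, `nested_gather`, and `flattened_gather` are illustrative assumptions and do not come from the paper.

```python
def nested_gather(x, ia, ib):
    """Original form: x is reached through two levels of indirection."""
    return [x[ia[ib[i]]] for i in range(len(ib))]


def flattened_gather(x, ia, ib):
    """Flattened form: each array reference has at most one level of indirection."""
    n = len(ib)

    # Step 1: materialize the subscripts of x in a temporary index array.
    # The reference ia[ib[i]] uses one level of indirection.
    t = [ia[ib[i]] for i in range(n)]

    # Step 2: gather from x through the temporary.
    # The reference x[t[i]] again uses one level of indirection,
    # and all of t is known before the gather begins.
    return [x[t[i]] for i in range(n)]


if __name__ == "__main__":
    x = [10.0, 20.0, 30.0, 40.0]
    ia = [3, 0, 2, 1]
    ib = [1, 1, 2, 0]
    assert nested_gather(x, ia, ib) == flattened_gather(x, ia, ib)
    print(flattened_gather(x, ia, ib))
```

Because the complete index array `t` exists before the second loop runs, a runtime system could in principle inspect it to plan communication or issue a single gather, which is the kind of opportunity the abstract describes.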
Index Terms
- Index array flattening through program transformation