Abstract
For many applications, achieving good performance on a private-memory parallel computer requires exploiting data parallelism as well as task parallelism. Depending on the size of the input data set and the number of nodes (i.e., processors), different tradeoffs between task and data parallelism are appropriate for a parallel system. Most existing compilers exploit only one of these two kinds of parallelism, so the programmer must program the data parallelism and the task parallelism separately to achieve the desired results. We have taken a unified approach that exploits both kinds of parallelism in a single framework with an existing language. This approach eases the task of programming and exposes the tradeoffs between data and task parallelism to the compiler. We have implemented a parallelizing Fortran compiler for the iWarp system based on this approach. We discuss the design of our compiler and present performance results to validate our approach.
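To make the tradeoff concrete, the following is a minimal sketch of a Fortran program that mixes both kinds of parallelism. It is hypothetical: the HPF-style DISTRIBUTE directives (written as standard Fortran comments, so the code compiles as ordinary Fortran 90) and the mapping of the two stages onto disjoint node groups are illustrative assumptions, not the directive syntax of the compiler described here.

! Hypothetical sketch only: HPF-style directives express data
! parallelism; the comments indicate where task parallelism would
! map the two stages onto disjoint groups of nodes.
program task_and_data
  implicit none
  integer, parameter :: n = 512
  real :: a(n,n), b(n,n)
!hpf$ distribute a(block,*)   ! stage 1 data: rows spread over its node group
!hpf$ distribute b(*,block)   ! stage 2 data: columns spread over its group

  call random_number(a)

  ! Stage 1: a data-parallel array statement, executed across
  ! the nodes that own a.
  a = a * 2.0

  ! Stage 2: could run as a concurrent task on a second node group,
  ! overlapping with stage 1 work on the next input data set.
  b = transpose(a)

  print *, 'checksum:', sum(b)
end program task_and_data

In such a program, each array statement can run data-parallel within a node group, while the two stages run as concurrent tasks on disjoint groups; which split of the nodes is best depends on the problem size and the node count, which is exactly the tradeoff that a unified framework exposes to the compiler.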