ABSTRACT
Many applications in domains such as digital signal processing, image processing, and computer vision consist of a sequence of tasks that operate on a stream of input data sets in a pipelined manner. Recent research has established that such applications are best mapped onto a massively parallel machine by dividing the tasks into modules and assigning a subset of the available processors to each module. This paper addresses the problem of optimally mapping such applications onto a massively parallel machine. We formulate the problem of optimizing throughput in task pipelines and present two new solution algorithms. The formulation uses a general and realistic model for inter-task communication, takes memory constraints into account, and addresses the entire mapping problem, which includes clustering tasks into modules, assigning processors to modules, and possibly replicating modules. The first algorithm is based on dynamic programming and finds the optimal mapping of k tasks onto P processors in $O(P^4 k^2)$ time. We also present a heuristic algorithm that is linear in the number of processors and establish, with theoretical and practical results, that the solutions it obtains are optimal in practical situations. The entire framework is implemented as an automatic mapping tool for the Fx parallelizing compiler for High Performance Fortran. We present experimental results that demonstrate the importance of choosing a good mapping and show that the methods presented yield efficient mappings and predict optimal performance accurately.
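To make the dynamic-programming idea concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: the tasks form a linear chain, a module is a contiguous run of tasks, the hypothetical functions `exec_time(i, j, q)` and `fits(i, j, q)` stand in for the paper's execution-time and memory models, a module's processors may be split into r equal replicas, and inter-module communication is ignored. The sketch minimizes the bottleneck period (throughput is its reciprocal). It is not the authors' algorithm, and its O(k^2 P^3) running time reflects this simpler model rather than the bound stated above.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical model functions (assumptions, not from the paper):
#   exec_time(i, j, q): time for one replica of the module holding tasks i..j-1
#                       to process one data set on q processors.
#   fits(i, j, q):      True if that module's data fits in the memory of q processors.
ExecFn = Callable[[int, int, int], float]
FitsFn = Callable[[int, int, int], bool]
Module = Tuple[int, int, int]  # (first task, one past last task, processors)


def module_period(i: int, j: int, p: int, exec_time: ExecFn, fits: FitsFn) -> float:
    """Smallest time between successive outputs of tasks i..j-1 on p processors,
    trying every replication factor r that splits p into equal replicas."""
    best = math.inf
    for r in range(1, p + 1):
        if p % r:
            continue
        q = p // r  # processors per replica
        if fits(i, j, q):
            # r replicas handle alternate data sets, so one result appears every t / r.
            best = min(best, exec_time(i, j, q) / r)
    return best


def map_pipeline(k: int, P: int, exec_time: ExecFn, fits: FitsFn) -> Tuple[float, List[Module]]:
    """Cluster a chain of k tasks into modules on at most P processors so that the
    bottleneck period is minimal. Returns (period, modules); throughput = 1 / period."""
    INF = math.inf
    # best[j][p]: minimal bottleneck period for tasks 0..j-1 using exactly p processors.
    best = [[INF] * (P + 1) for _ in range(k + 1)]
    choice: List[List[Optional[Tuple[int, int]]]] = [[None] * (P + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for p in range(1, P + 1):
            for i in range(j):             # last module holds tasks i..j-1
                for q in range(1, p + 1):  # processors given to that module
                    prev = best[i][p - q]
                    if prev == INF:
                        continue
                    cand = max(prev, module_period(i, j, q, exec_time, fits))
                    if cand < best[j][p]:
                        best[j][p] = cand
                        choice[j][p] = (i, q)
    # Leaving processors idle is allowed, so take the best over every total p.
    p_best = min(range(P + 1), key=lambda p: best[k][p])
    if best[k][p_best] == INF:
        return INF, []  # no mapping satisfies the memory constraints
    modules: List[Module] = []
    j, p = k, p_best
    while j > 0:
        i, q = choice[j][p]
        modules.append((i, j, q))
        j, p = i, p - q
    modules.reverse()
    return best[k][p_best], modules


if __name__ == "__main__":
    work = [4.0, 4.0]  # hypothetical per-task work
    period, modules = map_pipeline(
        k=2, P=2,
        exec_time=lambda i, j, q: sum(work[i:j]) / q + 1.0,  # assumed fixed overhead of 1
        fits=lambda i, j, q: True,
    )
    print(period, modules)
```

With two tasks of 4 work units each, 2 processors, and the assumed cost `sum(work[i:j]) / q + 1.0`, the sketch prefers a single module replicated twice (period 4.5) over two single-processor modules (period 5.0), illustrating why replication belongs in the search space alongside clustering and processor assignment.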