Abstract
To satisfy the growing need for computing power, future supercomputers will require a high degree of parallelism. Until the late 1970s, supercomputers were either multiprocessors (SIMD or MIMD) or pipelined monoprocessors. Current commercial products combine these two levels of parallelism.
Effective performance will depend on the range of algorithms that can actually be run in parallel. In a previous paper [Je86], we presented the DSPA processor, a pipelined processor that performs well on a very large family of loops.
In this paper, we present the GREEDY network, a new interconnection network (IN) for tightly coupled multiprocessors (TCMs). We then propose an original and cost-effective hardware synchronization mechanism. When DSPA processors are connected to a shared memory through a GREEDY network and synchronized by this mechanism, very high parallelism may be achieved at execution time on a very large spectrum of loops, including loops where the independence of successive iterations cannot be checked at compile time, such as loop 1:
DO 1 I=1,N
1 A(P(I))=A(Q(I))
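Loop 1 reads and writes A through the index vectors P and Q, so whether two iterations conflict depends only on the values those vectors hold at run time. The following Python sketch illustrates that point. It is not the hardware mechanism proposed in the paper; the function names `run_loop` and `cross_iteration_dependences` are hypothetical, and indices are 0-based rather than Fortran's 1-based.

```python
def run_loop(a, p, q):
    """Sequential reference semantics of loop 1: A(P(I)) = A(Q(I))."""
    a = list(a)  # work on a copy so the caller's data is untouched
    for i in range(len(p)):
        a[p[i]] = a[q[i]]
    return a


def cross_iteration_dependences(p, q):
    """Return (i, j, kind) for every pair of iterations i < j that conflict."""
    deps = []
    for j in range(len(p)):
        for i in range(j):
            if q[j] == p[i]:
                deps.append((i, j, "flow"))    # j reads the element i wrote
            if p[j] == q[i]:
                deps.append((i, j, "anti"))    # j overwrites the element i read
            if p[j] == p[i]:
                deps.append((i, j, "output"))  # both write the same element
    return deps


# Same loop text, two different run-time index vectors.
independent = cross_iteration_dependences(p=[0, 1, 2], q=[3, 4, 5])
chained = cross_iteration_dependences(p=[1, 2, 3], q=[0, 1, 2])
```

With the first pair of vectors no iteration touches an element used by another, so all iterations could run concurrently. With the second pair each iteration reads the element written by the previous one, so the order must be preserved. A compiler sees identical source text in both cases, which is why the check has to happen at execution time.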
- Bu71 P.Budnick, D.J.Kuck, "The organization and use of parallel memories", IEEE Transactions on Computers, Vol. C-20, pp. 1566-1569, Dec. 1971.
- Co86 K.Courtel, DESS microélectronique report, University of Rennes, June 1986.
- Cy86 R.G.Cytron, "Doacross: beyond vectorization for multiprocessors (Extended abstract)", International Conference on Parallel Processing 1986, pp. 836-844.
- Ga83 D.Gajski, D.Kuck, D.Lawrie, A.Sameh, "Cedar: a large scale multiprocessor", International Conference on Parallel Processing 1983, pp. 521-529.
- Go83 A.Gottlieb & al., "The NYU Ultracomputer - Designing an MIMD shared memory parallel computer", IEEE Transactions on Computers, Vol. C-32, pp. 175-189, Feb. 1983.
- Hw84 K.Hwang, F.A.Briggs, Computer architecture and parallel processing, McGraw-Hill, 1984.
- Je86 Y.Jegou, A.Seznec, "Data Synchronized Pipeline Architecture: Pipelining in Multiprocessor Environment", Proceedings of the 1986 International Conference on Parallel Processing, pp. 487-494; also Journal of Parallel and Distributed Computing, pp. 508-526, Dec. 1986.
- La75 D.H.Lawrie, "Access and alignment of data in an array computer", IEEE Transactions on Computers, Vol. C-24, pp. 1145-1155, Dec. 1975.
- Pf85 G.F.Pfister & al., "The IBM Research Parallel Processor Prototype (RP3): introduction and architecture", International Conference on Parallel Processing 1985.
- Se86 A.Seznec, Y.Jegou, "Address Synchronized Multiprocessor Architecture", INRIA report 527, July 1987.
- Se87a A.Seznec, Y.Jegou, "Optimizing memory throughput in a tightly coupled multiprocessor", Proceedings of the 1987 International Conference on Parallel Processing, pp. 344-346.
- Se87b A.Seznec, "Contribution à l'étude des multiprocesseurs fortement pipelinés", thèse d'état, University of Rennes I, June 1987.
Index Terms
- Synchronizing processors through memory requests in a tightly coupled multiprocessor