Abstract
The cube-connected cycles (CCC), an interconnection pattern of processing elements, is introduced as a general-purpose parallel processor. Because its design complies with present technological constraints, the CCC can also be used in the layout of many specialized very large scale integrated (VLSI) circuits. By combining the principles of parallelism and pipelining, the CCC can emulate the cube-connected machine and the shuffle-exchange network with no significant degradation of performance but with a more compact structure. We describe in detail how to program the CCC to efficiently solve a large class of problems that includes the fast Fourier transform, sorting, permutations, and derived algorithms.
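To make the interconnection pattern concrete, here is a minimal Python sketch. It assumes the commonly used textbook form of the network, which the abstract itself does not spell out: each of the 2^d corners of a d-dimensional cube is replaced by a cycle of d processing elements, and the element at position p of a cycle owns the cube link along dimension p. The function name `ccc_neighbors` and the `(corner, position)` labelling are illustrative choices, not the paper's notation.

```python
def ccc_neighbors(d):
    """Build an adjacency map for a cube-connected cycles network of order d.

    Assumed labelling (hypothetical, for illustration): node (x, p) is the
    element at position p (0 <= p < d) of the cycle that replaces cube
    corner x (0 <= x < 2**d).
    """
    adj = {}
    for x in range(2 ** d):
        for p in range(d):
            adj[(x, p)] = {
                (x, (p + 1) % d),    # successor on the same cycle
                (x, (p - 1) % d),    # predecessor on the same cycle
                (x ^ (1 << p), p),   # cube link: flip bit p of the corner label
            }
    return adj


graph = ccc_neighbors(3)
print(len(graph))                                      # 24 nodes (3 * 2**3)
print(max(len(nbrs) for nbrs in graph.values()))       # 3 links per node
```

Under this assumed definition every element has exactly three links for d >= 3, regardless of how large the cube is. That fixed fanout is what distinguishes it from the plain cube-connected machine, where the number of links per element grows with the dimension.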
Index Terms
- The cube-connected cycles: a versatile network for parallel computation