Abstract
A technique is described for enhancing multicomputer routers to perform fault-tolerant routing with a modest increase in routing complexity and resource requirements. The method handles solid faults in meshes, which include all convex faults and many practical nonconvex faults, for example, faults in the shape of an L or a T. As examples of the proposed method, adaptive and nonadaptive fault-tolerant routing algorithms using four virtual channels per physical channel are described.
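The abstract does not spell out what makes a fault region "solid". As an illustration only, the Python sketch below assumes one plausible reading: a set of faulty nodes in a 2D mesh is treated as solid if it is 4-connected and its intersection with every row and every column is a contiguous run. The function names (`is_contiguous`, `is_connected`, `is_solid_sketch`) and this criterion are hypothetical assumptions, not the paper's definition or its routing algorithm. Under this assumption, rectangular (convex) blocks and L- or T-shaped regions qualify, while a U-shaped region does not.

```python
from collections import deque
from typing import Iterable, Set, Tuple

# Hypothetical type: a mesh node as (x, y) coordinates.
Node = Tuple[int, int]


def is_contiguous(values: Iterable[int]) -> bool:
    """True if the distinct integers form an unbroken run, e.g. 3, 4, 5."""
    vs = sorted(values)
    return vs[-1] - vs[0] == len(vs) - 1


def is_connected(faults: Set[Node]) -> bool:
    """Breadth-first search over 4-neighbors restricted to the fault set."""
    start = next(iter(faults))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in faults and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(faults)


def is_solid_sketch(faults: Set[Node]) -> bool:
    """ASSUMED criterion (not from the paper): connected, and every row and
    column slice of the fault region is contiguous."""
    if not faults:
        return False
    rows, cols = {}, {}
    for x, y in faults:
        rows.setdefault(y, []).append(x)  # x positions of faults in row y
        cols.setdefault(x, []).append(y)  # y positions of faults in column x
    slices_ok = all(is_contiguous(v) for v in rows.values()) and all(
        is_contiguous(v) for v in cols.values()
    )
    return slices_ok and is_connected(faults)


# Example fault regions in a 2D mesh.
rect = {(x, y) for x in range(2, 5) for y in range(3, 5)}
l_shape = {(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)}
t_shape = {(1, 4), (2, 4), (3, 4), (2, 3), (2, 2)}
u_shape = {(1, 1), (2, 1), (3, 1), (1, 2), (3, 2), (1, 3), (3, 3)}

for name, region in [("rect", rect), ("L", l_shape), ("T", t_shape), ("U", u_shape)]:
    print(name, is_solid_sketch(region))
```

The U-shaped region fails because its middle row contains faulty nodes at x = 1 and x = 3 with a healthy node between them, forming a concavity. Such a concavity is the kind of pocket that can trap messages routed around the fault.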
Chalasani's research has been partially supported by NSF grant CCR-9308966 and Boppana's research by NSF Grant CCR-9208784.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chalasani, S., Boppana, R.V. (1995). Communication in multicomputer with nonconvex faults. In: Haridi, S., Ali, K., Magnusson, P. (eds) EURO-PAR '95 Parallel Processing. Euro-Par 1995. Lecture Notes in Computer Science, vol 966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020501
DOI: https://doi.org/10.1007/BFb0020501
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60247-7
Online ISBN: 978-3-540-44769-6
eBook Packages: Springer Book Archive