Abstract
This paper presents a fault tolerant multiprocessor architecture suitable for real time control applications requiring an extremely high degree of reliability. The architecture satisfies the following requirements:
1) Ability to deal with software as well as hardware faults: The proposed architecture is based on the assignment of distinct but redundant software modules to each task.
2) Efficient use of resources: The proposed architecture is a multiprocessor using time redundancy for fault correction. Thus, redundancy (beyond that needed for fault detection) is invoked only when a fault is detected. In normal operation, this extra capacity is available as an additional computing resource.
3) No hard core: In addition to the usual replication of system components, a partitioned system executive and a unique communication facility are defined, which ensure that the available redundancy will not be lost through a “domino” effect.
4) Interaction of computing units with sensors and effectors: The manner in which system architecture must be responsive to the amount and type of redundancy provided by the sensors and effectors is shown.
5) Use of current technology: The proposed architecture is based on the use of currently available hardware for the major system components.
After a detailed description of the architecture and the method of system operation, the system is related to existing fault tolerant systems, and unique characteristics of the present design are indicated.
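The idea behind requirements 1 and 2 can be illustrated with a minimal sketch. It is not the paper's mechanism: the abstract does not say how many modules are assigned to a task or how their results are compared, so the two-module detection step, the equality check, and the names `run_task`, `modules`, and `agree` are all assumptions made for illustration. The sketch runs two distinct implementations of a task to detect a fault, and spends time on further implementations only when the first two disagree.

```python
import operator
from typing import Any, Callable, Sequence, Tuple


def run_task(
    modules: Sequence[Callable[[Any], Any]],
    x: Any,
    agree: Callable[[Any, Any], bool] = operator.eq,
) -> Tuple[Any, int]:
    """Hypothetical helper: return (result, number_of_modules_executed).

    Assumption: two distinct modules are enough for fault detection, and
    agreement between any two results is accepted as the correct answer.
    """
    if len(modules) < 2:
        raise ValueError("at least two distinct modules are needed for detection")

    # Normal operation: only the detection pair runs.
    results = [modules[0](x), modules[1](x)]
    if agree(results[0], results[1]):
        return results[0], 2

    # Fault detected: invoke the remaining modules one at a time
    # (time redundancy) until some earlier result is confirmed.
    for executed, module in enumerate(modules[2:], start=3):
        candidate = module(x)
        for earlier in results:
            if agree(earlier, candidate):
                return candidate, executed
        results.append(candidate)

    raise RuntimeError("no two modules agree; fault cannot be corrected")


# Three independently written versions of the same task (squaring a number).
def square_by_multiply(n: int) -> int:
    return n * n


def square_by_power(n: int) -> int:
    return n ** 2


def square_by_sum(n: int) -> int:
    return sum(n for _ in range(abs(n)))


def square_faulty(n: int) -> int:
    # Deliberately wrong, standing in for a software fault.
    return n * n + 1
```

In the fault-free case, `run_task([square_by_multiply, square_by_power, square_by_sum], 4)` executes only two modules; replacing the second module with `square_faulty` causes the third to be invoked as a tie-breaker. In a multiprocessor the unused module's processing time would remain available for other work, which is the resource-efficiency point made in requirement 2.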
A fault tolerant multiprocessor architecture for real-time control applications
ISCA '73: Proceedings of the 1st annual symposium on Computer architecture