- 1 AVIZIENIS, A. Fault-tolerant systems. IEEE Trans. Comput. C-25, 12 (Dec. 1976), 1304-1312.
- 2 BARLOW, R. W., AND PROSCHAN, F. Mathematical Theory of Reliability. Wiley, New York, 1965.
- 3 DENNING, P. Fault-tolerant operating systems. Comput. Surv. 8, 4 (Dec. 1976), 359-389.
- 4 DIJKSTRA, E. W. A Discipline of Programming. Prentice-Hall, Englewood Cliffs, N. J., 1976.
- 5 GRAY, J. Notes on data base operating systems. Operating Systems: An Advanced Course, Lecture Notes in Computer Science, vol. 60. Springer-Verlag, New York, 1978, pp. 393-481.
- 6 GRIES, D. The Science of Programming. Springer-Verlag, New York, 1981.
- 7 HARTER, P., AND BERNSTEIN, A. Proving real time properties of programs with temporal logic. In Proc. SOSP-8, Asilomar, California (Dec. 1981), 1-11.
- 8 HOARE, C. A. R. An axiomatic basis for computer programming. Commun. ACM 12, 10 (Oct. 1969), 576-580.
- 9 HOARE, C. A. R., AND WIRTH, N. An axiomatic definition of the programming language PASCAL. Acta Inf. 2 (1973), 335-355.
- 10 HOPKINS, A. L., SMITH, T. B., AND LALA, J. H. FTMP--A highly reliable fault-tolerant multiprocessor for aircraft. Proc. IEEE 66, 10 (Oct. 1978), 1221-1239.
- 11 LAMPORT, L. Time, clocks and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558-565.
- 12 LAMPORT, L. Using time instead of timeout for fault-tolerant distributed systems. Tech. Rep. 59, SRI Int., June 1981.
- 13 LAMPORT, L., SHOSTAK, R., AND PEASE, M. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 3 (July 1982) 382-401.
- 14 LAMPSON, B. Atomic transactions. Distributed Systems--Architecture and Implementation. Lecture Notes in Computer Science, vol. 105, Springer-Verlag, New York, 1981, pp. 246-265.
- 15 LAMPSON, B., AND STURGIS, H. Crash recovery in a distributed data storage system. To be published.
- 16 OWICKI, S., AND LAMPORT, L. Proving liveness properties of concurrent programs. ACM Trans. Program. Lang. Syst. 4, 3 (July 1982), 455-495.
- 17 PEASE, M., SHOSTAK, R., AND LAMPORT, L. Reaching agreements in the presence of faults. J. ACM 27, 2 (April 1979) 228-234.
- 18 PNUELI, A. The temporal semantics of concurrent programs. Semantics of Concurrent Computation, Lecture Notes in Computer Science, vol. 70, Springer-Verlag, New York, 1979, pp. 1-20.
- 19 RANDELL, B., LEE, P. A., AND TRELEAVEN, P. C. Reliability issues in computing system design. Comput. Surv. 10, 2 (June 1978), 123-165.
- 20 SCHLICHTING, R. D. Axiomatic Verification to Enhance Software Reliability. Ph.D. thesis, Dept. of Comput. Sci., Cornell Univ., Jan. 1982.
- 21 SCHLICHTING, R. D., AND SCHNEIDER, F. B. Understanding and using asynchronous message passing. In Proc. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing (Ottawa, Canada, Aug. 1982), ACM, New York, pp. 141-147.
- 22 SCHNEIDER, F. B. Synchronization in distributed programs. ACM Trans. Program. Lang. Syst. 4, 2 (Apr. 1982), 125-148.
- 23 SCHNEIDER, F. B. Fail-stop processors. Digest of Papers from Spring CompCon '83 (San Francisco, Calif., Mar., 1983), IEEE Computer Society, New York.
- 24 SCHNEIDER, F. B., AND SCHLICHTING, R. D. Towards fault-tolerant process control software. In Proc. Eleventh Ann. Int. Symp. Fault-Tolerant Computing (Portland, Maine, June 1981), IEEE Computer Society, New York, pp. 48-55.
- 25 SIEWIOREK, D., AND SWARZ, R. S. The Theory and Practice of Reliable System Design. Digital Press, Bedford, Mass., 1982.
- 26 WENSLEY, J. H., LAMPORT, L., GOLDBERG, J., GREEN, M., LEVITT, K. N., MELLIAR-SMITH, P. M., SHOSTAK, R. E., AND WEINSTOCK, C. B. SIFT: Design and analysis of a fault-tolerant computer for aircraft control. Proc. IEEE 66, 10 (Oct. 1978) 1240-1255.