Rollback propagation detection and performance evaluation of FTMR2M—a fault-tolerant multiprocessor

Published: 01 April 1982
Abstract

In this paper we consider rollback propagation in, and the performance of, a fault-tolerant multiprocessor with a rollback recovery mechanism (FTMR2M) [1], which was designed to tolerate hardware failures with minimum time overhead. Rollback propagation between cooperating processes is usually required to ensure correct recovery from a failure. To minimize the processor time and storage wasted in handling sophisticated rollback propagations, the FTMR2M always keeps one recoverable state. Approaches for evaluating the recovery overhead and analyzing the performance of the FTMR2M are presented. Two methods for detecting rollback propagations and multi-step rollbacks between cooperating processes are also proposed.
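
To make the notion of rollback propagation concrete, the following is a minimal generic sketch and not either of the two detection methods proposed in the paper. It assumes a single global clock, exactly one saved checkpoint per process, and a complete message log; the names `Message` and `rollback_set` are hypothetical. A process must roll back if it consumed a message that a rolling-back process sent after that sender's checkpoint. If the message was consumed before the receiver's own checkpoint, the receiver's single saved state is already contaminated, and the sketch only flags that case without following it further.

```python
from collections import deque
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    sent_at: int      # time the sender emitted the message
    received_at: int  # time the receiver consumed it


def rollback_set(failed, checkpoints, messages):
    """Return (must_roll_back, checkpoint_insufficient) for one failure.

    checkpoints maps each process name to the time of its one saved state.
    This is an illustrative model, not the FTMR2M algorithm.
    """
    must_roll_back = {failed}
    checkpoint_insufficient = set()
    queue = deque([failed])
    while queue:
        p = queue.popleft()
        for m in messages:
            # Only messages p sent after its checkpoint are undone by p's rollback.
            if m.sender != p or m.sent_at <= checkpoints[p]:
                continue
            q = m.receiver
            # The undone message is already part of q's saved state, so
            # restoring that state does not remove its effect. Propagation
            # from q to an older state is NOT modelled here.
            if m.received_at <= checkpoints[q]:
                checkpoint_insufficient.add(q)
            if q not in must_roll_back:
                must_roll_back.add(q)
                queue.append(q)
    return must_roll_back, checkpoint_insufficient
```

For example, with checkpoints `{"A": 5, "B": 6, "C": 9}` and a log in which A sends to B at time 7 and B sends to C at time 9 (received at 10), a failure of A forces A, B, and C to roll back, while a message A sent at time 3 would not propagate anything.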

References

  1. A. M. Feridun and K. G. Shin, "A Fault-Tolerant Multiprocessor System with Rollback Recovery Capabilities", Proc. 2nd Int'l Conf. on Distributed Computing System, April 1981.
  2. B. Randell, "System Structure for Software Fault Tolerance", IEEE Trans. on Software Eng., Jun. 1975, pp. 220-232.
  3. K. M. Chandy and C. V. Ramamoorthy, "Rollback and Recovery Strategies for Computer Program", IEEE Trans. on Comp., June 1972, pp. 546-556.
  4. F. T. O'Brien, "Rollback Point Insertion Strategies", Proc. of the 6th Int'l Symp. on Fault-Tolerant Computing, Pittsburgh, 1976, pp. 138-142.
  5. K. Kant and A. Silberschatz, "Error Recovery in Concurrent Processes", Proc. COMPSAC 80, Fall 1980, pp. 608-614.
  6. C. Meraud and F. Browaeys, "Automatic Rollback Techniques of the COPRA Computer", Proc. of 6th Int'l Conf. on Fault-Tolerant Computing, 1976, pp. 23-29.
  7. K. H. Kim, "An Approach to Programmer-Transparent Coordination of Recovering Parallel Processes and its Efficient Implementation Rules", Proc. 1978 Int'l Conf. on Parallel Processing, Aug. 1978, pp. 58-68.
  8. K. H. Kim, "An Implementation of a Programmer-Transparent Scheme for Coordinating Concurrent Processes in Recovery", Proc. COMPSAC 80, Fall 1980, pp. 615-621.
  9. R. J. Swan, S. H. Fuller, and D. P. Siewiorek, "Cm*: a Modular Multi-Microprocessor", AFIPS Conf. Proc., Vol. 46, 1977, pp. 637-644.
  10. K. H. Kim, "Error Detection, Reconfiguration and Recovery in Distributed Processing System", Proc. Int'l Conf. on Distributed Computing Systems, Oct. 1979, pp. 284-295.
  11. S. H. Fuller, J. K. Ornstein, L. Raskin, P. I. Rubinfeld, P. J. Swan, "Multi-Microprocessors: An Overview and Working Example", Proceedings of the IEEE, Vol. 66, No. 2, pp. 216-228, Feb. 1978.
  12. X. Castillo, D. P. Siewiorek, "A Performance-Reliability Model for Computing Systems", 10th Int'l Conf. on Fault-Tolerant Computing, 1980, pp. 187-192.
