Abstract
The Tandem NonStop System is a fault-tolerant [1], expandable, and distributed computer system designed expressly for online transaction processing. This paper describes the key primitives of the kernel of the operating system. The first section describes the basic hardware building blocks and introduces their software analogs: processes and messages. Using these primitives, a mechanism that allows fault-tolerant resource access, the process-pair, is described. The paper concludes with some observations on this type of system structure and on actual use of the system.
References
- 1 Avizienis, A., Architecture of Fault-Tolerant Computing Systems, FTC-5, IEEE and F.N.I.E., Paris, (June 1975), pp 3-16.
- 2 Katzman, J. A., A Fault-Tolerant Computing System, Eleventh Hawaii International Conference on System Sciences, (January 1978), pp 85-102.
- 3 Siewiorek, D. P., Bell, C. G., and Newell, A., Computer Structures: Principles and Examples, McGraw-Hill, Inc., (1982).
- 4 Bartlett, J. F., A NonStop Operating System, Eleventh Hawaii International Conference on System Sciences, (January 1978), pp 103-117.
- 5 Dijkstra, E. W., The Structure of the "THE" Multiprogramming System, Comm. ACM 11, (May 1968), pp 341-346.
- 6 Brinch Hansen, P., The Nucleus of a Multi-programming System, Comm. ACM 13, (April 1970), pp 238-241, 250.
- 7 Liskov, B., Report on the Workshop on Fundamental Issues in Distributed Computing, Operating Systems Review, (July 1981), pp 9-38.
- 8 CCITT, Recommendation X.25, Level 2, Geneva, (1976).
- 9 Gleser, Malcolm A., Bayard, J., and Lang, D. D., Benchmarking for the Best, Datamation, (May 1981).
- 10 Tom, G. F., Checkpointing Techniques for Fault-Tolerant Process-Pairs, BS Thesis at MIT, (June 1981).
- 11 Enslow, P. H. Jr., Multiprocessor Organization - A Survey, Computing Surveys, Vol 9, Number 1, (March 1977), pp 103-129.
- 12 Borr, A. J., Transaction Monitoring in ENCOMPASS: Reliable Distributed Transaction Processing, 7th International Conference on Very Large Data Bases, (September 1981).
- 13 Blake, Russ, Tailor: A Simple Model That Works, Proc. of Conf. on Simulation, Measurement, and Modelling of Computer Systems, ACM SIGMETRICS, Boulder, (August 1979).
- 14 Blake, Russ, XRAY: Instrumentation for Multiple Computers, Proc. Int'l. Symp. on Computer Performance, Modelling, Measurement, and Evaluation, ACM SIGMETRICS and IFIP WG7.3, Toronto, (May 1980).