Abstract
In this article we examine the problem of extending modern operating systems to run efficiently on large-scale shared-memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. We use virtual machines to run multiple commodity operating systems on a scalable multiprocessor. This solution addresses many of the challenges facing the system software for these machines. We demonstrate our approach with a prototype called Disco that runs multiple copies of Silicon Graphics' IRIX operating system on a multiprocessor. Our experience shows that the overheads of the monitor are small and that the approach provides scalability as well as the ability to deal with the nonuniform memory access time of these systems. To reduce the memory overheads associated with running multiple operating systems, virtual machines transparently share major data structures such as the program code and the file system buffer cache. We use the distributed-system support of modern operating systems to export a partial single system image to the users. The overall solution achieves most of the benefits of operating systems customized for scalable multiprocessors, yet it can be achieved with a significantly smaller implementation effort.
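The abstract does not say how the virtual machines share program code and buffer cache pages. As an illustration only, the following minimal Python sketch assumes one common approach: the monitor keeps a single read-only machine page per disk block, maps it into every virtual machine that reads that block, and gives a virtual machine a private copy on its first write (copy-on-write). The `Monitor` and `MachinePage` classes and all of their methods are hypothetical names introduced for this sketch, not Disco's interfaces.

```python
class MachinePage:
    """One page of real (machine) memory. Hypothetical type for this sketch."""

    def __init__(self, data):
        self.data = data
        self.refcount = 0  # number of guest mappings that point at this page


class Monitor:
    """Toy monitor that shares disk-backed pages between VMs (assumed design)."""

    def __init__(self, disk):
        self.disk = disk        # block number -> block contents
        self.block_cache = {}   # block number -> shared MachinePage
        self.mappings = {}      # (vm, guest page) -> (MachinePage, writable)

    def _unmap(self, key):
        # Drop any previous mapping for this guest page.
        old = self.mappings.pop(key, None)
        if old is not None:
            old[0].refcount -= 1

    def read_block(self, vm, guest_page, block):
        """Back a guest page with the single shared copy of a disk block."""
        key = (vm, guest_page)
        self._unmap(key)
        page = self.block_cache.get(block)
        if page is None:
            # First reader: bring the block into machine memory once.
            page = MachinePage(self.disk[block])
            self.block_cache[block] = page
        page.refcount += 1
        self.mappings[key] = (page, False)  # mapped read-only

    def write(self, vm, guest_page, data):
        """Write to a guest page, copying first if the page is shared."""
        key = (vm, guest_page)
        page, writable = self.mappings[key]
        if writable:
            page.data = data
            return
        # Copy-on-write: the writer gets a private page; other VMs keep the original.
        self._unmap(key)
        private = MachinePage(data)
        private.refcount = 1
        self.mappings[key] = (private, True)

    def machine_pages_in_use(self):
        """Count distinct machine pages currently mapped by any VM."""
        return len({id(page) for page, _ in self.mappings.values()})


monitor = Monitor(disk={7: b"shared program text"})
monitor.read_block("vm1", 0, 7)
monitor.read_block("vm2", 0, 7)
shared_count = monitor.machine_pages_in_use()   # both VMs map the same page
monitor.write("vm2", 0, b"vm2 private data")
split_count = monitor.machine_pages_in_use()    # vm2 now has its own copy
```

In this sketch the guest operating systems never learn that a page is shared, which is one way to read "transparently" in the abstract: the memory saving comes from the monitor's mapping table, not from any change to the guest.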
Index Terms
- Disco: running commodity operating systems on scalable multiprocessors