Abstract
We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols, a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a new minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
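To make the idea of manipulating data on memory nodes more concrete, the following is a minimal single-process sketch of how a minitransaction over linear address spaces might behave. The abstract does not spell out the primitive, so the sketch assumes a minitransaction carries compare items, read items, and write items, and that the reads and writes take effect only if every compare item matches. All names here (`MemoryNode`, `Minitransaction`, `exec_and_commit`) are hypothetical and are not Sinfonia's actual API. The sketch ignores networking, concurrency, and failures, which are exactly the complexities the real service is meant to hide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class MemoryNode:
    """Toy stand-in for a memory node: a flat, byte-addressable space."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)

    def read(self, addr: int, length: int) -> bytes:
        if addr < 0 or addr + length > len(self.data):
            raise ValueError("read outside the node's address space")
        return bytes(self.data[addr:addr + length])

    def write(self, addr: int, payload: bytes) -> None:
        if addr < 0 or addr + len(payload) > len(self.data):
            raise ValueError("write outside the node's address space")
        self.data[addr:addr + len(payload)] = payload


@dataclass
class Minitransaction:
    # Each item names a node id and an address on that node (assumed layout).
    compare_items: List[Tuple[int, int, bytes]] = field(default_factory=list)
    read_items: List[Tuple[int, int, int]] = field(default_factory=list)
    write_items: List[Tuple[int, int, bytes]] = field(default_factory=list)


def exec_and_commit(
    nodes: Dict[int, MemoryNode], tx: Minitransaction
) -> Optional[List[bytes]]:
    """Apply tx if every compare item matches; otherwise change nothing.

    Returns the read results on commit, or None on abort.
    """
    # Step 1: check every compare item before touching any data.
    for node_id, addr, expected in tx.compare_items:
        if nodes[node_id].read(addr, len(expected)) != expected:
            return None
    # Step 2: collect the read items (they observe the pre-write state).
    results = [nodes[n].read(addr, length) for n, addr, length in tx.read_items]
    # Step 3: apply the write items.
    for node_id, addr, payload in tx.write_items:
        nodes[node_id].write(addr, payload)
    return results
```

In this toy model, a compare item on a version byte combined with write items on several nodes acts like a multi-node compare-and-swap: a second attempt with a stale expected value returns `None` and leaves every node untouched.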