ABSTRACT
RAMCloud is a DRAM-based storage system that provides inexpensive durability and availability by recovering quickly after crashes, rather than storing replicas in DRAM. RAMCloud scatters backup data across hundreds or thousands of disks, and it harnesses hundreds of servers in parallel to reconstruct lost data. The system uses a log-structured approach for all its data, in DRAM as well as on disk: this provides high performance both during normal operation and during recovery. RAMCloud employs randomized techniques to manage the system in a scalable and decentralized fashion. In a 60-node cluster, RAMCloud recovers 35 GB of data from a failed server in 1.6 seconds. Our measurements suggest that the approach will scale to recover larger memory sizes (64 GB or more) in less time with larger clusters.
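The reported figures imply a rough recovery bandwidth, which a short calculation makes concrete. This is only a back-of-envelope sketch: it assumes decimal units (1 GB = 1000 MB) and that all 60 nodes share the recovery work evenly, neither of which the abstract states. The function name `recovery_throughput` is hypothetical.

```python
def recovery_throughput(data_gb: float, seconds: float, nodes: int) -> tuple[float, float]:
    """Return (aggregate GB/s, per-node GB/s).

    Assumption: recovery work is split evenly across all nodes.
    """
    aggregate = data_gb / seconds
    return aggregate, aggregate / nodes


# Figures from the abstract: 35 GB recovered in 1.6 s on a 60-node cluster.
aggregate, per_node = recovery_throughput(35.0, 1.6, 60)

# Assumption: decimal units, 1 GB = 1000 MB.
print(f"aggregate: {aggregate:.1f} GB/s, per node: {per_node * 1000:.0f} MB/s")
```

Under these assumptions the cluster moves roughly 22 GB/s in aggregate, or about 365 MB/s per node. Because the per-node rate is what each server has to sustain, adding servers would shorten recovery for a given amount of lost data, which is consistent with the abstract's claim that larger clusters should recover larger memories in less time.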