PNUTS: Yahoo!'s hosted data serving platform

Published: 01 August 2008

Abstract

We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.
