research-article

PNUTS: Yahoo!'s hosted data serving platform

Authors:
Brian F. Cooper (Yahoo! Research)
Raghu Ramakrishnan (Yahoo! Research)
Utkarsh Srivastava (Yahoo! Research)
Adam Silberstein (Yahoo! Research)
Philip Bohannon (Yahoo! Research)
Hans-Arno Jacobsen (Yahoo! Research)
Nick Puz (Yahoo! Research)
Daniel Weaver (Yahoo! Research)
Ramana Yerneni (Yahoo! Research)

Proceedings of the VLDB Endowment, Volume 1, Issue 2, pp. 1277–1288
https://doi.org/10.14778/1454159.1454167

Published: 01 August 2008

Abstract

We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

Published in
Proceedings of the VLDB Endowment, Volume 1, Issue 2
August 2008
461 pages
ISSN: 2150-8097
Editors: Peter Buneman, Beng Chin Ooi, Kenneth Ross, Gerald Weber
Publisher: VLDB Endowment
