research-article

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

Authors:
Swapnil Patil, Carnegie Mellon University
Milo Polte, Carnegie Mellon University
Kai Ren, Carnegie Mellon University
Wittawat Tantisiriroj, Carnegie Mellon University
Lin Xiao, Carnegie Mellon University
Julio López, Carnegie Mellon University
Garth Gibson, Carnegie Mellon University
Adam Fuchs, National Security Agency
Billie Rinaldi, National Security Agency

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing, October 2011, Article No. 9, pages 1–14. https://doi.org/10.1145/2038916.2038925

Published: 26 October 2011

ABSTRACT

Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping filters from client to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading, and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security.
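The read-after-write (eventual consistency) measurement mentioned above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not YCSB++ code: BufferedTableClient and read_after_write_lag are hypothetical names, the ingest speed-up technique is assumed to be client-side write batching, a Python queue.Queue stands in for the coordination service shared by the writer and reader testers, and time is measured in logical ticks rather than seconds.

```python
import queue
from typing import Dict, List, Optional


class BufferedTableClient:
    """Hypothetical client that buffers inserts and ships them in batches.

    A write becomes readable by other clients only when the batch holding it
    is flushed. Time is a logical tick counter, not wall-clock time.
    """

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size
        self._buffer: List[str] = []
        self._visible_at: Dict[str, int] = {}  # key -> tick when it became readable

    def insert(self, key: str, now: int) -> None:
        self._buffer.append(key)
        if len(self._buffer) >= self.batch_size:
            self.flush(now)

    def flush(self, now: int) -> None:
        # Every buffered key becomes visible at the tick of the flush.
        for key in self._buffer:
            self._visible_at[key] = now
        self._buffer.clear()

    def read(self, key: str, now: int) -> bool:
        """True if a reader at tick `now` would see `key`."""
        visible: Optional[int] = self._visible_at.get(key)
        return visible is not None and now >= visible


def read_after_write_lag(
    keys: List[str], batch_size: int, poll_interval: int = 1
) -> Dict[str, int]:
    """Return, per key, the ticks between its insert and a reader first seeing it."""
    store = BufferedTableClient(batch_size)
    # Assumed stand-in for the coordination service shared by the two testers.
    coordination = queue.Queue()

    # Writer tester: one insert per tick, then publish (key, insert tick).
    last_tick = 0
    for tick, key in enumerate(keys):
        store.insert(key, tick)
        coordination.put((key, tick))
        last_tick = tick
    store.flush(last_tick)  # ship any partial final batch

    # Reader tester: dequeue each key and poll, starting at its insert tick.
    # Replaying polls after the writer finishes is valid because each key's
    # visibility tick is fixed once its batch has been flushed.
    lags: Dict[str, int] = {}
    while not coordination.empty():
        key, written = coordination.get()
        now = written
        while not store.read(key, now):
            now += poll_interval
        lags[key] = now - written
    return lags


if __name__ == "__main__":
    print(read_after_write_lag([f"user{i}" for i in range(8)], batch_size=4))
    # → {'user0': 3, 'user1': 2, 'user2': 1, 'user3': 0, 'user4': 3, 'user5': 2, 'user6': 1, 'user7': 0}
```

Because each key waits for its batch to fill, lag shrinks toward the end of every batch. A larger batch_size would presumably raise ingest throughput while widening the window in which a second tester reads stale data, the kind of trade-off that coordinated testers are positioned to expose.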


Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
  2. Information retrieval
    1. Evaluation of retrieval results
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis
        1. Software testing and debugging


Published in
SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN: 9781450309769
DOI: 10.1145/2038916
Program Chairs: Jeffrey S. Chase (Duke University), Amr El Abbadi (University of California, Santa Barbara)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
NoSQL
YCSB
benchmarking
scalable table stores

Acceptance Rates
Overall acceptance rate: 169 of 722 submissions, 23%

Article Metrics
- 137 total citations
- 1,440 total downloads
- 24 downloads in the last 12 months
- 5 downloads in the last 6 weeks
