DOI: 10.1145/2038916.2038925
research-article

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

Published: 26 October 2011

ABSTRACT

Inspired by Google's BigTable, a variety of scalable, semi-structured, weak-semantic table stores have been developed and optimized for different priorities such as query speed, ingest speed, availability, and interactivity. As these systems mature, performance benchmarking will advance from measuring the rate of simple workloads to understanding and debugging the performance of advanced features such as ingest speed-up techniques and function shipping of filters from clients to servers. This paper describes YCSB++, a set of extensions to the Yahoo! Cloud Serving Benchmark (YCSB) to improve performance understanding and debugging of these advanced features. YCSB++ includes multi-tester coordination for increased load and eventual consistency measurement; multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization such as B-tree pre-splitting or bulk loading; and abstract APIs for explicit incorporation of advanced features in benchmark tests. To enhance performance debugging, we customized an existing cluster monitoring tool to gather the internal statistics of YCSB++, table stores, system services like HDFS, and operating systems, and to offer easy post-test correlation and reporting of performance behaviors. YCSB++ features are illustrated in case studies of two BigTable-like table stores, Apache HBase and Accumulo, developed to emphasize high ingest rates and fine-grained security.
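
To make the idea of eventual consistency measurement concrete, the following is a minimal sketch, not the YCSB++ implementation. It assumes a toy in-memory store (the hypothetical `DelayedStore`) in which a write becomes readable only after a fixed number of logical ticks, and a hypothetical helper `read_after_write_lag` in which one tester writes a key and a second tester polls until the value is visible. All names and the tick-based clock are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DelayedStore:
    """Toy store (assumption): a write becomes readable only after `delay` ticks.

    Stands in for a table store whose clients buffer or defer writes.
    """
    delay: int
    now: int = 0
    _data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def tick(self) -> None:
        # Advance the logical clock by one step.
        self.now += 1

    def put(self, key: str, value: str) -> None:
        # Remember the logical time at which this write becomes visible.
        self._data[key] = (self.now + self.delay, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or self.now < entry[0]:
            return None  # not written yet, or written but not yet visible
        return entry[1]


def read_after_write_lag(store: DelayedStore, key: str, value: str,
                         max_ticks: int = 1000) -> Optional[int]:
    """One tester writes `key`; another polls until the value is readable.

    Returns the observed lag in ticks, or None if the value never appeared
    within `max_ticks` polls after the write.
    """
    store.put(key, value)
    written_at = store.now
    for _ in range(max_ticks + 1):
        if store.get(key) == value:
            return store.now - written_at
        store.tick()
    return None


if __name__ == "__main__":
    store = DelayedStore(delay=5)
    print("read-after-write lag (ticks):", read_after_write_lag(store, "row1", "v1"))
```

In a real deployment the two testers would run on separate machines and need an external coordination channel to hand the key from writer to reader; the single-process loop above only illustrates the measurement itself.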


Published in

SOCC '11: Proceedings of the 2nd ACM Symposium on Cloud Computing
October 2011
377 pages
ISBN: 9781450309769
DOI: 10.1145/2038916

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 169 of 722 submissions, 23%
