Abstract
Key-value stores (KVS) are becoming increasingly popular because they scale up and down elastically, sustain high throughput for get/put workloads, and have low latencies. KVS owe these advantages to their simplicity. This simplicity, however, comes at a cost: it is expensive to process complex, analytical queries on top of a KVS because today's generation of KVS does not support an efficient way to scan the data. The problem is that the design goals for analytical queries conflict with those for simple get/put workloads: analytical queries require high locality and a compact representation of data, whereas elastic get/put workloads require sparse indexes. This paper shows that it is possible to have it all, with reasonable compromises. We studied the KVS design space and built TellStore, a distributed KVS that performs almost as well as state-of-the-art KVS for get/put workloads and orders of magnitude better for analytical and mixed workloads. This paper presents the results of comprehensive experiments with an extended version of the YCSB benchmark and a workload from the telecommunication industry.
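The tension between get/put access and scans can be illustrated with a toy sketch. This is not TellStore's design: the class names, the two-integer record layout, and the `scan_sum_a` query are assumptions made purely for illustration. The first store keeps each record behind a hash index, so a scan must visit every record object individually; the second keeps the same index but stores field values in contiguous per-column arrays, so a scan reads one dense array.

```python
from array import array


class ToyRowKVS:
    """Hash-indexed records: cheap get/put, but a scan touches every record."""

    def __init__(self):
        self.rows = {}  # key -> (a, b) tuple, scattered in memory

    def put(self, key, a, b):
        self.rows[key] = (a, b)

    def get(self, key):
        return self.rows.get(key)

    def scan_sum_a(self):
        # Analytical query: must dereference each record to read field a.
        return sum(row[0] for row in self.rows.values())


class ToyColumnKVS:
    """Hash index over contiguous per-column arrays: scans read dense data."""

    def __init__(self):
        self.slot = {}           # key -> position in the column arrays
        self.col_a = array("q")  # all values of field a, stored contiguously
        self.col_b = array("q")  # all values of field b, stored contiguously

    def put(self, key, a, b):
        i = self.slot.get(key)
        if i is None:
            # New key: append to both columns and remember the slot.
            self.slot[key] = len(self.col_a)
            self.col_a.append(a)
            self.col_b.append(b)
        else:
            # Existing key: update in place.
            self.col_a[i] = a
            self.col_b[i] = b

    def get(self, key):
        i = self.slot.get(key)
        return None if i is None else (self.col_a[i], self.col_b[i])

    def scan_sum_a(self):
        # Analytical query: a single pass over one compact array.
        return sum(self.col_a)
```

Both classes answer `get`, `put`, and the scan identically; they differ only in how the data is laid out, which is the dimension of the design space the abstract refers to. The sketch ignores distribution, versioning, and concurrency entirely.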