ABSTRACT
The emergence of Intel's Optane DC persistent memory (Optane Pmem) has drawn much interest in building persistent key-value (KV) stores that take advantage of its high throughput and low latency. A major challenge in these efforts stems from the fact that Optane Pmem is essentially a hybrid storage device with two distinct properties. On one hand, it is a high-speed byte-addressable device similar to DRAM. On the other hand, writes to the Optane media are performed in units of 256 bytes, much like a block storage device. Existing KV store designs for persistent memory do not take the latter property into account, leading to high write amplification and constraining both write and read throughput. Meanwhile, directly reusing a KV store design intended for block devices, such as an LSM-based one, would cause much higher read latency due to the former property.
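To make the write-amplification point concrete, the following is a minimal arithmetic sketch. It assumes, for illustration only, that every 256-byte media unit touched by a write is rewritten in full and that no write combining takes place; the function names are hypothetical and are not taken from the paper.

```python
MEDIA_WRITE_UNIT = 256  # bytes; the media write unit stated in the abstract


def media_bytes_written(offset: int, length: int, unit: int = MEDIA_WRITE_UNIT) -> int:
    """Bytes written to the media, assuming each touched unit is rewritten in full."""
    if length <= 0:
        return 0
    first_unit = offset // unit
    last_unit = (offset + length - 1) // unit
    return (last_unit - first_unit + 1) * unit


def write_amplification(offset: int, length: int, unit: int = MEDIA_WRITE_UNIT) -> float:
    """Ratio of media bytes written to bytes the application asked to write."""
    return media_bytes_written(offset, length, unit) / length


if __name__ == "__main__":
    # A 16-byte update inside one unit, a unit-aligned 256-byte write,
    # and a 16-byte update that straddles two units.
    for offset, length in [(0, 16), (0, 256), (250, 16)]:
        print(offset, length, write_amplification(offset, length))
```

Under this simplified model, a small in-place update pays for a whole unit (or two, if it straddles a boundary), whereas a write that fills whole units has an amplification of 1. This is the motivation for batching small writes.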
In this paper, we propose ChameleonDB, a KV store designed specifically for this important hybrid memory/storage device, which considers and exploits both properties in a single design. It uses an LSM-tree structure to admit writes efficiently with low write amplification, and an in-DRAM hash table to bypass the LSM-tree's multiple levels for fast reads. Meanwhile, ChameleonDB may choose to opportunistically maintain the LSM multi-level structure in the background to achieve a short recovery time after a system crash. ChameleonDB's hybrid structure is designed to absorb sudden bursts in a write workload, which helps avoid long-tail read latency.
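The idea of pairing batched, append-style writes with a volatile hash index can be illustrated with a toy sketch. This is not ChameleonDB's implementation: the class `ToyHybridStore`, its methods, and the use of Python dictionaries as stand-ins for persistent tables are all assumptions made for illustration.

```python
class ToyHybridStore:
    """Toy model: batched table flushes plus a volatile hash index for reads."""

    def __init__(self, memtable_limit: int = 4):
        self.memtable = {}            # recent writes, buffered before a flush
        self.memtable_limit = memtable_limit
        self.tables = []              # flushed tables, oldest first (stand-in for persistent data)
        self.index = {}               # volatile index: key -> position of newest table holding it

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Write the whole buffer as one batch, then point the index at it.
        table_id = len(self.tables)
        self.tables.append(dict(self.memtable))
        for key in self.memtable:
            self.index[key] = table_id
        self.memtable.clear()

    def get(self, key):
        # Indexed read: at most one table lookup, regardless of table count.
        if key in self.memtable:
            return self.memtable[key]
        table_id = self.index.get(key)
        return None if table_id is None else self.tables[table_id][key]

    def get_by_scanning(self, key):
        # Read without the index: search tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.tables):
            if key in table:
                return table[key]
        return None

    def rebuild_index(self):
        # After losing the volatile index, rebuild it from the flushed tables.
        self.index = {}
        for table_id, table in enumerate(self.tables):
            for key in table:
                self.index[key] = table_id
```

In this sketch, `get` touches one table however many have been flushed, while `get_by_scanning` may visit all of them. `rebuild_index` shows why recovery time depends on how much persistent data must be scanned to reconstruct a volatile index.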
Our experimental results show that ChameleonDB improves write throughput by 3.3× and reduces read latency by around 60% compared with a legacy LSM-tree-based KV store design. ChameleonDB delivers performance competitive even with KV stores that use a fully in-DRAM index, while using much less DRAM space. Compared with CCEH, a persistent hash table design, ChameleonDB provides 6.4× higher write throughput.