ABSTRACT
Modern decentralized key-value stores often replicate and distribute data via consistent hashing for availability and scalability. Compared to replication, erasure coding is a promising redundancy approach that provides availability guarantees at much lower cost. However, when combined with consistent hashing, erasure coding incurs many parity updates during scaling (i.e., adding or removing nodes) and cannot efficiently handle the degraded reads that scaling causes. In this paper, we propose a novel erasure coding model called FragEC, which incurs no parity updates during scaling. We further extend consistent hashing with multiple hash rings so that erasure coding can seamlessly handle degraded reads during scaling. We realize our design as an in-memory key-value store called ECHash and conduct testbed experiments on different scaling workloads in both local and cloud environments. We show that ECHash achieves better scaling performance (in terms of scaling throughput and degraded read latency during scaling) than the baseline erasure coding implementation, while maintaining high basic I/O and node repair performance.
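For readers unfamiliar with the scaling behavior mentioned above, the following is a minimal sketch of a plain single-ring consistent hash, not the FragEC model or the multi-ring design of ECHash. The class name `HashRing`, the helper `ring_hash`, the node names, and the virtual-node count are all hypothetical choices made for illustration. It shows that adding a node remaps only a subset of keys, all of which move to the new node; with erasure coding layered on top, such moves are what would change stripe membership and trigger parity updates.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes (hypothetical API)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place `vnodes` virtual points for this node on the ring.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def lookup(self, key):
        # A key belongs to the first point clockwise from its hash (wrapping around).
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"key-{i}" for i in range(10000)]
ring = HashRing(["node-A", "node-B", "node-C", "node-D"])

before = {k: ring.lookup(k) for k in keys}
ring.add_node("node-E")  # scale out by one node
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-E:",
      all(after[k] == "node-E" for k in moved))
```

Keys that do not move keep their original owner, which is why consistent hashing is attractive for scaling; the sketch makes no attempt to model stripes, parity, or degraded reads.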
Index Terms
- Coupling Decentralized Key-Value Stores with Erasure Coding