SC Conference Proceedings · DOI: 10.1145/3581784.3607072

Design Considerations and Analysis of Multi-Level Erasure Coding in Large-Scale Data Centers

Published: 11 November 2023

ABSTRACT

Multi-level erasure coding (MLEC) has seen large deployments in the field, but there is no in-depth study of design considerations for MLEC at scale. In this paper, we provide comprehensive design considerations and analysis of MLEC at scale. We introduce the design space of MLEC along multiple dimensions, including code parameter selection, chunk placement schemes, and repair methods. We quantify their performance and durability, and show which MLEC schemes and repair methods provide the best tolerance against independent and correlated failures and reduce repair network traffic by orders of magnitude. To achieve this, we use several evaluation strategies, including simulation, splitting (a rare-event simulation technique), dynamic programming, and mathematical modeling. We also compare the performance and durability of MLEC with other EC schemes, such as single-level erasure coding (SLEC) and locally repairable codes (LRC), and show that MLEC can provide high durability with higher encoding throughput and less repair network traffic than both SLEC and LRC.
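To make the code-parameter dimension of the MLEC design space concrete, the following is a minimal sketch, not the paper's implementation. It assumes a (k_n+p_n)/(k_l+p_l) parameterization: a network-level code with k_n data and p_n parity chunks, where each of those chunks sits in a distinct local-level stripe protected by a k_l+p_l code (for example, within one enclosure). The class `MLECConfig` and the helper functions are hypothetical names. The loss probability uses a deliberately simple model (independent disk failures, no repair) rather than the simulation, splitting, or dynamic-programming methods the paper describes.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class MLECConfig:
    """Hypothetical MLEC parameters: a (k_n+p_n) network-level code whose
    chunks each live in a distinct (k_l+p_l) local-level stripe."""
    k_n: int  # data chunks per network-level stripe
    p_n: int  # parity chunks per network-level stripe
    k_l: int  # data chunks per local-level stripe
    p_l: int  # parity chunks per local-level stripe

    def storage_overhead(self) -> float:
        # Raw bytes stored per byte of user data: the two code rates multiply.
        return ((self.k_n + self.p_n) * (self.k_l + self.p_l)) / (self.k_n * self.k_l)

    def min_failures_to_lose_data(self) -> int:
        # A local stripe is lost only after p_l + 1 disk failures inside it,
        # and the network stripe only after p_n + 1 of its local stripes are lost.
        return (self.p_n + 1) * (self.p_l + 1)


def binom_tail(n: int, p: float, t: int) -> float:
    """Return P(X > t) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1, n + 1))


def stripe_loss_probability(cfg: MLECConfig, disk_fail_prob: float) -> float:
    """Probability that one MLEC stripe loses data, ASSUMING every disk fails
    independently with probability disk_fail_prob and nothing is repaired."""
    # Probability that a single local stripe has more failures than p_l.
    local_loss = binom_tail(cfg.k_l + cfg.p_l, disk_fail_prob, cfg.p_l)
    # The network stripe loses data if more than p_n of its chunks are unrecoverable.
    return binom_tail(cfg.k_n + cfg.p_n, local_loss, cfg.p_n)
```

For an illustrative (10+2)/(17+3) configuration, `storage_overhead()` returns 240/170 (about 1.41) and `min_failures_to_lose_data()` returns 12. Multiplying the two code rates is what makes the parameter choice at each level a trade-off between overhead and the number of failures a stripe can absorb; correlated failures, which the paper also studies, are outside this simple model.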

