ABSTRACT
We reveal loopholes in Speculative Execution (SE) implementations under a unique fault model: node-level network throughput degradation. This problem appears in many data-parallel frameworks, such as Hadoop MapReduce and Spark. To address it, we present PBSE, a robust, path-based speculative execution that employs three key ingredients: path progress, path diversity, and path-straggler detection and speculation. We show that PBSE is superior to other approaches, such as cloning and aggressive speculation, under this fault model. PBSE is a general solution, applicable to many data-parallel frameworks such as Hadoop/HDFS+QFS, Spark, and Flume.
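To make the idea of path-straggler detection concrete, the following is a minimal illustrative sketch, not the PBSE algorithm itself. The abstract does not specify the detection rule, so the median-based threshold, the `slowdown` parameter, the function name, and the sample throughput figures are all assumptions made for illustration. It treats each (source, destination) node pair as a path and flags paths whose observed throughput falls far below the typical path.

```python
from statistics import median


def find_path_stragglers(path_rates, slowdown=0.5):
    """Return the paths whose throughput is below slowdown * median.

    path_rates maps a (source, destination) node pair to an observed
    transfer rate in MB/s. The median-based rule and the default
    slowdown factor are hypothetical choices, not taken from the paper.
    """
    if not path_rates:
        return []
    typical = median(path_rates.values())
    return sorted(
        path for path, rate in path_rates.items()
        if rate < slowdown * typical
    )


# Hypothetical measurements: node n5 has a degraded network interface.
rates = {
    ("n1", "n2"): 95.0,
    ("n1", "n3"): 102.0,
    ("n4", "n2"): 98.0,
    ("n5", "n3"): 1.2,
}
stragglers = find_path_stragglers(rates)
```

In this sketch, tracking progress per path rather than per task is what lets a slow transfer be attributed to a specific node pair, so that a speculative copy could be scheduled on a different path.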
Index Terms
- PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks