PBSE: a robust path-based speculative execution for degraded-network tail tolerance in data-parallel frameworks

Authors:
Riza O. Suminto, University of Chicago
Cesar A. Stuardo, University of Chicago
Alexandra Clark, University of Chicago
Huan Ke, University of Chicago
Tanakorn Leesatapornwongsa, University of Chicago
Bo Fu, University of Chicago
Daniar H. Kurniawan, Bandung Institute of Technology
Vincentius Martin, Surya University
Maheswara Rao G. Uma, Intel Corp.
Haryadi S. Gunawi, University of Chicago

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing, September 2017, Pages 295–308. https://doi.org/10.1145/3127479.3131622

Published: 24 September 2017

ABSTRACT

We reveal loopholes in Speculative Execution (SE) implementations under a unique fault model: node-level network throughput degradation. This problem appears in many data-parallel frameworks, such as Hadoop MapReduce and Spark. To address it, we present PBSE, a robust, path-based speculative execution that employs three key ingredients: path progress, path diversity, and path-straggler detection and speculation. We show that PBSE is superior to other approaches, such as cloning and aggressive speculation, under this fault model. PBSE is a general solution, applicable to many data-parallel frameworks such as Hadoop/HDFS+QFS, Spark, and Flume.
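
The abstract names three ingredients (path progress, path diversity, and path-straggler detection) without giving their mechanics. The sketch below is only an illustration of how such ideas could fit together, not the authors' algorithm. Every name, the tuple layout, the median-based cutoff, and the 0.5 threshold are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical path-progress records: (task_id, source_node, dest_node, rate_mbps).
# Node "C" plays the role of a node with degraded network throughput.
EXAMPLE_PATHS = [
    ("t1", "A", "X", 100),
    ("t2", "B", "X", 95),
    ("t3", "C", "Y", 5),
    ("t4", "C", "Z", 4),
    ("t5", "B", "Y", 90),
    ("t6", "A", "Z", 98),
]


def find_path_stragglers(paths, slow_factor=0.5):
    """Return task ids whose path rate is below slow_factor * median rate.

    The median-based cutoff is an assumption of this sketch.
    """
    cutoff = slow_factor * median(rate for _, _, _, rate in paths)
    return {task for task, _, _, rate in paths if rate < cutoff}


def suspect_nodes(paths, stragglers):
    """Return nodes for which every observed path through them is a straggler."""
    tasks_by_node = defaultdict(list)
    for task, src, dst, _ in paths:
        tasks_by_node[src].append(task)
        tasks_by_node[dst].append(task)
    return {
        node
        for node, tasks in tasks_by_node.items()
        if all(task in stragglers for task in tasks)
    }


def pick_diverse_path(candidates, suspects):
    """Return the first (src, dst) candidate that avoids all suspect nodes, else None."""
    for src, dst in candidates:
        if src not in suspects and dst not in suspects:
            return (src, dst)
    return None


stragglers = find_path_stragglers(EXAMPLE_PATHS)
suspects = suspect_nodes(EXAMPLE_PATHS, stragglers)
backup = pick_diverse_path([("C", "X"), ("B", "Z")], suspects)
print(sorted(stragglers), sorted(suspects), backup)
```

In this toy data, the two slow paths share node C, so a speculative copy would be placed on a path that avoids C rather than on another path through the same degraded node.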

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
  2. Dependable and fault-tolerant systems and networks

Published in

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN: 9781450350280
DOI: 10.1145/3127479

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HDFS
apache spark
dependability
distributed systems
hadoop mapreduce
speculative execution
tail tolerance
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%