Occupy the cloud: distributed computing for the 99%

Authors:
Eric Jonas, University of California
Qifan Pu, University of California
Shivaram Venkataraman, University of California
Ion Stoica, University of California
Benjamin Recht, University of California

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing, September 2017, Pages 445–451
https://doi.org/10.1145/3127479.3128601

Published: 24 September 2017

ABSTRACT

Distributed computing remains inaccessible to a large number of users, in spite of many open source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even for running simple embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead, fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.
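To make the model in the abstract concrete, the following is a minimal local sketch of a BSP-style job built from stateless functions. It is not PyWren's API: the names `STORE`, `run_superstep`, `square_partition`, `sum_partition`, and `sum_of_squares` are hypothetical, a thread pool stands in for cloud function invocations, and an in-memory dictionary stands in for remote storage. The point it illustrates is that each task keeps no local state between invocations, reads its input from shared storage, and writes its output back, with a barrier between supersteps.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for remote storage (an assumption for this sketch).
# Stateless tasks keep nothing between invocations; all state lives here.
STORE = {}


def square_partition(key):
    """Stateless task: read one partition from storage, write squares back."""
    data = STORE[key]
    out_key = key + "/squared"
    STORE[out_key] = [x * x for x in data]
    return out_key


def sum_partition(key):
    """Stateless task: reduce one stored partition to a partial sum."""
    return sum(STORE[key])


def run_superstep(func, keys, max_workers=4):
    """Apply func to every key in parallel.

    The call returns only after every task has finished, which plays the
    role of the barrier that separates BSP supersteps.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, keys))


def sum_of_squares(partitions):
    """Two-superstep job: square each partition, then sum the results."""
    STORE.clear()
    keys = []
    for i, part in enumerate(partitions):
        key = f"input/{i}"
        STORE[key] = list(part)
        keys.append(key)

    squared_keys = run_superstep(square_partition, keys)       # superstep 1
    partial_sums = run_superstep(sum_partition, squared_keys)  # superstep 2
    return sum(partial_sums)


if __name__ == "__main__":
    print(sum_of_squares([[1, 2], [3, 4]]))
```

Because the tasks communicate only through storage keys, the driver needs no cluster configuration beyond the degree of parallelism. In a real deployment the pool and the dictionary would be replaced by a function service and a remote store, which this sketch does not attempt to model.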

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages

Published in

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN: 9781450350280
DOI: 10.1145/3127479

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
AWS lambda
PyWren
distributed computing
serverless
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 169 of 722 submissions, 23%