ABSTRACT
Distributed computing remains inaccessible to a large number of users, in spite of many open-source platforms and extensive commercial offerings. While distributed computation frameworks have moved beyond a simple map-reduce model, many users are still left to struggle with complex cluster management and configuration tools, even when running simple, embarrassingly parallel jobs. We argue that stateless functions represent a viable platform for these users, eliminating cluster management overhead and fulfilling the promise of elasticity. Furthermore, using our prototype implementation, PyWren, we show that this model is general enough to implement a number of distributed computing models, such as BSP, efficiently. Extrapolating from recent trends in network bandwidth and the advent of disaggregated storage, we suggest that stateless functions are a natural fit for data processing in future computing environments.
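As a rough illustration of the stateless-function model described above, the following sketch runs two BSP-style supersteps in which every task reads its input from shared storage and writes its result back, keeping no state of its own. It is not PyWren's API: the `storage` dictionary stands in for a remote object store, a local thread pool stands in for a serverless backend, and the names `stateless_map` and `run_supersteps` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a plain dict stands in for remote object storage.
storage = {}


def stateless_map(func, input_keys, output_prefix, workers=4):
    """Apply func to each stored input; tasks communicate only via storage."""

    def task(indexed_key):
        index, key = indexed_key
        result = func(storage[key])          # read input from shared storage
        output_key = f"{output_prefix}/{index}"
        storage[output_key] = result         # write output to shared storage
        return output_key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collecting every result before returning acts as the BSP barrier:
        # the next superstep starts only after all tasks have finished.
        return list(pool.map(task, enumerate(input_keys)))


def square(x):
    return x * x


def add_one(x):
    return x + 1


def run_supersteps():
    """Two supersteps over four inputs: square each value, then add one."""
    for index, value in enumerate([1, 2, 3, 4]):
        storage[f"input/{index}"] = value
    input_keys = [f"input/{index}" for index in range(4)]
    step1_keys = stateless_map(square, input_keys, "step1")
    step2_keys = stateless_map(add_one, step1_keys, "step2")
    return [storage[key] for key in step2_keys]


if __name__ == "__main__":
    print(run_supersteps())
```

Because each task depends only on its input key and the shared store, any task can be retried or moved to a different worker without coordination, which is the property that lets such a model dispense with cluster management.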
Index Terms
- Occupy the cloud: distributed computing for the 99%