Abstract
Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy. It also uses lightweight measurements, with sub-second overhead, to make data-dependent decisions at run time in ad-hoc workloads where no statistics are available. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits even if just 4–5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerating analytics.
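The two ideas in the abstract, pipelining under an imperative API and run-time decisions from lightweight measurements, can be sketched in a few lines. This is not Weld's API or IR: `LazyVec`, `adaptive_filtered_sum`, and the selectivity cutoffs are hypothetical names and values chosen for illustration. `LazyVec` records each library call instead of executing it, so a map/filter/sum chain runs as one fused pass with no intermediate lists. `adaptive_filtered_sum` assumes, for illustration only, that the run-time choice is between a branching loop and a branch-free (predicated) loop, and estimates filter selectivity from a small sample because no precomputed statistics exist.

```python
import random
from typing import Callable, List, Tuple


class LazyVec:
    """Hypothetical lazy vector: calls are recorded, then fused at evaluation."""

    def __init__(self, data: List[float], ops=None):
        self.data = data
        self.ops = ops or []  # list of ("map", f) or ("filter", pred)

    def map(self, f: Callable[[float], float]) -> "LazyVec":
        return LazyVec(self.data, self.ops + [("map", f)])

    def filter(self, pred: Callable[[float], bool]) -> "LazyVec":
        return LazyVec(self.data, self.ops + [("filter", pred)])

    def _apply(self, x: float) -> Tuple[bool, float]:
        """Run every recorded op on one element; return (keep, value)."""
        for kind, fn in self.ops:
            if kind == "map":
                x = fn(x)
            elif not fn(x):
                return False, x
        return True, x

    def sum(self) -> float:
        # Pipelined evaluation: a single pass over the input and no
        # intermediate list materialized between the recorded calls.
        total = 0.0
        for x in self.data:
            keep, v = self._apply(x)
            if keep:
                total += v
        return total


def adaptive_filtered_sum(data: List[float], pred: Callable[[float], bool],
                          sample_size: int = 1000, low: float = 0.1,
                          high: float = 0.9, seed: int = 0):
    """Pick a loop strategy from a sampled selectivity (hypothetical cutoffs)."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    selectivity = sum(1 for x in sample if pred(x)) / max(len(sample), 1)
    if low < selectivity < high:
        # Unpredictable branch: use the predicated form, x * mask for every x.
        return sum(x * pred(x) for x in data), "predicated"
    # Mostly-true or mostly-false predicate: a plain branch is predictable.
    return sum(x for x in data if pred(x)), "branching"


# Usage: one fused pass computes sum(2x for x in data if 2x > 4).
print(LazyVec([1.0, 2.0, 3.0, 4.0]).map(lambda x: x * 2).filter(lambda x: x > 4).sum())
print(adaptive_filtered_sum([float(i) for i in range(10)], lambda x: x % 2 == 0))
```

In pure Python neither strategy is meaningfully faster; the point is the shape of the decision. A compiled runtime would generate the fused loop as native code, where avoiding intermediates and choosing between branching and predication from a cheap sample is where measurable speedups could come from.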