ABSTRACT
We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called Mold. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated Mold on several real-world kernels and found that in most cases Mold generated the desired MapReduce program, even for codes with complex indirect updates.
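The pipeline described above (imperative loop, then a fold, then a group-by based MapReduce form) can be illustrated with a small word-count example. This is only a sketch under stated assumptions: it is not Mold's actual output, the class and method names (`WordCountSketch`, `imperative`, `asFold`, `asMapReduce`, `foldLeft`) are hypothetical, and Java streams stand in for the Apache Spark runtime so that the example runs without a cluster.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration; none of these names come from the paper.
class WordCountSketch {

    // Step 0: sequential, imperative input. The update counts[w] += 1 is an
    // indirect update: which entry is written depends on the data.
    static Map<String, Integer> imperative(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // A generic left fold, used to express the loop functionally.
    static <A, T> A foldLeft(List<T> xs, A init, BiFunction<A, T, A> step) {
        A acc = init;
        for (T x : xs) {
            acc = step.apply(acc, x);
        }
        return acc;
    }

    // Step 1: the same loop as a fold. The accumulator is threaded through
    // explicitly and never mutated in place, but evaluation is still sequential.
    static Map<String, Integer> asFold(List<String> words) {
        Map<String, Integer> empty = new HashMap<>();
        return foldLeft(words, empty, (acc, w) -> {
            Map<String, Integer> next = new HashMap<>(acc);
            next.merge(w, 1, Integer::sum);
            return next;
        });
    }

    // Step 2: a MapReduce-style form. Grouping by key separates the updates
    // for different words, so each group can be reduced independently
    // (here with a parallel stream standing in for a cluster runtime).
    static Map<String, Integer> asMapReduce(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        Collectors.reducing(0, w -> 1, Integer::sum)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        Map<String, Integer> reference = imperative(words);
        // All three forms should agree on the result.
        System.out.println(reference.equals(asFold(words))
                && reference.equals(asMapReduce(words)));
    }
}
```

The group-by step is what makes the last form parallel: updates to different keys no longer share one accumulator, and the per-key reduction uses an associative operator (`Integer::sum`), so partial results can be combined in any order.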
Translating imperative code to MapReduce. OOPSLA '14.