Translating imperative code to MapReduce

Authors:
Cosmin Radoi, University of Illinois, Urbana, IL, USA
Stephen J. Fink, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Rodric Rabbah, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Manu Sridharan, Samsung Research America, San Jose, CA, USA

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, October 2014, Pages 909–927. https://doi.org/10.1145/2660193.2660228

Published: 15 October 2014

ABSTRACT

We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called Mold. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated Mold on several real-world kernels and found that in most cases Mold generated the desired MapReduce program, even for codes with complex indirect updates.
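
To make the pipeline in the abstract concrete, the following is a minimal sketch of the three program shapes it describes, using word counting as the example: a sequential loop with an indirect update, the same loop written as a fold, and an equivalent map / group-by / reduce form. It is not Mold's output and does not reproduce its rewrite rules. It uses plain Java streams rather than Apache Spark, and every name in it (`WordCountSketch`, `foldLeft`, `imperative`, `asFold`, `asMapReduce`) is a hypothetical chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: all names here are hypothetical and not taken from Mold.
public class WordCountSketch {

    // 1. Sequential, imperative input. The update to `counts` is indirect:
    //    which entry changes depends on the word read in each iteration.
    static Map<String, Integer> imperative(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    // Hypothetical helper: a left fold that threads an accumulator through a list.
    static <A, B> B foldLeft(List<A> xs, B init, BiFunction<B, A, B> step) {
        B acc = init;
        for (A x : xs) {
            acc = step.apply(acc, x);
        }
        return acc;
    }

    // 2. The same loop as a fold: the mutable map becomes an accumulator value
    //    that each step copies and extends instead of updating in place.
    static Map<String, Integer> asFold(List<String> words) {
        Map<String, Integer> empty = new HashMap<>();
        return foldLeft(words, empty, (acc, w) -> {
            Map<String, Integer> next = new HashMap<>(acc);
            next.merge(w, 1, Integer::sum);
            return next;
        });
    }

    // 3. A MapReduce-style equivalent: group the words by key, map each
    //    occurrence to 1, and reduce each group with an associative sum.
    //    Groups are independent, so the work can be split across threads.
    static Map<String, Integer> asMapReduce(List<String> words) {
        return words.parallelStream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        Collectors.reducing(0, w -> 1, Integer::sum)));
    }
}
```

All three methods should return equal maps for the same input. The point of the third form is that grouping by key isolates the indirect update, so the per-key reduction no longer carries a dependency from one iteration to the next.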

Index Terms

Translating imperative code to MapReduce
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Source code generation
    2. General programming languages
      1. Language types
        Parallel programming languages

Published in
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
October 2014, 946 pages
ISBN: 9781450325851
DOI: 10.1145/2660193
General Chair: Andrew Black, Portland State University, USA
Program Chair: Todd Millstein, University of California, Los Angeles, USA

ACM SIGPLAN Notices, Volume 49, Issue 10 (OOPSLA '14)
October 2014, 907 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2714064
Editor: Andy Gill, University of Kansas, Lawrence, KS

Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
functional
imperative
mapreduce
program translation
rewriting
scala
Qualifiers
- research-article

Acceptance Rates
OOPSLA '14 Paper Acceptance Rate: 52 of 186 submissions, 28%
Overall Acceptance Rate: 268 of 1,244 submissions, 22%