
Translating imperative code to MapReduce

Published: 15 October 2014

ABSTRACT

We present an approach for automatic translation of sequential, imperative code into a parallel MapReduce framework. Automating such a translation is challenging: imperative updates must be translated into a functional MapReduce form in a manner that both preserves semantics and enables parallelism. Our approach works by first translating the input code into a functional representation, with loops succinctly represented by fold operations. Then, guided by rewrite rules, our system searches a space of equivalent programs for an effective MapReduce implementation. The rules include a novel technique for handling irregular loop-carried dependencies using group-by operations to enable greater parallelism. We have implemented our technique in a tool called Mold. It translates sequential Java code into code targeting the Apache Spark runtime. We evaluated Mold on several real-world kernels and found that in most cases Mold generated the desired MapReduce program, even for codes with complex indirect updates.
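
To illustrate the kind of rewrite the abstract describes, the sketch below (a hypothetical example for exposition, not Mold's actual input or output; the function names are invented) shows an imperative-style histogram kernel whose irregular, data-dependent update creates a loop-carried dependency, together with a functional, MapReduce-style equivalent in which the indirect update becomes a group-by on the key followed by an independent per-group reduction.

```scala
object GroupBySketch {
  // Imperative-style kernel: counts(w) is updated at a data-dependent key,
  // so every iteration carries a dependency on the counts map.
  def histogramImperative(words: Seq[String]): Map[String, Int] = {
    var counts = Map.empty[String, Int].withDefaultValue(0)
    for (w <- words) {
      counts = counts.updated(w, counts(w) + 1)
    }
    counts
  }

  // MapReduce-style equivalent: grouping by the key makes each per-key
  // reduction independent of the others, which exposes the parallelism.
  def histogramFunctional(words: Seq[String]): Map[String, Int] =
    words.groupBy(identity).map { case (w, occurrences) => (w, occurrences.length) }

  def main(args: Array[String]): Unit = {
    val ws = Seq("a", "b", "a", "c", "b", "a")
    assert(histogramImperative(ws) == histogramFunctional(ws))
    println(histogramFunctional(ws)) // Map(a -> 3, b -> 2, c -> 1)
  }
}
```

On the Spark runtime targeted by Mold, the same shape roughly corresponds to `rdd.map(w => (w, 1)).reduceByKey(_ + _)`.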


Published in

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications
October 2014, 946 pages
ISBN: 9781450325851
DOI: 10.1145/2660193

ACM SIGPLAN Notices, Volume 49, Issue 10 (OOPSLA '14)
October 2014, 907 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2714064
Editor: Andy Gill

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article

          Acceptance Rates

OOPSLA '14 Paper Acceptance Rate: 52 of 186 submissions, 28%
Overall Acceptance Rate: 268 of 1,244 submissions, 22%
