Dag-calculus: a calculus for parallel computation (ICFP '16)

ABSTRACT
The increasing availability of multicore systems has led to greater focus on the design and implementation of languages for writing parallel programs. Such languages support various abstractions for parallelism, such as fork-join, async-finish, and futures. While these abstractions may seem similar, they lead to different semantics, language design and implementation decisions, and can significantly impact the performance of end-user applications.
In this paper, we consider the question of whether it is possible to unify various paradigms of parallel computing. To this end, we propose a calculus, called the dag calculus, that can encode fork-join, async-finish, futures, and possibly other paradigms. We describe the dag calculus and its semantics, and establish translations from the aforementioned paradigms into it. These translations establish that the dag calculus is sufficiently powerful to encode programs written in the prevailing paradigms of parallelism. We present concurrent algorithms and data structures for realizing the dag calculus on multicore hardware and prove that the proposed techniques are consistent with the semantics. Finally, we present an implementation of the calculus and evaluate it empirically by comparing its performance to highly optimized code from prior work. The results show that the calculus is expressive and that it competes well with, and sometimes outperforms, the state of the art.
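To make the three paradigms concrete, the following is a minimal sketch of fork-join, async-finish, and futures rendered with Python's standard `concurrent.futures` module. This is purely illustrative: the paper's dag calculus and runtime are language-independent, and the shared thread pool here merely stands in for a scheduler; the `fib` and `finish` helpers are hypothetical names chosen for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Shared pool standing in for a parallel runtime's scheduler
# (hypothetical; the paper targets a custom multicore runtime).
pool = ThreadPoolExecutor(max_workers=64)

# Fork-join: fork one branch, compute the other inline, then join
# both before continuing. Control re-converges at the join point.
def fib(n):
    if n < 2:
        return n
    branch = pool.submit(fib, n - 1)   # fork
    other = fib(n - 2)                 # computed inline
    return branch.result() + other     # join

# Async-finish: spawn a batch of asyncs inside a "finish" scope
# that blocks until every spawned task has completed.
def finish(thunks):
    wait([pool.submit(t) for t in thunks])

# Futures: start a computation eagerly; demand (touch) its result
# only at the point where it is actually needed.
fut = pool.submit(fib, 8)
# ... unrelated work could run here ...
answer = fut.result()                  # touch the future
```

Note how the three differ in where synchronization happens: fork-join synchronizes at a structured join point, async-finish at the end of an enclosing scope, and futures at each individual touch; it is exactly this variety of dependency structure that a dag-based calculus can represent uniformly as edges in a computation dag.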