research-article · DOI: 10.1145/2443416.2443421

Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications

Published: 20 May 2012

ABSTRACT

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly-coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution, and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
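To make the two-layer model concrete, here is a minimal illustrative sketch (not taken from the paper) in Python: an upper layer generates many loosely-coupled coarse-grained tasks whose results become available as futures, and each leaf task launches a tightly-coupled parallel program. The `mpiexec -n 64 ./simulate` command line and the output paths are hypothetical stand-ins, not anything specified by the authors.

```python
# Minimal sketch of the two-layer many-task pattern (illustrative only;
# this is not Turbine's implementation).
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_simulation(i: int) -> str:
    """Leaf task: launch a tightly-coupled parallel program (hypothetical)."""
    out = f"out/run-{i}.dat"
    # "mpiexec" and "./simulate" are stand-ins for a real parallel code.
    subprocess.run(["mpiexec", "-n", "64", "./simulate", str(i), out],
                   check=True)
    return out

if __name__ == "__main__":
    # Upper layer: generate a large number of loosely-coupled tasks; each
    # pending result is a future, loosely mirroring dataflow semantics.
    with ThreadPoolExecutor(max_workers=100) as pool:
        futures = [pool.submit(run_simulation, i) for i in range(1000)]
        outputs = [f.result() for f in futures]
    print(f"{len(outputs)} simulations complete")
```

At the scales the paper targets, a single centralized task submitter like this one becomes the bottleneck; Turbine's contribution is distributing both task generation and dependency management.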



    Reviews

    Michael G. Murphy

Harnessing the potential of high-performance computing (HPC) is an ongoing challenge. In this paper, the authors present a model in which dataflow programs use distributed-memory evaluation in an extreme-scale computing environment while broadly spreading both the evaluation of programs and the generation of tasks. The model, Turbine, is a scalable dataflow engine built around distributed memory and message passing. The authors contend that allowing distributed execution of program fragments and processing of structural fragments can better exploit parallelism and concurrency.

An effective introductory section sets the tone and introduces Turbine. The second section provides motivation by examining several applications that can benefit from an engine like Turbine. The next two sections address Turbine's foundational approach, which builds on the Swift parallel scripting language and the asynchronous distributed load balancing (ADLB) library with extensions for dataflow processing; these sections include a number of use cases. Section 5 focuses on implementation issues, including program structure and the distribution of data storage. The next section covers performance issues related to task distribution, data operations, distributed data structures, and distributed iteration. The seventh section contrasts related work with the approach taken in Turbine. A conclusion and a list of 36 key references close the paper.

The authors have made a substantial contribution to HPC with Turbine, and this readable and well-organized paper, together with future results, should nudge the field in a promising direction. In particular, it will be interesting to see whether Turbine can successfully migrate from Swift to other dataflow languages.

Online Computing Reviews Service
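To clarify the dataflow mechanism the review alludes to, the toy sketch below (our illustration, not Turbine's code or API) shows data-dependent task release: variables are single-assignment, tasks subscribe to unresolved inputs, and a task moves to a ready queue only once all of its inputs have been stored. Turbine distributes this state across server processes and balances the resulting tasks with ADLB; the sketch is deliberately single-process.

```python
# Toy single-process sketch of data-dependent task release (illustrative;
# Turbine spreads this state over distributed servers).
from collections import defaultdict, deque

class DataflowEngine:
    def __init__(self):
        self.values = {}                  # single-assignment variables
        self.waiting = defaultdict(list)  # variable name -> subscribed tasks
        self.ready = deque()              # tasks whose inputs are all set

    def task(self, func, inputs, output):
        t = {"func": func, "inputs": inputs, "output": output,
             "pending": sum(1 for v in inputs if v not in self.values)}
        if t["pending"] == 0:
            self.ready.append(t)
        else:
            for v in inputs:
                if v not in self.values:
                    t and self.waiting[v].append(t)

    def store(self, name, value):
        # Storing a variable notifies subscribers; fully satisfied tasks
        # move to the ready queue (the data-dependent release step).
        self.values[name] = value
        for t in self.waiting.pop(name, []):
            t["pending"] -= 1
            if t["pending"] == 0:
                self.ready.append(t)

    def run(self):
        while self.ready:
            t = self.ready.popleft()
            args = [self.values[v] for v in t["inputs"]]
            self.store(t["output"], t["func"](*args))

engine = DataflowEngine()
engine.task(lambda x, y: x + y, ["a", "b"], "c")  # blocked on a and b
engine.task(lambda c: c * 2, ["c"], "d")          # blocked on c
engine.store("a", 1)
engine.store("b", 2)
engine.run()
print(engine.values["d"])  # prints 6
```

In Turbine, the analogous store and subscribe operations go to distributed data servers, so no single node holds the whole dataflow state.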

Published in

SWEET '12: Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
May 2012
58 pages
ISBN: 978-1-4503-1876-1
DOI: 10.1145/2443416

      Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall acceptance rate: 4 of 6 submissions, 67%
