research-article · DOI: 10.1145/2443416.2443421

Turbine: a distributed-memory dataflow engine for extreme-scale many-task applications

Published: 20 May 2012

ABSTRACT

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly-coupled parallelism at the lower level via multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and inter-task data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution, and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.
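To make the two-layer model concrete, here is a minimal illustrative sketch (not taken from the paper) in Python: an upper layer generates many loosely-coupled coarse-grained tasks whose results become available as futures, and each leaf task launches a tightly-coupled parallel program. The `mpiexec -n 64 ./simulate` command line and the output paths are hypothetical stand-ins, not anything specified by the authors.

```python
# Minimal sketch of the two-layer many-task pattern (illustrative only;
# this is not Turbine's implementation).
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_simulation(i: int) -> str:
    """Leaf task: launch a tightly-coupled parallel program (hypothetical)."""
    out = f"out/run-{i}.dat"
    # "mpiexec" and "./simulate" are stand-ins for a real parallel code.
    subprocess.run(["mpiexec", "-n", "64", "./simulate", str(i), out],
                   check=True)
    return out

if __name__ == "__main__":
    # Upper layer: generate a large number of loosely-coupled tasks; each
    # pending result is a future, loosely mirroring dataflow semantics.
    with ThreadPoolExecutor(max_workers=100) as pool:
        futures = [pool.submit(run_simulation, i) for i in range(1000)]
        outputs = [f.result() for f in futures]
    print(f"{len(outputs)} simulations complete")
```

At the scales the paper targets, a single centralized task submitter like this one becomes the bottleneck; Turbine's contribution is distributing both task generation and dependency management.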



    Reviews

    Michael G. Murphy

Harnessing the potential of high-performance computing (HPC) is an ongoing challenge. In this paper, the authors present a model in which dataflow programs use distributed-memory evaluation in an extreme-scale computing environment while broadly spreading both the evaluation of programs and the generation of tasks. The model, Turbine, is a scalable dataflow engine built around distributed memory and message passing. The authors contend that allowing distributed execution of program fragments and processing of structural fragments can better exploit parallelism and concurrency.

An effective introductory section sets the tone and introduces Turbine. The second section provides motivation by examining several applications that can benefit from an engine like Turbine. The next two sections address Turbine's foundational approach, which builds on the Swift parallel scripting language and the asynchronous distributed load balancing (ADLB) library with extensions for dataflow processing; these sections include a number of use cases. Section 5 focuses on implementation issues, including program structure and the distribution of data storage. The next section covers performance issues related to task distribution, data operations, distributed data structures, and distributed iteration. The seventh section contrasts related work with the approach taken in Turbine. A conclusion and a list of 36 key references close the paper.

The authors have made a substantial contribution to HPC with Turbine, and this readable and well-organized paper, together with future results, should nudge the field in a promising direction. In particular, it will be interesting to see whether Turbine can successfully migrate from Swift to other dataflow languages.

Online Computing Reviews Service
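To clarify the dataflow mechanism the review alludes to, the toy sketch below (our illustration, not Turbine's code or API) shows data-dependent task release: variables are single-assignment, tasks subscribe to unresolved inputs, and a task moves to a ready queue only once all of its inputs have been stored. Turbine distributes this state across server processes and balances the resulting tasks with ADLB; the sketch is deliberately single-process.

```python
# Toy single-process sketch of data-dependent task release (illustrative;
# Turbine spreads this state over distributed servers).
from collections import defaultdict, deque

class DataflowEngine:
    def __init__(self):
        self.values = {}                  # single-assignment variables
        self.waiting = defaultdict(list)  # variable name -> subscribed tasks
        self.ready = deque()              # tasks whose inputs are all set

    def task(self, func, inputs, output):
        t = {"func": func, "inputs": inputs, "output": output,
             "pending": sum(1 for v in inputs if v not in self.values)}
        if t["pending"] == 0:
            self.ready.append(t)
        else:
            for v in inputs:
                if v not in self.values:
                    t and self.waiting[v].append(t)

    def store(self, name, value):
        # Storing a variable notifies subscribers; fully satisfied tasks
        # move to the ready queue (the data-dependent release step).
        self.values[name] = value
        for t in self.waiting.pop(name, []):
            t["pending"] -= 1
            if t["pending"] == 0:
                self.ready.append(t)

    def run(self):
        while self.ready:
            t = self.ready.popleft()
            args = [self.values[v] for v in t["inputs"]]
            self.store(t["output"], t["func"](*args))

engine = DataflowEngine()
engine.task(lambda x, y: x + y, ["a", "b"], "c")  # blocked on a and b
engine.task(lambda c: c * 2, ["c"], "d")          # blocked on c
engine.store("a", 1)
engine.store("b", 2)
engine.run()
print(engine.values["d"])  # prints 6
```

In Turbine, the analogous store and subscribe operations go to distributed data servers, so no single node holds the whole dataflow state.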

Published in

SWEET '12: Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
May 2012
58 pages
ISBN: 978-1-4503-1876-1
DOI: 10.1145/2443416

      Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



Acceptance Rates

Overall acceptance rate: 4 of 6 submissions, 67%
