A work-stealing scheduler for X10's task parallelism with suspension

Authors:
Olivier Tardieu, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Haichuan Wang, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Haibo Lin, IBM Research - China, Beijing, China

ACM SIGPLAN Notices, Volume 47, Issue 8, August 2012, pp. 267–276. https://doi.org/10.1145/2370036.2145850

Published: 25 February 2012

Abstract

The X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well as execution across multiple address spaces. A crucial aspect of X10's runtime system is the scheduling of concurrent tasks. Work-stealing schedulers have been shown to efficiently load balance fine-grain divide-and-conquer task-parallel programs on SMPs and multicores. But X10 is not limited to shared-memory fork-join parallelism. X10 permits tasks to suspend and synchronize by means of conditional atomic blocks and remote task invocations.
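
To make the work-stealing idea concrete, here is a minimal Python sketch of the general technique, not the engine described in this paper. Each worker owns a deque, takes its own newest task first, and steals the oldest task from another worker when idle. The names `WorkStealingDeque` and `simulate` are hypothetical, a lock stands in for the lock-free deques that production schedulers often use, the workers run round-robin in a single thread so the result is deterministic, and join/finish synchronization is omitted.

```python
import threading
from collections import deque


class WorkStealingDeque:
    """Per-worker task deque (hypothetical name, for illustration only)."""

    def __init__(self):
        self._tasks = deque()
        # Assumption: a plain lock keeps the sketch short.
        self._lock = threading.Lock()

    def push(self, task):
        with self._lock:
            self._tasks.append(task)

    def pop(self):
        """Owner takes its newest task (LIFO); None if the deque is empty."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):
        """Thief takes the oldest task (FIFO); None if the deque is empty."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


def simulate(n, num_workers=2, grain=4):
    """Sum range(n) by divide and conquer over simulated workers.

    Returns (total, number_of_steals).
    """
    deques = [WorkStealingDeque() for _ in range(num_workers)]
    deques[0].push((0, n))  # the root task starts on worker 0
    total = 0
    steals = 0
    while True:
        progressed = False
        for w in range(num_workers):
            task = deques[w].pop()
            if task is None:
                # Idle worker: look for a victim with pending work.
                for v in range(num_workers):
                    if v != w:
                        task = deques[v].steal()
                        if task is not None:
                            steals += 1
                            break
            if task is None:
                continue
            progressed = True
            lo, hi = task
            if hi - lo <= grain:
                total += sum(range(lo, hi))  # leaf: do the work directly
            else:
                mid = (lo + hi) // 2
                deques[w].push((mid, hi))  # older half, exposed to thieves
                deques[w].push((lo, mid))  # newer half, run next by owner
        if not progressed:
            return total, steals
```

Calling `simulate(100)` returns the sum of `range(100)` together with the number of steals that occurred. Because thieves take the oldest entry, they tend to pick up the largest pending subproblem, which is why the discipline suits divide-and-conquer workloads.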

In this paper, we demonstrate that work-stealing scheduling principles are applicable to a rich programming language such as X10, achieving performance at scale without compromising expressivity, ease of use, or portability. We design and implement a portable work-stealing execution engine for X10. While this engine is biased toward the efficient execution of fork-join parallelism in shared memory, it handles the full X10 language, especially conditional atomic blocks and distribution.
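
For readers unfamiliar with conditional atomic blocks, the following Python sketch approximates the semantics of X10's `when (c) S` construct: wait until the guard `c` holds, then run `S` atomically. It is only an analogy built on `threading.Condition`. The `Monitor` class and `demo` function are hypothetical, and this version blocks an operating-system thread while waiting, which says nothing about how the paper's engine suspends tasks.

```python
import threading


class Monitor:
    """Hypothetical helper approximating X10's `atomic` and `when` blocks."""

    def __init__(self):
        self._cond = threading.Condition()

    def atomic(self, action):
        """Run action under mutual exclusion, then wake any waiters."""
        with self._cond:
            result = action()
            self._cond.notify_all()
            return result

    def when(self, guard, action):
        """Block until guard() is true, then run action atomically."""
        with self._cond:
            self._cond.wait_for(guard)
            result = action()
            self._cond.notify_all()  # state changed: let waiters recheck
            return result


def demo():
    """One-slot buffer: the producer waits for an empty slot, the consumer for a full one."""
    m = Monitor()
    slot = []
    received = []

    def consumer():
        for _ in range(3):
            received.append(m.when(lambda: len(slot) > 0, slot.pop))

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(3):
        m.when(lambda: len(slot) == 0, lambda i=i: slot.append(i))
    t.join()
    return received
```

In `demo`, each side suspends until the other has changed the shared state, so the three values are handed over one at a time and in order.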

We show that this engine improves the run time of a series of benchmark programs by several orders of magnitude when used in combination with the C++ backend compiler and runtime for X10. It achieves scaling comparable to state-of-the-art work-stealing scheduler implementations (the Cilk++ compiler and the Java fork/join framework) despite the dramatic increase in generality.

Index Terms

1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Parallel programming languages

Published in
ACM SIGPLAN Notices, Volume 47, Issue 8 (PPoPP '12)
August 2012, 334 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
February 2012, 352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816
General Chair: J. Ramanujam, Louisiana State University, USA
Program Chair: P. Sadayappan, The Ohio State University, USA
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
X10
scheduling
task parallelism
work-stealing
Qualifiers
- research-article