Abstract
The X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well as execution across multiple address spaces. A crucial aspect of X10's runtime system is the scheduling of concurrent tasks. Work-stealing schedulers have been shown to efficiently load-balance fine-grained divide-and-conquer task-parallel programs on SMPs and multicores. But X10 is not limited to shared-memory fork-join parallelism: it also permits tasks to suspend and synchronize by means of conditional atomic blocks and remote task invocations.
In this paper, we demonstrate that work-stealing scheduling principles are applicable to a rich programming language such as X10, achieving performance at scale without compromising expressivity, ease of use, or portability. We design and implement a portable work-stealing execution engine for X10. While this engine is biased toward the efficient execution of fork-join parallelism in shared memory, it handles the full X10 language, especially conditional atomic blocks and distribution.
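For readers unfamiliar with conditional atomic blocks: in X10, `when (c) S` suspends the current task until condition `c` holds and then runs `S` atomically. The sketch below is only a rough Java analogy of those semantics, built on `synchronized`, `wait`, and `notifyAll`. The class `OneSlot` and its methods are hypothetical names invented for illustration. Note that this analogy blocks a whole thread while waiting, so it shows what the construct means, not how the engine described here implements suspension.

```java
// Hypothetical one-slot buffer; all names are illustrative, not from the paper.
class OneSlot {
    private Integer value = null;            // null means the slot is empty
    private final Object lock = new Object();

    // Roughly: when (value == null) { value = v; }
    void put(int v) throws InterruptedException {
        synchronized (lock) {
            while (value != null) lock.wait();   // suspend until the slot is empty
            value = v;
            lock.notifyAll();                    // let waiting takers re-check their condition
        }
    }

    // Roughly: when (value != null) { v = value; value = null; }
    int take() throws InterruptedException {
        synchronized (lock) {
            while (value == null) lock.wait();   // suspend until the slot is full
            int v = value;
            value = null;
            lock.notifyAll();                    // let waiting putters re-check their condition
            return v;
        }
    }

    // Small demo: a producer thread puts 1, 2, 3 and the caller sums what it takes.
    static int demo() {
        OneSlot slot = new OneSlot();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) slot.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int sum = 0;
            for (int i = 0; i < 3; i++) sum += slot.take();
            producer.join();
            return sum;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Each guard is re-checked in a loop after every wake-up, which mirrors the requirement that the body runs only once its condition actually holds.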
We show that this engine improves the run time of a series of benchmark programs by several orders of magnitude when used in combination with the C++ backend compiler and runtime for X10. It achieves scaling comparable to state-of-the-art work-stealing scheduler implementations (the Cilk++ compiler and the Java fork/join framework) despite the dramatic increase in generality.
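As a point of reference for the fork-join style discussed above, here is a minimal sketch of a divide-and-conquer computation on the Java fork/join framework, one of the two baselines named in the abstract. It is not code from the paper. The class `FibTask` and the helper `fib` are hypothetical names, and naive Fibonacci is used only because it generates many fine-grained tasks. In X10 the same shape would be written with `async` inside `finish`.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task; illustrates fork-join on a work-stealing pool.
class FibTask extends RecursiveTask<Long> {
    private final int n;

    FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < 2) return (long) n;               // base case: no parallelism
        FibTask left = new FibTask(n - 1);
        left.fork();                              // queue one half; an idle worker may steal it
        long right = new FibTask(n - 2).compute(); // run the other half on this worker
        return left.join() + right;               // combine once the forked half is done
    }

    // Convenience entry point that runs the task on the shared common pool.
    static long fib(int n) {
        return ForkJoinPool.commonPool().invoke(new FibTask(n));
    }
}
```

Calling `FibTask.fib(20)` evaluates the recursion in parallel. The forking worker keeps one subproblem for itself and exposes the other for stealing, which is the load-balancing pattern that work-stealing schedulers exploit.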
A work-stealing scheduler for X10's task parallelism with suspension
PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming