
A work-stealing scheduler for X10's task parallelism with suspension

Published: 25 February 2012

Abstract

The X10 programming language is intended to ease the programming of scalable concurrent and distributed applications. X10 augments a familiar imperative object-oriented programming model with constructs to support light-weight asynchronous tasks as well as execution across multiple address spaces. A crucial aspect of X10's runtime system is the scheduling of concurrent tasks. Work-stealing schedulers have been shown to efficiently load balance fine-grain divide-and-conquer task-parallel programs on SMPs and multicores. But X10 is not limited to shared-memory fork-join parallelism. X10 permits tasks to suspend and synchronize by means of conditional atomic blocks and remote task invocations.
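As a rough illustration of the fine-grain divide-and-conquer, fork-join style of program that work-stealing schedulers balance well, the following is a minimal sketch written against the standard Java `ForkJoinPool` API rather than X10. It is not taken from the paper: the class name `FibDemo`, the `CUTOFF` threshold, and the pool size of 4 are assumptions chosen only for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class FibDemo {
    // Hypothetical threshold: below it, recurse sequentially so that
    // tasks do not become too fine-grained to be worth scheduling.
    static final int CUTOFF = 10;

    // Plain sequential Fibonacci, used for small subproblems.
    public static long seqFib(int n) {
        return n < 2 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    // One divide-and-conquer task: fib(n) = fib(n-1) + fib(n-2).
    static class FibTask extends RecursiveTask<Long> {
        private final int n;

        FibTask(int n) {
            this.n = n;
        }

        @Override
        protected Long compute() {
            if (n <= CUTOFF) {
                return seqFib(n);
            }
            FibTask left = new FibTask(n - 1);
            // fork() queues the subtask on the current worker's deque,
            // where an idle worker may steal it.
            left.fork();
            // The current worker keeps going with the other half itself.
            long right = new FibTask(n - 2).compute();
            // join() waits for the forked half to complete.
            return left.join() + right;
        }
    }

    // Runs the computation on a small, dedicated pool (size is an assumption).
    public static long parallelFib(int n) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.invoke(new FibTask(n));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelFib(30)); // prints 832040
    }
}
```

The sketch covers only shared-memory fork-join parallelism. It does not model task suspension, conditional atomic blocks, or remote task invocations.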

In this paper, we demonstrate that work-stealing scheduling principles are applicable to a rich programming language such as X10, achieving performance at scale without compromising expressivity, ease of use, or portability. We design and implement a portable work-stealing execution engine for X10. While this engine is biased toward the efficient execution of fork-join parallelism in shared memory, it handles the full X10 language, especially conditional atomic blocks and distribution.

We show that this engine improves the run time of a series of benchmark programs by several orders of magnitude when used in combination with the C++ backend compiler and runtime for X10. It achieves scaling comparable to state-of-the-art work-stealing scheduler implementations (the Cilk++ compiler and the Java fork/join framework) despite the dramatic increase in generality.


Published in

ACM SIGPLAN Notices, Volume 47, Issue 8 (PPOPP '12), August 2012, 334 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2370036

PPoPP '12: Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, February 2012, 352 pages
ISBN: 9781450311601
DOI: 10.1145/2145816

Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article
