skip to main content
10.1145/1693453.1693487acmconferencesArticle/Chapter ViewAbstractPublication PagesppoppConference Proceedingsconference-collections
research-article

Helper locks for fork-join parallel programming

Authors Info & Claims
Published:09 January 2010Publication History

ABSTRACT

Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on the helper lock to aid in the execution of the parallel region. Suppose that a processor p is executing a parallel region A after having acquired the lock L protecting A. If another processor p′ tries to acquire L, then instead of blocking and waiting for p to complete A, processor p′ joins p to help it complete A. Additional processors not blocked on L may also help to execute A.

The HELPER runtime system can execute fork-join computations augmented with helper locks and parallel regions. HELPER supports the unbounded nesting of parallel regions. We provide theoretical completion-time and space-usage bounds for a design of HELPER based on work stealing. Specifically, let V be the number of parallel regions in a computation, let T1 be its work, and let T∞ be its "aggregate span" --- the sum of the spans (critical-path lengths) of all its parallel regions. We prove that HELPER completes the computation in expected time O(T1/PP + T∞+ PV on P processors. This bound indicates that programs with a small number of highly parallel critical sections can attain linear speedup. For the space bound, we prove that HELPER completes a program using only O(PS1 stack space, where S1 is the sum, over all regions, of the stack space used by each region in a serial execution. Finally, we describe a prototype of HELPER implemented by modifying the Cilk multithreaded runtime system. We used this prototype to implement a concurrent hash table with a resize operation protected by a helper lock.

References

  1. E. Allen, D. Chase, J. Hallett, V. Luchango, J.-W. Maessen, S. Ryu, G. L. S. Jr., and S. Tobin-Hochstadt. The Fortress language specification, version 1.0. Technical report, Sun Microsystems, Inc., March 2008. URL http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf.Google ScholarGoogle Scholar
  2. N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 119--129, Puerto Vallarta, Mexico, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. G. Barnes. A method for implementing lock-free shared-data structures. In SPAA '93: Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 261--270, New York, NY, USA, 1993. ACM. ISBN 0-89791-599-2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5):720--748, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. K. Ebcioglu, V. Saraswat, and V. Sarkar. X10: an experimental language for high productivity programming of scalable systems. In Proceedings of the Second Workshop on Productivity and Performance in High-End Computing (PPHEC-05), Feb. 2005. Held in conjunction with the Eleventh Symposium on High Performance Computer Architecture.Google ScholarGoogle Scholar
  6. M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 212--223, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Herlihy, N. Shavit, and M. Tzafrir. Hopscotch hashing. In DISC '08: Proceedings of the 22nd International Symposium on Distributed Computing, pages 350--364, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of strong shared memory primitives. In PODC '94: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing, pages 151--160, New York, NY, USA, 1994. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. E. Leiserson. The Cilk++ concurrency platform. In DAC '09: Proceedings of the 46th Annual Design Automation Conference, pages 522--527, New York, NY, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S.-C. Lim, J. Ahn, and M. H. Kim. A concurrent Blink-tree algorithm using a cooperative locking protocol. In Lecture Notes in Computer Science, volume 2712, pages 253--260. Springer Berlin / Heidelberg, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. OpenMP Architecture Review Board. OpenMP application program interface, version 3.0. http://www.openmp.org/mp-documents/spec30.pdf, May 2008.Google ScholarGoogle Scholar
  12. J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O'Reilly, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Turek, D. Shasha, and S. Prakash. Locking without blocking: making lock based concurrent data structure algorithms nonblocking. In PODS '92: Proceedings of the Eleventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 212--222, New York, NY, USA, 1992. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Helper locks for fork-join parallel programming

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
        January 2010
        372 pages
        ISBN:9781605588773
        DOI:10.1145/1693453
        • cover image ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 45, Issue 5
          PPoPP '10
          May 2010
          346 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1837853
          Issue’s Table of Contents

        Copyright © 2010 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 January 2010

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate230of1,014submissions,23%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader