research-article

Commutative set: a language extension for implicit parallel programming

Published: 04 June 2011

Abstract

Sequential programming models express a total program order, of which only a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer-written semantic commutativity assertions provide a natural way of relaxing this partial order, thereby exposing parallelism implicitly in a program. Existing implicit parallel programming models based on semantic commutativity either require additional programming extensions or have limited expressiveness. This paper presents a generalized programming extension based on semantic commutativity, called Commutative Set (COMMSET), and associated compiler technology that enables multiple forms of parallelism. COMMSET expressions are syntactically succinct and enable the programmer to specify commutativity relations between groups of arbitrary structured code blocks. Using only this construct, serializing constraints that inhibit parallelization can be relaxed, independent of any particular parallelization strategy or concurrency control mechanism. COMMSET enables well-performing parallelizations in cases where they were previously inapplicable or non-performing. By extending eight sequential programs with only 8 annotations per program on average, COMMSET and the associated compiler technology produced a geomean speedup of 5.7x on eight cores, compared to 1.5x for the best non-COMMSET parallelization.
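The abstract's central idea, that semantically commuting operations need not preserve the sequential program order, can be illustrated with a small sketch. This is not the paper's actual COMMSET notation (which this excerpt does not show); it only demonstrates the kind of order-independence a commutativity annotation would assert, here using a word-count update whose calls commute even though intermediate states differ.

```python
# Illustrative sketch of semantic commutativity (not COMMSET syntax).
from collections import Counter

def insert(counts, word):
    # Two calls insert(c, a); insert(c, b) commute semantically:
    # the final Counter is identical regardless of call order,
    # even though the intermediate memory states differ.
    counts[word] += 1

words = ["fir", "oak", "fir", "elm", "oak", "fir"]

c1 = Counter()
for w in words:          # the original sequential order
    insert(c1, w)

c2 = Counter()
for w in reversed(words):  # a different interleaving of the same calls
    insert(c2, w)

# Order-independence is exactly what a commutativity annotation
# would assert, freeing a parallelizing compiler to run the loop
# iterations concurrently under any concurrency control mechanism.
assert c1 == c2
```

A strict dependence analysis would serialize these loop iterations because each call reads and writes shared state; a semantic commutativity assertion relaxes that constraint without changing the program's observable result.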



• Published in

  ACM SIGPLAN Notices, Volume 46, Issue 6 (PLDI '11), June 2011, 652 pages
  ISSN: 0362-1340
  EISSN: 1558-1160
  DOI: 10.1145/1993316

  PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, 668 pages
  ISBN: 9781450306638
  DOI: 10.1145/1993498
  General Chair: Mary Hall; Program Chair: David Padua

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

