Abstract
Sequential programming models express a total program order, of which a partial order must be respected. This inhibits parallelizing tools from extracting scalable performance. Programmer-written semantic commutativity assertions provide a natural way of relaxing this partial order, thereby exposing parallelism implicitly in a program. Existing implicit parallel programming models based on semantic commutativity either require additional programming extensions or have limited expressiveness. This paper presents a generalized semantic-commutativity-based programming extension, called Commutative Set (COMMSET), and associated compiler technology that enables multiple forms of parallelism. COMMSET expressions are syntactically succinct and enable the programmer to specify commutativity relations between groups of arbitrary structured code blocks. Using only this construct, serializing constraints that inhibit parallelization can be relaxed, independent of any particular parallelization strategy or concurrency control mechanism. COMMSET enables well-performing parallelizations in cases where they were previously inapplicable or performed poorly. By extending eight sequential programs with only 8 annotations per program on average, COMMSET and the associated compiler technology produced a geomean speedup of 5.7x on eight cores, compared to 1.5x for the best non-COMMSET parallelization.
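The notion of semantic commutativity in the abstract can be illustrated with a small sketch. It is not COMMSET syntax and not the paper's compiler output: it is a Python analogy under stated assumptions, and the names `digest`, `run_sequential`, `run_commutative`, and `semantically_equal` are hypothetical. The sketch assumes a loop whose iterations each append one result to a shared list, and that the programmer considers any ordering of those appends acceptable. Under that assumption, the appends commute semantically even though they do not commute at the level of the concrete list contents.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
import hashlib


def digest(data: bytes) -> str:
    """Independent per-item work (hypothetical stand-in for one loop iteration)."""
    return hashlib.sha256(data).hexdigest()


def run_sequential(items):
    """Baseline: program order fixes the order in which results are appended."""
    results = []
    for item in items:
        results.append(digest(item))
    return results


def run_commutative(items, workers=4):
    """Relaxed version: appends may happen in any order.

    Assumption: the programmer has asserted that the append operations
    commute, so only their atomicity is preserved (via the lock), not
    their relative order.
    """
    results = []
    lock = Lock()

    def task(item):
        d = digest(item)          # runs concurrently with other tasks
        with lock:                # each append stays atomic
            results.append(d)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, items))  # drain the iterator so errors surface
    return results


def semantically_equal(a, b):
    """Two runs are equivalent if they hold the same results in any order."""
    return sorted(a) == sorted(b)
```

A caller would compare `run_sequential(items)` with `run_commutative(items)` using `semantically_equal` rather than `==`, since the relaxed run may produce the same results in a different order. The lock here is only one possible concurrency control choice; the abstract states that the commutativity assertion itself is independent of that choice.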
Commutative set: a language extension for implicit parallel programming
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation