ABSTRACT
The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabilities such as automatic fine-grained data parallelism through the use of vector instructions, the X1 offers hardware support for a transparent global address space (GAS), which makes it an interesting target for GAS languages. In this paper, we describe our experience with developing a portable, open-source, high-performance compiler for Unified Parallel C (UPC), an SPMD global address space language extension of ISO C. As part of our implementation effort, we evaluate the X1's hardware support for GAS languages and provide empirical performance characterizations in the context of leveraging features such as vectorization and global pointers for the Berkeley UPC compiler. We discuss several difficulties encountered in the Cray C compiler that are likely to present challenges for many users, especially implementors of libraries and source-to-source translators. Finally, we analyze the performance of our compiler on some benchmark programs and show that, while the current compilation approach has some limitations, the Berkeley UPC compiler uses the X1 network more effectively than MPI or SHMEM and generates serial code whose vectorizability is comparable to that of the original C code.
Evaluating support for global address space languages on the Cray X1