DOI: 10.1145/1006209.1006236
Article

Evaluating support for global address space languages on the Cray X1

Published: 26 June 2004

ABSTRACT

The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabilities such as automatic fine-grained data parallelism through vector instructions, the X1 offers hardware support for a transparent global address space (GAS), which makes it an interesting target for GAS languages. In this paper, we describe our experience developing a portable, open-source, high-performance compiler for Unified Parallel C (UPC), an SPMD global address space language extension of ISO C. As part of this implementation effort, we evaluate the X1's hardware support for GAS languages and provide empirical performance characterizations of how the Berkeley UPC compiler can leverage features such as vectorization and global pointers. We discuss several difficulties encountered with the Cray C compiler that are likely to present challenges for many users, especially implementors of libraries and source-to-source translators. Finally, we analyze the performance of our compiler on several benchmark programs and show that, although the current compilation approach has some limitations, the Berkeley UPC compiler uses the X1 network more effectively than MPI or SHMEM and generates serial code whose vectorizability is comparable to that of the original C code.
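In UPC, a pointer-to-shared (the kind of global pointer the abstract refers to) must identify not only an address but also the thread with affinity to the target and, for blocked arrays, the phase within the current block, so arithmetic on it is more involved than on an ordinary C pointer. The C sketch below illustrates that arithmetic for an array declared `shared [B] T a[N]` over `THREADS` threads, following the block-cyclic layout rules of the UPC specification. It is not the Berkeley UPC runtime's representation: the struct `global_ptr` and the functions `gp_locate`, `gp_increment`, and `gp_layout_consistent` are hypothetical names chosen for illustration, and real implementations encode these fields in their own ways.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical logical view of a UPC pointer-to-shared. Real runtimes,
 * including Berkeley UPC, encode these fields in implementation-specific ways. */
typedef struct {
    size_t thread; /* thread with affinity to the element */
    size_t phase;  /* position within the current block, 0..block-1 */
    size_t local;  /* element offset within that thread's local portion */
} global_ptr;

/* Locate element i of `shared [block] T a[N]` distributed over `threads`
 * threads: blocks of `block` elements are dealt to threads round-robin. */
global_ptr gp_locate(size_t i, size_t block, size_t threads) {
    global_ptr p;
    size_t blk = i / block;                      /* global block number */
    p.thread = blk % threads;                    /* owner of that block */
    p.phase = i % block;                         /* offset inside the block */
    p.local = (blk / threads) * block + p.phase; /* offset in owner's memory */
    return p;
}

/* Advance a pointer-to-shared by one element using only its own fields. */
global_ptr gp_increment(global_ptr p, size_t block, size_t threads) {
    assert(p.phase < block && p.thread < threads);
    p.phase++;
    p.local++;
    if (p.phase == block) {   /* crossed a block boundary */
        p.phase = 0;
        p.thread++;
        if (p.thread == threads) {
            p.thread = 0;     /* wrapped: next block row, local already advanced */
        } else {
            p.local -= block; /* same block row, next thread */
        }
    }
    return p;
}

/* Return 1 if stepping element by element agrees with direct indexing
 * for the first n elements, 0 otherwise. */
int gp_layout_consistent(size_t n, size_t block, size_t threads) {
    global_ptr p = gp_locate(0, block, threads);
    for (size_t i = 1; i < n; i++) {
        global_ptr q = gp_locate(i, block, threads);
        p = gp_increment(p, block, threads);
        if (p.thread != q.thread || p.phase != q.phase || p.local != q.local)
            return 0;
    }
    return 1;
}
```

For example, with a block size of 4 and 3 threads, element 13 lands on thread 0 at phase 1 and local offset 5. Each increment involves compare-and-branch work on the phase and thread fields, which suggests why efficient support for global pointers, whether in hardware or in the compiler, matters for GAS language performance.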


    • Published in

      ICS '04: Proceedings of the 18th annual international conference on Supercomputing
      June 2004
      360 pages
      ISBN:1581138393
      DOI:10.1145/1006209

      Copyright © 2004 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall acceptance rate: 584 of 2,055 submissions (28%)
