Implicit and explicit optimizations for stencil computations

Published: 22 October 2006
DOI: 10.1145/1178597.1178605

ABSTRACT

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as on the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps and include both an implicit cache-oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitly managed memory hierarchy, the Cell processor. Overall, results show that the cache-aware approach is significantly faster than the cache-oblivious approach and that the explicitly managed memory on Cell is more efficient: relative to the Power5, Cell has almost 2x more memory bandwidth and is 3.7x faster.
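To make "cache reuse across stencil sweeps" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a 1D three-point averaging stencil with fixed end points, and it uses overlapped (halo) time blocking, a simpler stand-in for the time-skewing and cache-oblivious schemes the abstract refers to. The names `sweep`, `naive`, and `time_blocked`, and the parameters `steps` and `block`, are hypothetical. Pure Python is used only to show the access pattern; it says nothing about achievable performance.

```python
def sweep(src):
    """One Jacobi sweep of a 3-point averaging stencil; end points stay fixed."""
    dst = src[:]
    for i in range(1, len(src) - 1):
        dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return dst


def naive(grid, steps):
    """Baseline: every sweep streams the whole grid through the memory system."""
    for _ in range(steps):
        grid = sweep(grid)
    return grid


def time_blocked(grid, steps, block):
    """Assumed scheme: overlapped time blocking.

    Each block of interior points is copied together with a halo of width
    `steps`, advanced `steps` sweeps locally (while it could stay in cache),
    and only the block interior is written back. Halo cells are recomputed
    redundantly by neighbouring blocks.
    """
    n = len(grid)
    out = grid[:]  # end points are boundary values and never change
    for lo in range(1, n - 1, block):
        hi = min(lo + block, n - 1)
        a = max(lo - steps, 0)  # halo start, clipped at the real boundary
        b = min(hi + steps, n)  # halo end, clipped at the real boundary
        chunk = grid[a:b]
        for _ in range(steps):
            chunk = sweep(chunk)
        # Errors from the artificial chunk edges travel one cell per sweep,
        # so after `steps` sweeps they have not reached indices lo..hi-1.
        out[lo:hi] = chunk[lo - a:hi - a]
    return out
```

In this sketch the blocked version reads each grid point from the full array roughly once per `steps` sweeps instead of once per sweep, at the cost of redundant work in the halos. A cache-aware variant would pick `block` and `steps` so that a chunk fits in a given cache level.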


Published in

MSPC '06: Proceedings of the 2006 workshop on Memory system performance and correctness
October 2006
114 pages
ISBN: 1595935789
DOI: 10.1145/1178597

Copyright © 2006 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 6 of 20 submissions, 30%
