Research article · DOI: 10.1145/1654059.1654078

Implementing sparse matrix-vector multiplication on throughput-oriented processors

Published: 14 November 2009

ABSTRACT

Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.
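To make the abstract's notion of "fine-grained parallelism" concrete, the following is a minimal sketch of a scalar CSR SpMV kernel, assigning one CUDA thread per matrix row. This baseline style is among the simplest of the kernel designs the paper discusses; the variable names and launch parameters here are illustrative, and this is not the tuned implementation behind the reported throughput figures.

    // Sketch: scalar CSR SpMV, one thread per row (illustrative baseline,
    // not the paper's tuned kernels).
    __global__ void spmv_csr_scalar(int num_rows,
                                    const int* row_ptr,   // CSR row offsets, length num_rows + 1
                                    const int* col_idx,   // column index of each nonzero
                                    const double* values, // value of each nonzero
                                    const double* x,      // dense input vector
                                    double* y)            // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            double dot = 0.0;
            // Accumulate the dot product of this row with x.
            for (int jj = row_ptr[row]; jj < row_ptr[row + 1]; ++jj)
                dot += values[jj] * x[col_idx[jj]];
            y[row] = dot;
        }
    }

    // Example launch (block size of 256 is an arbitrary illustrative choice):
    //   spmv_csr_scalar<<<(num_rows + 255) / 256, 256>>>(num_rows, row_ptr,
    //                                                    col_idx, values, x, y);

Because each thread walks its row's nonzeros sequentially, adjacent threads touch non-contiguous addresses, squandering memory bandwidth on irregular matrices. Regularizing such access patterns is exactly what motivates the alternative designs the paper explores, such as a warp-per-row (vector) CSR kernel and the ELL and hybrid storage formats.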


Published in

SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
November 2009, 778 pages
ISBN: 9781605587448
DOI: 10.1145/1654059

                Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SC '09 paper acceptance rate: 59 of 261 submissions (23%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
