
Optimal sparse matrix dense vector multiplication in the I/O-model

Published: 9 June 2007
DOI: 10.1145/1248377.1248391

ABSTRACT

We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute y := Ax, where A is a sparse N x N matrix and x and y are vectors. Sparsity is expressed by the parameter k, which states that A has at most kN nonzeros in total, i.e., an average of k nonzeros per column. The extreme choices of k are well-studied special cases: k = 1 corresponds to permuting, and k = N to dense matrix-vector multiplication.
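As a concrete, purely in-memory illustration of the task, the sketch below computes y := Ax for a matrix stored column by column, matching the column-major layout discussed later in the abstract. It is a minimal sketch only: the function name spmv_csc and the CSC-style arrays col_ptr, row_idx, and vals are illustrative choices, not part of the paper, and the code says nothing about I/O cost.

    # Minimal in-memory sketch of SpMV, y := Ax, with A in a column-major
    # (CSC-like) sparse layout. Illustrative only; it does not model I/O cost.
    def spmv_csc(n, col_ptr, row_idx, vals, x):
        """col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j."""
        y = [0.0] * n
        for j in range(n):                      # scan A column by column
            xj = x[j]
            for p in range(col_ptr[j], col_ptr[j + 1]):
                y[row_idx[p]] += vals[p] * xj   # scatter contributions into y
        return y

    # Example: a 3x3 matrix with k = 1 nonzero per column (a permuting-like case):
    # A = [[0, 2, 0], [3, 0, 0], [0, 0, 5]], stored column-wise.
    col_ptr = [0, 1, 2, 3]
    row_idx = [1, 0, 2]
    vals    = [3.0, 2.0, 5.0]
    print(spmv_csc(3, col_ptr, row_idx, vals, [1.0, 1.0, 1.0]))  # -> [2.0, 3.0, 5.0]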

We study the worst-case complexity of this computational task, i.e., the best possible upper bound on the number of I/Os as a function of k and N only. We determine this complexity up to a constant factor for large ranges of the parameters. Our arguments further show that most matrices with kN nonzeros require this number of I/Os, even if the program is allowed to depend on the structure of the matrix. The model of computation for the lower bound combines the I/O-models of Aggarwal and Vitter and of Hong and Kung.

We study two variants of the problem, depending on the memory layout of A.

If A is stored in column-major layout, SpMV has I/O complexity Θ(min{ (kN/B)·(1 + log_{M/B}(N / max{M, k})), kN }) for k ≤ N^{1-ε} and any constant 0 < ε < 1. If the algorithm can choose the memory layout, the I/O complexity of SpMV is Θ(min{ (kN/B)·(1 + log_{M/B}(N / (kM))), kN }) for k ≤ ∛N.
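To make the two bounds concrete, the short calculation below evaluates the expressions inside Θ(·) for one set of sample parameters; the values of N, k, M, and B are arbitrary assumptions chosen to satisfy the stated ranges, not figures from the paper.

    # Evaluate the expressions inside the Theta(.) bounds for sample parameters.
    # The parameter values below are illustrative assumptions, not from the paper.
    import math

    def io_bound_column_major(N, k, M, B):
        sort_like = (k * N / B) * (1 + math.log(N / max(M, k), M / B))
        return min(sort_like, k * N)

    def io_bound_chosen_layout(N, k, M, B):
        sort_like = (k * N / B) * (1 + math.log(N / (k * M), M / B))
        return min(sort_like, k * N)

    N, k = 2**26, 8        # kN = 2^29 nonzeros; k lies well below N^{1-eps} and the cube root of N
    M, B = 2**20, 2**10    # internal memory of 2^20 entries, blocks of 2^10 entries
    print(io_bound_column_major(N, k, M, B))   # ~8.4e5 for the column-major layout bound
    print(io_bound_chosen_layout(N, k, M, B))  # ~6.8e5 when the layout can be chosen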

In the cache-oblivious setting with the tall-cache assumption M ≥ B^{1+ε}, the I/O complexity is O((kN/B)·(1 + log_{M/B}(N/k))) for A in column-major layout.
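Under the same illustrative parameters as above (again an assumption, not data from the paper), the cache-oblivious expression can be evaluated in the same way; the assert simply checks the tall-cache assumption with ε = 0.5.

    # Cache-oblivious bound for A in column-major layout, same sample parameters.
    import math

    N, k = 2**26, 8
    M, B = 2**20, 2**10
    assert M >= B ** 1.5   # tall-cache assumption M >= B^{1+eps}, here with eps = 0.5
    oblivious = (k * N / B) * (1 + math.log(N / k, M / B))
    print(oblivious)       # ~1.7e6: the O((kN/B)(1 + log_{M/B}(N/k))) expression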

References

  1. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Comm. ACM, 31(9):1116--1127, September 1988.
  2. L. Arge and P. B. Miltersen. On showing lower bounds for external-memory computational geometry problems. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 139--159. AMS Press, 1999.
  3. G. S. Brodal, R. Fagerberg, and G. Moruz. Cache-aware and cache-oblivious adaptive sorting. In Proc. 32nd International Colloquium on Automata, Languages, and Programming, vol. 3580 of Lecture Notes in Computer Science, pages 576--588. Springer Verlag, Berlin, 2005.
  4. T. H. Cormen, T. Sundquist, and L. F. Wisniewski. Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput., 28(1):105--136, 1999.
  5. J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick. Self-adapting linear algebra algorithms and software. Proc. of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 93(2), February 2005.
  6. S. Filippone and M. Colajanni. PSBLAS: A library for parallel linear algebra computation on sparse matrices. ACM Trans. on Math. Software, 26(4):527--550, Dec. 2000.
  7. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pages 285--297, New York, NY, Oct. 17--19, 1999.
  8. R. Hartshorne. Algebraic Geometry. Springer, 1977.
  9. T. Haveliwala. Efficient computation of PageRank. Technical Report 1999-31, Database Group, Computer Science Department, Stanford University, Feb. 1999. Available at http://dbpubs.stanford.edu/pub/1999-31.
  10. E. J. Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California, Berkeley, May 2000.
  11. H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proc. 13th Annual ACM Symposium on Theory of Computing, pages 326--333, New York, NY, USA, 1981. ACM Press.
  12. R. Raz. Multi-linear formulas for permanent and determinant are of super-polynomial size. In Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), pages 633--641, Chicago, IL, USA, June 2004.
  13. K. Remington and R. Pozo. NIST sparse BLAS user's guide. Technical report, National Institute of Standards and Technology, Gaithersburg, Maryland, 1996.
  14. Y. Saad. SPARSKIT: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994.
  15. S. Toledo. A survey of out-of-core algorithms in numerical linear algebra. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 161--179. AMS Press, 1999.
  16. J. S. Vitter. External memory algorithms and data structures. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 1--38. AMS Press, 1999.
  17. R. Vuduc, J. W. Demmel, and K. A. Yelick. The Optimized Sparse Kernel Interface (OSKI) Library: User's Guide for Version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, March 15, 2006.
  18. R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, Fall 2003.

Published in

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
June 2007, 376 pages
ISBN: 9781595936677
DOI: 10.1145/1248377
Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
