Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model

ABSTRACT
We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute y := Ax, where A is a sparse N × N matrix and x and y are vectors. Here, sparsity is expressed by the parameter k, which states that A has a total of at most kN nonzeros, i.e., an average of k nonzeros per column. The extreme choices of k are well-studied special cases: k = 1 is permuting, and k = N is dense matrix-vector multiplication.
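As a point of reference, the computation itself can be sketched in a few lines; the in-memory layout below (each column as a list of (row, value) pairs) is only illustrative and is not the paper's external-memory algorithm:

```python
# Minimal in-memory sketch of SpMV, y := A x, with A held column by
# column as lists of (row, value) pairs. Names and layout are
# illustrative; the paper's algorithms operate on blocked external storage.

def spmv_column_major(cols, x, N):
    """cols[j] lists the nonzeros (i, a_ij) of column j of A."""
    y = [0.0] * N
    for j, col in enumerate(cols):
        xj = x[j]                     # x is scanned once, column by column
        for i, a_ij in col:
            y[i] += a_ij * xj         # scattered updates to y drive the I/O cost
    return y

# A 3x3 example with 3 nonzeros (k = 1 on average):
cols = [[(0, 2.0)], [(2, 3.0)], [(1, 4.0)]]
print(spmv_column_major(cols, [1.0, 1.0, 1.0], 3))  # [2.0, 4.0, 3.0]
```

The inner update `y[i] += a_ij * xj` touches rows of y in an order dictated by the matrix structure; controlling the cost of these scattered accesses is exactly what the I/O-model analysis is about.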
We study the worst-case complexity of this computational task, i.e., what is the best possible upper bound on the number of I/Os depending on k and N only. We determine this complexity up to a constant factor for large ranges of the parameters. By our arguments, we find that most matrices with kN nonzeros require this number of I/Os, even if the program may depend on the structure of the matrix. The model of computation for the lower bound is a combination of the I/O-models of Aggarwal and Vitter, and of Hong and Kung.
We study two variants of the problem, depending on the memory layout of A.
If A is stored in column-major layout, SpMV has I/O complexity Θ(min{(kN/B)(1 + log_{M/B}(N/max{M, k})), kN}) for k ≤ N^{1-ε} and any constant 0 < ε < 1. If the algorithm can choose the memory layout, the I/O complexity of SpMV is Θ(min{(kN/B)(1 + log_{M/B}(N/(kM))), kN}) for k ≤ N^{1/3}.
In the cache-oblivious setting with the tall-cache assumption M ≥ B^{1+ε}, the I/O complexity is O((kN/B)(1 + log_{M/B}(N/k))) for A in column-major layout.
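To make the bounds concrete, the column-major bound can be evaluated numerically. The function below (a hypothetical helper, not from the paper) transcribes the formula with constant factors dropped and the theorem's parameter restriction k ≤ N^{1-ε} ignored:

```python
import math

def column_major_bound(N, k, M, B):
    """min{ (kN/B) * (1 + log_{M/B}(N / max{M, k})), kN }, constants dropped."""
    log_term = max(0.0, math.log(N / max(M, k), M / B))
    return min((k * N / B) * (1.0 + log_term), k * N)

# With a large memory (M >= N) the log term vanishes and the bound is the
# scanning cost kN/B; with a tiny memory it degenerates to kN.
print(column_major_bound(2**20, 8, 2**21, 2**10))  # 8192.0 = kN/B
```

The two regimes visible here match the extreme cases in the abstract: near-scanning cost when the memory is large, and one I/O per nonzero (kN) when it is small.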
REFERENCES
- A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Comm. ACM, 31(9):1116--1127, September 1988.
- L. Arge and P. B. Miltersen. On showing lower bounds for external-memory computational geometry problems. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 139--159. AMS Press, 1999.
- G. S. Brodal, R. Fagerberg, and G. Moruz. Cache-aware and cache-oblivious adaptive sorting. In Proc. 32nd International Colloquium on Automata, Languages, and Programming, vol. 3580 of Lecture Notes in Computer Science, pages 576--588. Springer Verlag, Berlin, 2005.
- T. H. Cormen, T. Sundquist, and L. F. Wisniewski. Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput., 28(1):105--136, 1999.
- J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick. Self-adapting linear algebra algorithms and software. Proc. of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 93(2), February 2005.
- S. Filippone and M. Colajanni. PSBLAS: A library for parallel linear algebra computation on sparse matrices. ACM Trans. on Math. Software, 26(4):527--550, December 2000.
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pages 285--297, New York, NY, October 17--19, 1999.
- R. Hartshorne. Algebraic Geometry. Springer, 1977.
- T. Haveliwala. Efficient computation of PageRank. Technical Report 1999-31, Database Group, Computer Science Department, Stanford University, February 1999. Available at http://dbpubs.stanford.edu/pub/1999-31.
- E. J. Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California, Berkeley, May 2000.
- H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proc. 13th Annual ACM Symposium on Theory of Computing, pages 326--333, New York, NY, USA, 1981. ACM Press.
- R. Raz. Multi-linear formulas for permanent and determinant are of super-polynomial size. In Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), pages 633--641, Chicago, IL, USA, June 2004.
- K. Remington and R. Pozo. NIST sparse BLAS user's guide. Technical report, National Institute of Standards and Technology, Gaithersburg, Maryland, 1996.
- Y. Saad. SPARSKIT: A basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994.
- S. Toledo. A survey of out-of-core algorithms in numerical linear algebra. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 161--179. AMS Press, 1999.
- J. S. Vitter. External memory algorithms and data structures. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 1--38. AMS Press, 1999.
- R. Vuduc, J. W. Demmel, and K. A. Yelick. The Optimized Sparse Kernel Interface (OSKI) Library: User's Guide for Version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, March 15, 2006.
- R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, Fall 2003.