
Optimal sparse matrix dense vector multiplication in the I/O-model

Published: 9 June 2007
DOI: 10.1145/1248377.1248391

ABSTRACT

We analyze the problem of sparse-matrix dense-vector multiplication (SpMV) in the I/O-model. The task of SpMV is to compute y := Ax, where A is a sparse N x N matrix and x and y are vectors. Sparsity is expressed by the parameter k, which states that A has at most kN nonzeros in total, i.e., an average of k nonzeros per column. The extreme choices of k are well-studied special cases: k = 1 corresponds to permuting, and k = N to dense matrix-vector multiplication.
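As a concrete, purely in-memory illustration of the task, the sketch below computes y := Ax for a matrix stored column by column, matching the column-major layout discussed later in the abstract. It is a minimal sketch only: the function name spmv_csc and the CSC-style arrays col_ptr, row_idx, and vals are illustrative choices, not part of the paper, and the code says nothing about I/O cost.

    # Minimal in-memory sketch of SpMV, y := Ax, with A in a column-major
    # (CSC-like) sparse layout. Illustrative only; it does not model I/O cost.
    def spmv_csc(n, col_ptr, row_idx, vals, x):
        """col_ptr[j]..col_ptr[j+1] delimits the nonzeros of column j."""
        y = [0.0] * n
        for j in range(n):                      # scan A column by column
            xj = x[j]
            for p in range(col_ptr[j], col_ptr[j + 1]):
                y[row_idx[p]] += vals[p] * xj   # scatter contributions into y
        return y

    # Example: a 3x3 matrix with k = 1 nonzero per column (a permuting-like case):
    # A = [[0, 2, 0], [3, 0, 0], [0, 0, 5]], stored column-wise.
    col_ptr = [0, 1, 2, 3]
    row_idx = [1, 0, 2]
    vals    = [3.0, 2.0, 5.0]
    print(spmv_csc(3, col_ptr, row_idx, vals, [1.0, 1.0, 1.0]))  # -> [2.0, 3.0, 5.0]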

We study the worst-case complexity of this computational task, i.e., the best possible upper bound on the number of I/Os as a function of k and N only. We determine this complexity up to a constant factor for large ranges of the parameters. Our arguments further show that most matrices with kN nonzeros require this number of I/Os, even if the program is allowed to depend on the structure of the matrix. The model of computation for the lower bound combines the I/O-models of Aggarwal and Vitter and of Hong and Kung.

We study two variants of the problem, depending on the memory layout of A.

If A is stored in column-major layout, SpMV has I/O complexity Θ(min{ (kN/B)·(1 + log_{M/B}(N / max{M, k})), kN }) for k ≤ N^{1-ε} and any constant 0 < ε < 1. If the algorithm can choose the memory layout, the I/O complexity of SpMV is Θ(min{ (kN/B)·(1 + log_{M/B}(N / (kM))), kN }) for k ≤ ∛N.
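To make the two bounds concrete, the short calculation below evaluates the expressions inside Θ(·) for one set of sample parameters; the values of N, k, M, and B are arbitrary assumptions chosen to satisfy the stated ranges, not figures from the paper.

    # Evaluate the expressions inside the Theta(.) bounds for sample parameters.
    # The parameter values below are illustrative assumptions, not from the paper.
    import math

    def io_bound_column_major(N, k, M, B):
        sort_like = (k * N / B) * (1 + math.log(N / max(M, k), M / B))
        return min(sort_like, k * N)

    def io_bound_chosen_layout(N, k, M, B):
        sort_like = (k * N / B) * (1 + math.log(N / (k * M), M / B))
        return min(sort_like, k * N)

    N, k = 2**26, 8        # kN = 2^29 nonzeros; k lies well below N^{1-eps} and the cube root of N
    M, B = 2**20, 2**10    # internal memory of 2^20 entries, blocks of 2^10 entries
    print(io_bound_column_major(N, k, M, B))   # ~8.4e5 for the column-major layout bound
    print(io_bound_chosen_layout(N, k, M, B))  # ~6.8e5 when the layout can be chosen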

In the cache-oblivious setting with the tall-cache assumption M ≥ B^{1+ε}, the I/O complexity is O((kN/B)·(1 + log_{M/B}(N/k))) for A in column-major layout.
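Under the same illustrative parameters as above (again an assumption, not data from the paper), the cache-oblivious expression can be evaluated in the same way; the assert simply checks the tall-cache assumption with ε = 0.5.

    # Cache-oblivious bound for A in column-major layout, same sample parameters.
    import math

    N, k = 2**26, 8
    M, B = 2**20, 2**10
    assert M >= B ** 1.5   # tall-cache assumption M >= B^{1+eps}, here with eps = 0.5
    oblivious = (k * N / B) * (1 + math.log(N / k, M / B))
    print(oblivious)       # ~1.7e6: the O((kN/B)(1 + log_{M/B}(N/k))) expression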

References

  1. A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Comm. ACM, 31(9):1116--1127, September 1988.
  2. L. Arge and P. B. Miltersen. On showing lower bounds for external-memory computational geometry problems. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 139--159. AMS Press, 1999.
  3. G. S. Brodal, R. Fagerberg, and G. Moruz. Cache-aware and cache-oblivious adaptive sorting. In Proc. 32nd International Colloquium on Automata, Languages, and Programming, vol. 3580 of Lecture Notes in Computer Science, pages 576--588. Springer Verlag, Berlin, 2005.
  4. T. H. Cormen, T. Sundquist, and L. F. Wisniewski. Asymptotically tight bounds for performing BMMC permutations on parallel disk systems. SIAM J. Comput., 28(1):105--136, 1999.
  5. J. Demmel, J. Dongarra, V. Eijkhout, E. Fuentes, A. Petitet, R. Vuduc, R. C. Whaley, and K. Yelick. Self-adapting linear algebra algorithms and software. Proc. of the IEEE, Special Issue on Program Generation, Optimization, and Adaptation, 93(2), February 2005.
  6. S. Filippone and M. Colajanni. PSBLAS: A library for parallel linear algebra computation on sparse matrices. ACM Trans. on Math. Software, 26(4):527--550, Dec. 2000.
  7. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proc. 40th Annual Symp. on Foundations of Computer Science (FOCS), pages 285--297, New York, NY, Oct. 17--19, 1999.
  8. R. Hartshorne. Algebraic Geometry. Springer, 1977.
  9. T. Haveliwala. Efficient computation of PageRank. Technical Report 1999-31, Database Group, Computer Science Department, Stanford University, Feb. 1999. Available at http://dbpubs.stanford.edu/pub/1999-31.
  10. E. J. Im. Optimizing the Performance of Sparse Matrix-Vector Multiplication. PhD thesis, University of California, Berkeley, May 2000.
  11. H. Jia-Wei and H. T. Kung. I/O complexity: The red-blue pebble game. In STOC '81: Proc. 13th Annual ACM Symposium on Theory of Computing, pages 326--333, New York, NY, USA, 1981. ACM Press.
  12. R. Raz. Multi-linear formulas for permanent and determinant are of super-polynomial size. In Proc. 36th Annual ACM Symposium on Theory of Computing (STOC), pages 633--641, Chicago, IL, USA, June 2004.
  13. K. Remington and R. Pozo. NIST sparse BLAS user's guide. Technical report, National Institute of Standards and Technology, Gaithersburg, Maryland, 1996.
  14. Y. Saad. SPARSKIT: a basic tool kit for sparse matrix computations. Technical report, Computer Science Department, University of Minnesota, June 1994.
  15. S. Toledo. A survey of out-of-core algorithms in numerical linear algebra. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 161--179. AMS Press, 1999.
  16. J. S. Vitter. External memory algorithms and data structures. In J. M. Abello and J. S. Vitter, editors, External Memory Algorithms, vol. 50 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 1--38. AMS Press, 1999.
  17. R. Vuduc, J. W. Demmel, and K. A. Yelick. The Optimized Sparse Kernel Interface (OSKI) Library: User's Guide for Version 1.0.1b. Berkeley Benchmarking and OPtimization (BeBOP) Group, March 15, 2006.
  18. R. W. Vuduc. Automatic Performance Tuning of Sparse Matrix Kernels. PhD thesis, University of California, Berkeley, Fall 2003.

Published in

SPAA '07: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
June 2007, 376 pages
ISBN: 9781595936677
DOI: 10.1145/1248377
Copyright © 2007 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
