Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs

Chow, Edmond; Anzt, Hartwig; Dongarra, Jack

doi:10.1007/978-3-319-20119-1_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

This paper presents a GPU implementation of an asynchronous iterative algorithm for computing incomplete factorizations. Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures. Our GPU implementation considers several non-traditional techniques that can be important for asynchronous algorithms to optimize convergence and data locality. These techniques include controlling the order in which variables are updated by controlling the order of execution of thread blocks, taking advantage of cache reuse between thread blocks, and managing the amount of parallelism to control the convergence of the algorithm.
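
The abstract does not spell out the iteration itself. As a rough illustration only, the sketch below assumes the fixed-point formulation used in fine-grained parallel incomplete LU work: one unknown per nonzero position (i, j) of A, each recomputed from the current values of the others. It imitates asynchronous updates on a CPU by visiting the unknowns in a shuffled order during every sweep. The function names, the dense list-of-lists storage, and the small 2D Laplacian test matrix are assumptions made for this example; this is not the authors' GPU code.

```python
import random


def laplacian_2d(m):
    """5-point Laplacian on an m-by-m grid as a dense list of lists (example input)."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for r in range(m):
        for c in range(m):
            p = r * m + c
            A[p][p] = 4.0
            if c > 0:
                A[p][p - 1] = -1.0
            if c < m - 1:
                A[p][p + 1] = -1.0
            if r > 0:
                A[p][p - m] = -1.0
            if r < m - 1:
                A[p][p + m] = -1.0
    return A


def async_ilu0_sweeps(A, sweeps, seed=0):
    """Fixed-point ILU(0) sweeps with in-place updates in a shuffled order.

    Each nonzero position (i, j) of A owns one unknown: l_ij if i > j,
    u_ij otherwise. Shuffling only imitates, on a CPU, the absence of
    ordering guarantees between GPU thread blocks.
    """
    n = len(A)
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]
    # Initial guess (an assumption of this sketch): L is the strict lower part
    # of A with a unit diagonal, U is the upper part of A.
    L = [[A[i][j] if i > j else float(i == j) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    for _ in range(sweeps):
        rng.shuffle(pattern)
        for i, j in pattern:
            # Entries outside the pattern stay zero, so the full range is safe.
            s = sum(L[i][k] * U[k][j] for k in range(min(i, j)))
            if i > j:
                L[i][j] = (A[i][j] - s) / U[j][j]
            else:
                U[i][j] = A[i][j] - s
    return L, U


def pattern_residual(A, L, U):
    """Largest |(L U)_ij - a_ij| over the nonzero positions of A."""
    n = len(A)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0.0:
                lu_ij = sum(L[i][k] * U[k][j] for k in range(n))
                worst = max(worst, abs(lu_ij - A[i][j]))
    return worst
```

Calling `async_ilu0_sweeps(laplacian_2d(3), sweeps=50)` and passing the result to `pattern_residual` checks the defining ILU(0) property, namely that the product L U matches A on the sparsity pattern of A. Changing `seed` changes the update order, which is the simplest way to experiment with the abstract's point that update order affects how quickly the iteration converges.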

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC-0012538 and DE-SC-0010042. Support from NVIDIA is also acknowledged.

Author information

Authors and Affiliations

Georgia Institute of Technology, Atlanta, GA, USA
Edmond Chow
University of Tennessee, Knoxville, TN, USA
Hartwig Anzt & Jack Dongarra

Corresponding author

Correspondence to Hartwig Anzt.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Thomas Ludwig

About this paper

Cite this paper

Chow, E., Anzt, H., Dongarra, J. (2015). Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_1

DOI: https://doi.org/10.1007/978-3-319-20119-1_1
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)
