Abstract
This paper presents a GPU implementation of an asynchronous iterative algorithm for computing incomplete factorizations. Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures. Our GPU implementation considers several non-traditional techniques that can be important for asynchronous algorithms to optimize convergence and data locality. These techniques include controlling the order in which variables are updated by controlling the order of execution of thread blocks, taking advantage of cache reuse between thread blocks, and managing the amount of parallelism to control the convergence of the algorithm.
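As a rough illustration of the kind of iteration the abstract refers to, the sketch below assumes the fixed-point formulation of ILU(0) from the fine-grained parallel ILU literature, in which every nonzero position (i, j) of the matrix owns one unknown and is updated from the current values of the others. The abstract does not give the update equations, so this is an assumption rather than the authors' implementation, and the function name `ilu0_sweeps` and the `in_place` flag are hypothetical. The CPU code only mimics the idea: `in_place=False` performs barrier-separated sweeps, while `in_place=True` lets each update see earlier updates immediately, which is one of the many orderings an asynchronous GPU execution could produce.

```python
def ilu0_sweeps(A, num_sweeps=5, in_place=False):
    """Fixed-point sweeps for an ILU(0) factorization A ~ L*U (illustrative sketch).

    A is a small square matrix stored as a list of lists with a nonzero diagonal.
    L is unit lower triangular, U is upper triangular, and both are restricted
    to the nonzero pattern of A.
    """
    n = len(A)
    # Sparsity pattern: one unknown per nonzero of A, visited in row-major order.
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]

    # Initial guess (an assumption): strictly lower part of A in L, the rest in U.
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i, j in pattern:
        (L if i > j else U)[i][j] = A[i][j]
    for i in range(n):
        L[i][i] = 1.0

    for _ in range(num_sweeps):
        if in_place:
            # Updates become visible immediately (one possible update ordering).
            L_src, U_src = L, U
        else:
            # Every update in this sweep reads only values from the previous sweep.
            L_src = [row[:] for row in L]
            U_src = [row[:] for row in U]
        for i, j in pattern:
            s = sum(L_src[i][k] * U_src[k][j] for k in range(min(i, j)))
            if i > j:
                L[i][j] = (A[i][j] - s) / U_src[j][j]
            else:
                U[i][j] = A[i][j] - s
    return L, U


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    L, U = ilu0_sweeps(A, num_sweeps=5)
    print(L)
    print(U)
```

With `in_place=True` and the row-major ordering used here, each unknown is updated only after the values it depends on, so a single sweep reproduces the conventional ILU(0) factors. With `in_place=False`, several sweeps are needed. This is a small-scale view of why the order in which variables are updated affects convergence.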
Acknowledgments
This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC-0012538 and DE-SC-0010042. Support from NVIDIA is also acknowledged.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chow, E., Anzt, H., Dongarra, J. (2015). Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_1
DOI: https://doi.org/10.1007/978-3-319-20119-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)