
An adaptive blocking strategy for matrix factorizations

  • Conference paper
  • Session: Algorithms for Matrix Factorization

CONPAR 90 — VAPP IV (CONPAR 1990, VAPP 1990)

Abstract

On most high-performance architectures, data movement is slow compared to floating-point (in particular, vector) performance. On such architectures, block algorithms have been successful for matrix computations: by considering a matrix as a collection of submatrices (the so-called blocks), one naturally arrives at algorithms that require little data movement. The optimal blocking strategy, however, depends on the computing environment and on the problem parameters. Current approaches use fixed-width blocking strategies, which are in general not optimal. This paper presents an “adaptive blocking” methodology for determining, in a systematic manner, an optimal blocking strategy for a uniprocessor machine. We demonstrate this technique on a block QR factorization routine on a uniprocessor. After generating timing models for the algorithm's high-level kernels, we formulate the optimal blocking strategy as a recurrence relation that can be solved inexpensively by dynamic programming. Experiments on one processor of a CRAY-2 show that the resulting blocking strategy is in fact as good as any fixed-width blocking strategy. Thus, while the optimal fixed-width blocking strategy cannot be known without re-running the same problem several times, adaptive blocking delivers optimal performance on the very first run.
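The dynamic-programming formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `block_cost` is a hypothetical stand-in for the measured kernel timing models, and the recurrence simply chooses, for each starting column, the panel width that minimizes the modeled cost of that step plus the optimal cost of the remainder.

```python
# Sketch of the adaptive-blocking recurrence solved by dynamic programming.
# Assumption: block_cost is a toy cost model standing in for the paper's
# timing models, which were fitted to measured kernel times on the target
# machine (one CRAY-2 processor in the paper's experiments).

def block_cost(j, b, n):
    # Hypothetical cost of factoring a width-b panel starting at column j
    # of an n-column matrix, plus the blocked update of the trailing part.
    factor = b * b * (n - j)                       # panel factorization work
    update = b * (n - j - b) ** 2 / 50.0 + 100.0   # update work + fixed overhead
    return factor + update

def optimal_blocking(n, max_width=64):
    """Return (total_cost, widths): block widths minimizing total modeled cost.

    Recurrence: T(j) = min over b of  block_cost(j, b, n) + T(j + b),
    with T(n) = 0, solved bottom-up in O(n * max_width) time.
    """
    INF = float("inf")
    best = [INF] * (n + 1)     # best[j] = optimal cost of columns j..n-1
    best[n] = 0.0
    choice = [0] * (n + 1)     # choice[j] = optimal panel width at column j
    for j in range(n - 1, -1, -1):
        for b in range(1, min(max_width, n - j) + 1):
            c = block_cost(j, b, n) + best[j + b]
            if c < best[j]:
                best[j], choice[j] = c, b
    # Reconstruct the sequence of block widths from the choice table.
    widths, j = [], 0
    while j < n:
        widths.append(choice[j])
        j += choice[j]
    return best[0], widths

cost, widths = optimal_blocking(256)
```

Because every fixed-width strategy is one feasible path through this recurrence, the adaptive solution is by construction at least as good as the best fixed width under the given cost model, which mirrors the paper's experimental finding.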

This work was supported by the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract W-31-109-Eng-38.




Editor information

Helmar Burkhart


Copyright information

© 1990 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bischof, C.H., Lacroute, P.G. (1990). An adaptive blocking strategy for matrix factorizations. In: Burkhart, H. (ed.) CONPAR 90 — VAPP IV. Lecture Notes in Computer Science, vol. 457. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-53065-7_101


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-53065-7

  • Online ISBN: 978-3-540-46597-3

