article

Sampling from large matrices: An approach through geometric functional analysis

Authors:
Mark Rudelson (University of Missouri, Columbia, Missouri)
Roman Vershynin (University of California, Davis, California)

Journal of the ACM, Volume 54, Issue 4, Article 21. https://doi.org/10.1145/1255443.1255449

Published: 1 July 2007

Abstract

We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ‖A‖²_F/‖A‖²₂ is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best-known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
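As a hedged illustration of two quantities named in the abstract (this is not code from the paper), the pure-Python sketch below computes the numerical rank r = ‖A‖²_F/‖A‖²₂ and, by brute force, the cut norm. The abstract does not define the cut norm; we assume the standard definition ‖A‖_C = max over row sets I and column sets J of |Σ_{i∈I, j∈J} a_ij|. All function names are our own, and the spectral norm is estimated by power iteration rather than computed by an exact SVD.

```python
import itertools
import math
import random


def frobenius_sq(A):
    """||A||_F^2: sum of squared entries (= sum of squared singular values)."""
    return sum(x * x for row in A for x in row)


def spectral_sq(A, iters=500, seed=0):
    """Estimate ||A||_2^2 (largest eigenvalue of A^T A) by power iteration.

    Assumes the random start vector is not orthogonal to the top right
    singular vector (true with probability 1); convergence speed depends
    on the gap between the two largest singular values.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]          # A v
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]   # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return sum(x * x for x in Av)  # Rayleigh quotient v^T (A^T A) v with ||v|| = 1


def numerical_rank(A):
    """r = ||A||_F^2 / ||A||_2^2, which lies in [1, rank(A)] for nonzero A."""
    return frobenius_sq(A) / spectral_sq(A)


def cut_norm(A):
    """Brute-force cut norm: max over row set I, column set J of |sum over I x J|.

    For a fixed I, the best J takes all columns with positive (or all with
    negative) partial column sums, so only the 2^m row subsets are enumerated.
    Exponential in the number of rows: for tiny illustrative matrices only.
    """
    m, n = len(A), len(A[0])
    best = 0.0
    for mask in itertools.product((0, 1), repeat=m):
        s = [sum(A[i][j] for i in range(m) if mask[i]) for j in range(n)]
        best = max(best, sum(x for x in s if x > 0), -sum(x for x in s if x < 0))
    return best
```

For example, the 2 × 2 matrix diag(1, 10⁻⁶) has rank 2, but `numerical_rank` returns approximately 1.0: a tiny singular value barely changes r, which is the stability the abstract refers to.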

References

Alon, N., Fernandez de la Vega, W., Kannan, R., and Karpinski, M. 2002. Random sampling and approximation of MAX-CSPs. In Proceedings of the 34th ACM Symposium on Theory of Computing, ACM, New York, 232--239.
Alon, N., Fernandez de la Vega, W., Kannan, R., and Karpinski, M. 2003. Random sampling and approximation of MAX-CSPs. J. Comput. Syst. Sci. 67, 212--243.
Azar, Y., Fiat, A., Karlin, A., McSherry, F., and Saia, J. 2001. Spectral analysis for data mining. In Proceedings of the 33rd ACM Symposium on Theory of Computing, ACM, New York, 619--626.
Berry, M. W., Drmac, Z., and Jessup, E. R. 1999. Matrices, vector spaces and information retrieval. SIAM Rev. 41, 335--362.
Berry, M. W., Dumais, S. T., and O'Brien, G. W. 1995. Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 573--595.
Bourgain, J., and Tzafriri, L. 1987. Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Isr. J. Math. 57, 137--223.
Deerwester, S. T., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. A. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 391--407.
Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. 2004. Clustering large graphs via Singular Value Decomposition. Mach. Learn. 56, 9--33.
Drineas, P., and Kannan, R. 2003. Pass efficient algorithms for approximating large matrices. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD), ACM, New York, 223--232.
Drineas, P., Kannan, R., and Mahoney, M. W. 2006a. Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM J. Comput. 36, 158--183.
Drineas, P., Kannan, R., and Mahoney, M. W. 2006b. Fast Monte Carlo algorithms for matrices III: Computing an efficient approximate decomposition of a matrix. SIAM J. Comput. 36, 184--206.
Fernandez de la Vega, W. 1996. MAX-CUT has a randomized approximation scheme in dense graphs. Rand. Struct. Algorithms 8, 187--199.
Frieze, A., Kannan, R., and Vempala, S. 2004. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51, 1025--1041.
Berry, M. J. A., and Linoff, G. 1997. Data Mining Techniques. Wiley, New York.
Kashin, B., and Tzafriri, L. Some remarks on the restrictions of operators to coordinate subspaces. Unpublished notes.
Ledoux, M., and Talagrand, M. 1991. Probability in Banach Spaces. Springer-Verlag, New York.
Lunin, A. A. 1975. On operator norms of submatrices. Math. USSR Sbornik 27, 481--502.
Papadimitriou, C. H., Raghavan, P., Tamaki, H., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. J. Comput. Syst. Sci. 61, 217--235.
Rudelson, M. 1999. Random vectors in isotropic position. J. Funct. Anal. 164, 60--72.
Talagrand, M. 1995. Sections of smooth convex bodies via majorizing measures. Acta Math. 175, 273--300.
Vershynin, R. 2001. John's decompositions: Selecting a large part. Isr. J. Math. 122, 253--277.

Reviews

Reviewer: Bruce E. Litow

This paper explores randomized sampling of matrices by submatrices. This is not a new topic, but the method employed is new and interesting. The main results are somewhat involved, but the key ideas used throughout the paper are these (where A is a finite-dimensional matrix, not necessarily square):

(1) Use of the numerical rank, which exhibits a stability not shared by the rank. It is defined as r = ‖A‖²_F/‖A‖²₂, where the numerator is the squared Frobenius norm, that is, the sum of the squares of the singular values of A, and the denominator is the squared ℓ2 (spectral) norm, that is, the square of the maximum singular value. The sampling parameter (number of rows needed) is bounded above by O(r · log r) (see Theorem 1.1 of the paper). The O-notation hides a factor 1/(η⁴ · Δ), where 0 < η, Δ < 1, 1 − 2exp(−O(1/Δ)) is the probability of sampling success (the O-notation here indicates an absolute constant), and η determines the error.

(2) A law of large numbers for operator-valued random variables. This is the central contribution of the paper and represents an approach distinct from linear algebra techniques. It also allows for a natural notion of row or column sampling of A.

(3) A series of tail-distribution bounds on expected values of sequences of vectors. These results can undoubtedly be used in areas not covered in this paper, and so are of independent interest.

Although the paper is not self-contained, the citations are sufficient for further exploration, and the presentation is crisp and quite clear.

Online Computing Reviews Service
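The row sampling described in points (1) and (2) can be illustrated with length-squared sampling, the scheme of Frieze, Kannan, and Vempala (2004): draw k rows i.i.d. with probability p_i = ‖A_i‖²/‖A‖²_F and rescale each drawn row by 1/√(k·p_i), so the sampled matrix S satisfies E[SᵀS] = AᵀA. A law of large numbers for operator-valued random variables is what controls the deviation ‖AᵀA − SᵀS‖₂. This is a minimal sketch under stated assumptions: we assume this scheme matches the setting analyzed in the paper, we do not reproduce the paper's constants, and the helpers `gram`, `spectral_norm`, `sample_rows`, and `relative_gram_error` are hypothetical names of our own.

```python
import math
import random


def gram(M, n):
    """Return M^T M as an n x n list of lists."""
    G = [[0.0] * n for _ in range(n)]
    for row in M:
        for i in range(n):
            for j in range(n):
                G[i][j] += row[i] * row[j]
    return G


def spectral_norm(M, iters=300, seed=1):
    """Estimate ||M||_2 by power iteration on M^T M from a random start vector."""
    m, n = len(M), len(M[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        Mv = [sum(a * x for a, x in zip(row, v)) for row in M]
        w = [sum(M[i][j] * Mv[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Mv = [sum(a * x for a, x in zip(row, v)) for row in M]
    return math.sqrt(sum(x * x for x in Mv))


def sample_rows(A, k, seed=0):
    """Length-squared sampling (assumed scheme): draw k row indices i.i.d. with
    p_i = ||A_i||^2 / ||A||_F^2 and rescale row i by 1 / sqrt(k * p_i),
    so that E[S^T S] = A^T A. Assumes A has at least one nonzero row."""
    rng = random.Random(seed)
    sq = [sum(x * x for x in row) for row in A]
    fro_sq = sum(sq)
    idx = rng.choices(range(len(A)), weights=sq, k=k)  # zero rows are never drawn
    S = []
    for i in idx:
        scale = 1.0 / math.sqrt(k * sq[i] / fro_sq)
        S.append([scale * x for x in A[i]])
    return S


def relative_gram_error(A, S):
    """Return ||A^T A - S^T S||_2 / ||A||_2^2 (spectral norms estimated)."""
    n = len(A[0])
    GA, GS = gram(A, n), gram(S, n)
    D = [[GA[i][j] - GS[i][j] for j in range(n)] for i in range(n)]
    return spectral_norm(D) / spectral_norm(A) ** 2
```

Because every rescaled row has squared norm exactly ‖A‖²_F/k, the sample always preserves the Frobenius norm, and for a rank-one A it reproduces AᵀA exactly (up to rounding). For general A the error is random; the result summarized in the review says it becomes small once k is of order r log r, with a constant depending on η and Δ.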


Published in

Journal of the ACM, Volume 54, Issue 4, July 2007
176 pages
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/1255443

Copyright © 2007 ACM

Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Monte-Carlo methods
Randomized algorithms
low-rank approximations
massive data sets
singular-value decompositions
Article Metrics
- 178 total citations
- 1,640 total downloads
- Downloads (last 12 months): 104
- Downloads (last 6 weeks): 20
