Sampling from large matrices: An approach through geometric functional analysis

Abstract
We study random submatrices of a large matrix A. We show how to approximately compute A from a random submatrix of the smallest possible size, O(r log r), with a small error in the spectral norm, where r = ‖A‖_F² / ‖A‖₂² is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best-known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
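The ideas in the abstract can be illustrated with a short sketch: compute the numerical rank r = ‖A‖_F² / ‖A‖₂², sample columns with probabilities proportional to their squared norms (in the spirit of the Frieze–Kannan–Vempala line of work), and project A onto the span of the rescaled sample. The function names and the specific sampling rule here are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def numerical_rank(A):
    # Numerical rank r = ||A||_F^2 / ||A||_2^2.
    # Always between 1 and rank(A), and stable under small perturbations.
    fro2 = np.linalg.norm(A, 'fro') ** 2
    spec2 = np.linalg.norm(A, 2) ** 2
    return fro2 / spec2

def sampled_low_rank_approx(A, k, rng=None):
    # Illustrative sketch (not the paper's algorithm): sample k columns
    # with probability proportional to their squared norms, rescale them,
    # and project A onto the span of the sampled columns.
    rng = np.random.default_rng(rng)
    col_norms2 = np.sum(A * A, axis=0)
    p = col_norms2 / col_norms2.sum()
    idx = rng.choice(A.shape[1], size=k, replace=True, p=p)
    C = A[:, idx] / np.sqrt(k * p[idx])   # rescaled sampled columns
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    return U @ (U.T @ A)                  # projection onto the sample's span
```

For a matrix whose numerical rank r is small, the theory suggests that on the order of r log r sampled columns already capture A well in the spectral norm; the sketch above returns an approximation of rank at most k.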