
Sampling from large matrices: An approach through geometric functional analysis

Published: 01 July 2007

Abstract

We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size $O(r \log r)$ with a small error in the spectral norm, where $r = \|A\|_F^2 / \|A\|_2^2$ is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best-known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
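
The numerical rank $r = \|A\|_F^2 / \|A\|_2^2$ can be computed directly from its definition. Because $\|A\|_F^2 = \sum_i \sigma_i^2 \le \mathrm{rank}(A)\,\sigma_1^2$ and $\|A\|_2 = \sigma_1$, it always satisfies $1 \le r \le \mathrm{rank}(A)$ for nonzero A. Small singular values barely change r, which is why it is a stable relaxation of the rank.

The sketch below is a minimal pure-Python illustration, not code from the paper. The function names are hypothetical, and the spectral norm is estimated by power iteration on $A^T A$.

```python
import math
import random

def frobenius_sq(A):
    """||A||_F^2: sum of squared entries (equivalently, of squared singular values)."""
    return sum(x * x for row in A for x in row)

def spectral_norm(A, iters=500, seed=0):
    """Estimate ||A||_2, the largest singular value, by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]          # w = A v
        u = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]  # u = A^T w
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0
        v = [x / norm_u for x in u]                                     # renormalize
    w = [sum(a * x for a, x in zip(row, v)) for row in A]
    return math.sqrt(sum(x * x for x in w))  # ||A v|| for the unit vector v

def numerical_rank(A):
    """r = ||A||_F^2 / ||A||_2^2 for a nonzero matrix A."""
    s = spectral_norm(A)
    return frobenius_sq(A) / (s * s)

# A rank-4 matrix dominated by one direction: its numerical rank is close to 1.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.0, 0.1]]
print(numerical_rank(A))  # ≈ 1.03, although rank(A) = 4
```

For this example, the three small singular values contribute only $3 \times 0.01$ to $\|A\|_F^2$. The numerical rank therefore stays near 1, while the rank jumps to 4.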

References

  1. Alon, N., Fernandez De La Vega, W., Kannan, R., and Karpinski, M. 2002. Random sampling and approximation of MAX-CSPs. In Proceedings of the 34th ACM Symposium on Theory of Computing, ACM, New York, 232--239.
  2. Alon, N., Fernandez De La Vega, W., Kannan, R., and Karpinski, M. 2003. Random sampling and approximation of MAX-CSPs. J. Comput. Syst. Sci. 67, 212--243.
  3. Azar, Y., Fiat, A., Karlin, A., McSherry, F., and Saia, J. 2001. Spectral analysis for data mining. In Proceedings of the 33rd ACM Symposium on Theory of Computing, ACM, New York, 619--626.
  4. Berry, M. W., Drmac, Z., and Jessup, E. R. 1999. Matrices, vector spaces, and information retrieval. SIAM Rev. 41, 335--362.
  5. Berry, M. W., Dumais, S. T., and O'Brien, G. W. 1995. Using linear algebra for intelligent information retrieval. SIAM Rev. 37, 573--595.
  6. Bourgain, J., and Tzafriri, L. 1987. Invertibility of "large" submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57, 137--223.
  7. Deerwester, S. T., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. H. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 391--407.
  8. Drineas, P., Frieze, A., Kannan, R., Vempala, S., and Vinay, V. 2004. Clustering large graphs via the singular value decomposition. Mach. Learn. 56, 9--33.
  9. Drineas, P., and Kannan, R. 2003. Pass efficient algorithms for approximating large matrices. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (Baltimore, MD), ACM, New York, 223--232.
  10. Drineas, P., Kannan, R., and Mahoney, M. W. 2006a. Fast Monte-Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix. SIAM J. Comput. 36, 158--183.
  11. Drineas, P., Mahoney, M. W., and Kannan, R. 2006b. Fast Monte-Carlo algorithms for matrices III: Computing an efficient approximate decomposition of a matrix. SIAM J. Comput. 36, 184--206.
  12. Fernandez De La Vega, W. 1996. MAX-CUT has a randomized approximation scheme in dense graphs. Rand. Struct. Algorithms 8, 187--199.
  13. Frieze, A., Kannan, R., and Vempala, S. 2004. Fast Monte-Carlo algorithms for finding low-rank approximations. J. ACM 51, 1025--1041.
  14. Berry, M. J. A., and Linoff, G. 1997. Data mining techniques. Wiley, New York.
  15. Kashin, B., and Tzafriri, L. Some remarks on the restrictions of operators to coordinate subspaces. Unpublished notes.
  16. Ledoux, M., and Talagrand, M. 1991. Probability in Banach spaces. Springer-Verlag, New York.
  17. Lunin, A. A. 1975. On operator norms of submatrices. Math. USSR Sbornik 27, 481--502.
  18. Papadimitriou, C. H., Raghavan, P., Tamaki, H., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. J. Comput. Syst. Sci. 61, 217--235.
  19. Rudelson, M. 1999. Random vectors in isotropic position. J. Funct. Anal. 164, 60--72.
  20. Talagrand, M. 1995. Sections of smooth convex bodies via majorizing measures. Acta Math. 175, 273--300.
  21. Vershynin, R. 2001. John's decompositions: Selecting a large part. Israel J. Math. 122, 253--277.


        Reviews

        Bruce E. Litow

        This paper explores randomized sampling of matrices by submatrices. This is not a new topic, but the method employed is new and interesting. The main results are somewhat involved, but the key ideas used throughout the paper are the following (where A is a finite-dimensional matrix, not necessarily square):

        (1) Use of the numerical rank, which exhibits a stability not shared by the rank. It is defined as $r = \|A\|_F^2 / \|A\|_2^2$. The numerator is the squared Frobenius norm, that is, the sum of the squares of the singular values of A. The denominator is the squared spectral ($\ell_2$ operator) norm, that is, the square of the largest singular value. The sampling parameter (number of rows needed) is bounded above by $O(r \log r)$ (see Theorem 1.1 of the paper). The O-notation hides a factor $1/(\eta^4 \Delta)$, where $0 < \eta, \Delta < 1$. Here $1 - 2\exp(-O(1/\Delta))$ is the probability of sampling success (the O-notation here indicates an absolute constant), and η determines the error.

        (2) A law of large numbers for operator-valued random variables. This is the central contribution of the paper and represents an approach distinct from linear algebra techniques. It also allows for a natural notion of row or column sampling of A.

        (3) A series of tail distribution bounds on expected values of sequences of vectors. These results can undoubtedly be used in areas not covered in this paper, and so are of independent interest.

        Although the paper is not self-contained, the citations are sufficient for further exploration, and the presentation is crisp and quite clear.

        Online Computing Reviews Service
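
The row-sampling mechanism behind the law of large numbers in item (2) of the review can be sketched in a few lines. This is a minimal pure-Python illustration under assumptions that are not stated on this page:

- Rows are drawn independently with replacement.
- Row i is drawn with probability $p_i = \|a_i\|_2^2 / \|A\|_F^2$ (length-squared sampling, as used by Frieze, Kannan and Vempala [13]).
- Each picked row is rescaled by $1/\sqrt{d\,p_i}$.

The function names are hypothetical. With this scaling, $\tilde A^T \tilde A$ is the average of d independent copies of the rank-one operator $a_i a_i^T / p_i$. Its expectation is $\sum_i p_i \, a_i a_i^T / p_i = A^T A$, so a law of large numbers for such operator-valued variables governs how fast $\tilde A^T \tilde A$ approaches $A^T A$. For simplicity, the error here is measured in the Frobenius norm, whereas the paper's estimates concern the spectral norm.

```python
import math
import random

def length_squared_sample(A, d, seed=0):
    """Pick d rows of A independently with replacement, row i with probability
    p_i = ||a_i||^2 / ||A||_F^2, and rescale each picked row by 1/sqrt(d * p_i).
    Assumes A has at least one nonzero row; zero rows are never picked."""
    rng = random.Random(seed)
    row_sq = [sum(x * x for x in row) for row in A]
    total = sum(row_sq)  # ||A||_F^2
    probs = [s / total for s in row_sq]
    picks = rng.choices(range(len(A)), weights=probs, k=d)
    return [[x / math.sqrt(d * probs[i]) for x in A[i]] for i in picks]

def gram(M):
    """Return M^T M (an n x n list of lists) for an m x n matrix M."""
    n = len(M[0])
    return [[sum(row[j] * row[k] for row in M) for k in range(n)] for j in range(n)]

def rel_frobenius_error(G_hat, G):
    """||G_hat - G||_F / ||G||_F."""
    num = sum((a - b) ** 2 for ra, rb in zip(G_hat, G) for a, b in zip(ra, rb))
    den = sum(b * b for rb in G for b in rb)
    return math.sqrt(num / den)

# Compare the sampled Gram matrix with A^T A for growing sample sizes d.
A = [[3.0, 1.0], [1.0, 2.0], [0.5, 0.0], [2.0, 2.0]]
G = gram(A)
for d in (10, 100, 10000):
    A_tilde = length_squared_sample(A, d)
    print(d, rel_frobenius_error(gram(A_tilde), G))
```

Each rescaled row has squared length exactly $\|A\|_F^2 / d$, so the sample consists of rows of equal length. As d grows, the printed error should shrink roughly like $1/\sqrt{d}$.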
