
Low-Rank Approximation and Regression in Input Sparsity Time

Published: 30 January 2017

Abstract

We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Here, m is bounded by a polynomial in rε⁻¹, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.
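One way to realize such a distribution is the CountSketch construction: each column of S has a single ±1 entry in a uniformly random row. The snippet below is an illustrative sketch, assuming numpy, with small illustrative dimensions; the dense S is built only for clarity, since applying S by hashing rows of A gives the O(nnz(A)) running time.

```python
import numpy as np

def sparse_embedding(m, n, rng):
    """Build an m x n sparse embedding (CountSketch) matrix: each column
    has exactly one nonzero entry, a random sign in a random row."""
    rows = rng.integers(0, m, size=n)        # hash h(i): target row of column i
    signs = rng.choice([-1.0, 1.0], size=n)  # sign s(i) of column i
    S = np.zeros((m, n))
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(0)
n, d, m = 2_000, 5, 500
A = rng.standard_normal((n, d))
S = sparse_embedding(m, n, rng)

# For a fixed x in the column space of A, ||SAx||_2 should be (1 +/- eps)||Ax||_2.
x = rng.standard_normal(d)
ratio = np.linalg.norm(S @ A @ x) / np.linalg.norm(A @ x)
```

Because every column of S has one nonzero, S has exactly n nonzeros, and SA can be formed by adding a signed copy of each row of A into one of m buckets.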

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp regression.

More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and integers k, p ⩾ 1. Our results include the following.

Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥_p ⩽ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε⁻²) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here, Õ(f) = f · log^O(1)(f).) More generally, for p ∈ [1, ∞), we obtain an algorithm running in O(nnz(A) log n) + O(rε⁻¹)^C time, for a fixed constant C.
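For p = 2, the basic sketch-and-solve recipe is to replace min_x ∥Ax − b∥_2 by the much smaller problem min_x ∥SAx − Sb∥_2. A minimal sketch, assuming numpy, with S applied implicitly in O(nnz(A)) time (dimensions here are illustrative, not the paper's parameter settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20_000, 10, 4_000
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Apply a CountSketch S without materializing it: bucket signed rows.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, rows, signs[:, None] * A)  # SA accumulated in O(nnz(A)) time
np.add.at(Sb, rows, signs * b)

# Solve the small m x d sketched problem instead of the n x d one.
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

err_sketch = np.linalg.norm(A @ x_sketch - b)
err_exact = np.linalg.norm(A @ x_exact - b)
```

Since the sketched solution is evaluated on the original problem, err_sketch can only exceed err_exact; the subspace embedding property bounds the blow-up by a (1 + ε) factor with good probability.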

Low-rank approximation: We give an algorithm to obtain a rank-k matrix Â_k such that ∥A − Â_k∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. (That is, A_k is the output of principal component analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time.
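The shape of such sketch-based low-rank algorithms can be illustrated as follows: project A onto the row space of the small sketch SA, then truncate to rank k inside that subspace. This is an illustrative sketch assuming numpy, not the paper's exact algorithm or parameter settings.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, m = 2_000, 200, 5, 60

# A rank-k signal plus small noise, as a test matrix.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

# SA via an implicit CountSketch, in O(nnz(A)) time.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
np.add.at(SA, rows, signs[:, None] * A)

# Project A onto the row space of SA, then take the best rank-k there.
_, _, Vt = np.linalg.svd(SA, full_matrices=False)   # basis of rowspace(SA)
AV = A @ Vt.T                                       # coordinates in that space
U2, s2, Vt2 = np.linalg.svd(AV, full_matrices=False)
A_k_hat = (U2[:, :k] * s2[:k]) @ Vt2[:k] @ Vt       # rank-k approximation

# Compare against the optimal rank-k approximation from a full SVD.
U, s, V = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ V[:k]
err_hat = np.linalg.norm(A - A_k_hat, 'fro')
err_opt = np.linalg.norm(A - A_k, 'fro')
```

The point is that the SVD work happens on the m × d sketch and the n × m projected matrix, never on a second full pass structured around A itself.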

Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r³) time.
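A standard way such estimates are computed (sketched here for illustration, assuming numpy; not the paper's exact procedure) is to take R from a QR factorization of the sketch SA, so that the rows of AR⁻¹ are approximately orthonormal, and read off their squared norms:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5_000, 10, 2_000
A = rng.standard_normal((n, d))

# Exact leverage scores: squared row norms of an orthonormal basis of col(A).
Q, _ = np.linalg.qr(A)
lev_exact = np.einsum('ij,ij->i', Q, Q)

# Sketched estimate: QR of SA gives R, then use row norms of A @ R^{-1}.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
np.add.at(SA, rows, signs[:, None] * A)
_, R = np.linalg.qr(SA)
AR = np.linalg.solve(R.T, A.T).T   # A @ R^{-1} via a triangular solve
lev_approx = np.einsum('ij,ij->i', AR, AR)
```

Because S is a subspace embedding, AR⁻¹ has singular values near 1, so each estimate is within a constant factor of the true leverage score; a further Johnson-Lindenstrauss compression of AR⁻¹ gives the stated running time.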



Published in Journal of the ACM, Volume 63, Issue 6 (February 2017), 233 pages.
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3038256
        Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 January 2017
• Accepted: 1 September 2016
• Revised: 1 June 2015
• Received: 1 November 2013
