
Low-Rank Approximation and Regression in Input Sparsity Time

Published: 30 January 2017

Abstract

We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥_2 = (1 ± ε)∥Ax∥_2 simultaneously for all x ∈ R^d. Here, m is bounded by a polynomial in rε⁻¹, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.
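One way to realize such a distribution is the CountSketch construction: each column of S has a single ±1 entry in a uniformly random row. The snippet below is an illustrative sketch, assuming numpy, with small illustrative dimensions; the dense S is built only for clarity, since applying S by hashing rows of A gives the O(nnz(A)) running time.

```python
import numpy as np

def sparse_embedding(m, n, rng):
    """Build an m x n sparse embedding (CountSketch) matrix: each column
    has exactly one nonzero entry, a random sign in a random row."""
    rows = rng.integers(0, m, size=n)        # hash h(i): target row of column i
    signs = rng.choice([-1.0, 1.0], size=n)  # sign s(i) of column i
    S = np.zeros((m, n))
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(0)
n, d, m = 2_000, 5, 500
A = rng.standard_normal((n, d))
S = sparse_embedding(m, n, rng)

# For a fixed x in the column space of A, ||SAx||_2 should be (1 +/- eps)||Ax||_2.
x = rng.standard_normal(d)
ratio = np.linalg.norm(S @ A @ x) / np.linalg.norm(A @ x)
```

Because every column of S has one nonzero, S has exactly n nonzeros, and SA can be formed by adding a signed copy of each row of A into one of m buckets.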

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp regression.

More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and integers k, p ⩾ 1. Our results include the following.

Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥_p ⩽ (1 + ε) min_x ∥Ax − b∥_p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε⁻²) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here, Õ(f) = f · log^O(1)(f).) More generally, for p ∈ [1, ∞), we obtain an algorithm running in O(nnz(A) log n) + O(rε⁻¹)^C time, for a fixed constant C.
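For p = 2, the basic sketch-and-solve recipe is to replace min_x ∥Ax − b∥_2 by the much smaller problem min_x ∥SAx − Sb∥_2. A minimal sketch, assuming numpy, with S applied implicitly in O(nnz(A)) time (dimensions here are illustrative, not the paper's parameter settings):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20_000, 10, 4_000
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Apply a CountSketch S without materializing it: bucket signed rows.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, rows, signs[:, None] * A)  # SA accumulated in O(nnz(A)) time
np.add.at(Sb, rows, signs * b)

# Solve the small m x d sketched problem instead of the n x d one.
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

err_sketch = np.linalg.norm(A @ x_sketch - b)
err_exact = np.linalg.norm(A @ x_exact - b)
```

Since the sketched solution is evaluated on the original problem, err_sketch can only exceed err_exact; the subspace embedding property bounds the blow-up by a (1 + ε) factor with good probability.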

Low-rank approximation: We give an algorithm to obtain a rank-k matrix Â_k such that ∥A − Â_k∥_F ≤ (1 + ε)∥A − A_k∥_F, where A_k is the best rank-k approximation to A. (That is, A_k is the output of principal component analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time.
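The shape of such sketch-based low-rank algorithms can be illustrated as follows: project A onto the row space of the small sketch SA, then truncate to rank k inside that subspace. This is an illustrative sketch assuming numpy, not the paper's exact algorithm or parameter settings.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k, m = 2_000, 200, 5, 60

# A rank-k signal plus small noise, as a test matrix.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

# SA via an implicit CountSketch, in O(nnz(A)) time.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
np.add.at(SA, rows, signs[:, None] * A)

# Project A onto the row space of SA, then take the best rank-k there.
_, _, Vt = np.linalg.svd(SA, full_matrices=False)   # basis of rowspace(SA)
AV = A @ Vt.T                                       # coordinates in that space
U2, s2, Vt2 = np.linalg.svd(AV, full_matrices=False)
A_k_hat = (U2[:, :k] * s2[:k]) @ Vt2[:k] @ Vt       # rank-k approximation

# Compare against the optimal rank-k approximation from a full SVD.
U, s, V = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ V[:k]
err_hat = np.linalg.norm(A - A_k_hat, 'fro')
err_opt = np.linalg.norm(A - A_k, 'fro')
```

The point is that the SVD work happens on the m × d sketch and the n × m projected matrix, never on a second full pass structured around A itself.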

Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r³) time.
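A standard way such estimates are computed (sketched here for illustration, assuming numpy; not the paper's exact procedure) is to take R from a QR factorization of the sketch SA, so that the rows of AR⁻¹ are approximately orthonormal, and read off their squared norms:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5_000, 10, 2_000
A = rng.standard_normal((n, d))

# Exact leverage scores: squared row norms of an orthonormal basis of col(A).
Q, _ = np.linalg.qr(A)
lev_exact = np.einsum('ij,ij->i', Q, Q)

# Sketched estimate: QR of SA gives R, then use row norms of A @ R^{-1}.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
np.add.at(SA, rows, signs[:, None] * A)
_, R = np.linalg.qr(SA)
AR = np.linalg.solve(R.T, A.T).T   # A @ R^{-1} via a triangular solve
lev_approx = np.einsum('ij,ij->i', AR, AR)
```

Because S is a subspace embedding, AR⁻¹ has singular values near 1, so each estimate is within a constant factor of the true leverage score; a further Johnson-Lindenstrauss compression of AR⁻¹ gives the stated running time.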



Published in Journal of the ACM, Volume 63, Issue 6 (February 2017), 233 pages.
ISSN: 0004-5411
EISSN: 1557-735X
DOI: 10.1145/3038256
        Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 30 January 2017
• Accepted: 1 September 2016
• Revised: 1 June 2015
• Received: 1 November 2013
