
Applied Numerical Mathematics

Volume 144, October 2019, Pages 140-150

Reconstruction of jointly sparse vectors via manifold optimization

https://doi.org/10.1016/j.apnum.2019.05.022

Abstract

In this paper, we consider the challenge of reconstructing jointly sparse vectors from linear measurements. First, we show that by utilizing the rank of the output data matrix we can reduce the problem to a full column rank case. This result reveals a reduction in the computational complexity of the original problem and enables a simple implementation of joint sparse recovery algorithms designed for the full-rank setting. Second, we propose a new method for joint sparse recovery in the form of a non-convex optimization problem on a non-compact Stiefel manifold. In our numerical experiments this method outperforms the commonly used $\ell_{2,1}$ minimization in the sense that far fewer measurements are required for accurate sparse reconstructions. We postulate that this approach possesses the desirable rank aware property, that is, it is able to take advantage of the rank of the unknown matrix to improve the recovery.

Introduction

The sparse sampling and recovery problem has been a subject of intensive research for the last two decades, especially after major advances in the field of compressed sensing through the seminal works [5], [11]. In the classical problem, the goal is to find the vector $x \in \mathbb{R}^N$ from the system $Ax = y$, where $A \in \mathbb{R}^{M\times N}$, under the assumption that the vector $x$ is sparse, i.e., has only a few non-zero entries. Here, $N$ represents the dimensionality of the unknown vector $x$ and $M$ is the number of linear measurements performed on $x$. A typical difficulty in solving the problem stems from the fact that one often has only a limited number of measurements, i.e., $M < N$, so the system is underdetermined.

In the joint sparse recovery problem, also known as multiple measurement vectors (MMV), simultaneous sparse approximation, or collaborative sparse coding, we instead solve a matrix equation $AX = Y$, where the unknown matrix $X \in \mathbb{R}^{N\times K}$ has only a few non-zero rows. $X$ can be viewed as a collection of $K$ unknown vectors in $\mathbb{R}^N$ which share the same sparsity support. In this case, it has been demonstrated [30], [14], [26] that recovering multiple vectors simultaneously can take advantage of their joint sparsity, which helps reduce the number of measurements compared to reconstructing each vector individually. The joint sparse recovery problem has applications in hyperspectral imaging [7], [13], [35], [37], source localization [2], [25], and many other fields. In these problems, the joint sparse recovery is often conducted under additional positivity constraints, which are outside the scope of this paper. Our proposed method is therefore not directly applicable to them, but it is an important building block toward these applications. Another example is the CLSUnSAL algorithm [19] for hyperspectral unmixing, which is based on a joint sparse regularization.

In this paper, we first utilize the rank of the output data matrix $Y$ to reduce the problem to the case where the unknown matrix has full column rank. More specifically, we prove the following theorem (Section 3).

Theorem 1

Denote by $\Sigma_{s,r}$ the set of all matrices in $\mathbb{R}^{N\times K}$ of rank at least $r$ and having at most $s$ non-zero rows ($s$-row sparse). Let $A \in \mathbb{R}^{M\times N}$ be such that the induced map $Z \mapsto AZ$ is injective on $\Sigma_{s,r}$, let $X \in \mathbb{R}^{N\times K}$ be $s$-row sparse, $Y = AX$ and $\operatorname{rank}(Y) = r$. Then

1. $Y$ can be written as a product $Y = VU$, where $V \in \mathbb{R}^{M\times r}$ with $\operatorname{rank}(V) = r$ and $U \in \mathbb{R}^{r\times K}$ satisfies $UU^T = I_r$, where $I_r$ is the identity matrix in $\mathbb{R}^{r\times r}$.

2. The problem
$$\arg\min_{Z \in \mathbb{R}^{N\times r}} \|Z\|_0 \quad \text{s.t.}\quad AZ = V \tag{1}$$
has a unique solution $W \in \mathbb{R}^{N\times r}$. This solution is $s$-row sparse and has full column rank.

3. The original matrix $X$ can be computed as $X = WU$.

Here, $\|Z\|_0$ denotes the number of non-zero rows of the matrix $Z$.

This result helps to reduce the computational complexity of the original problem and allows a simple application of joint sparse recovery algorithms designed for the full-rank setting, such as the MUSIC algorithm [30], [14], [15], to the rank deficient case. We note that a similar reduction approach has been presented in [26, Section III] and [25]. However, in that work the authors split the problem into two parts, first finding the support by solving the reduced problem and then finding the unknown $X$ using the knowledge of the support, whereas we obtain a more complete result by utilizing the rank.
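The factorization in Theorem 1 can be produced, for example, from a thin SVD of $Y$. The following minimal sketch (plain NumPy; the function name and tolerance are ours, not from the paper's code) illustrates the reduction and the final recombination step $X = WU$:

```python
import numpy as np

def reduce_to_full_rank(Y, tol=1e-10):
    """Factor Y = V @ U with V (M x r) of full column rank and U @ U.T = I_r.

    One convenient choice: a thin SVD Y = P diag(s) Q^T, then V = P_r diag(s_r)
    and U = Q_r^T, whose rows are orthonormal.
    """
    P, s, Qt = np.linalg.svd(Y, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))      # numerical rank of Y
    V = P[:, :r] * s[:r]                 # M x r, rank r
    U = Qt[:r, :]                        # r x K, satisfies U @ U.T = I_r
    return V, U

# After solving the reduced problem (1), i.e. finding the s-row sparse W
# with A @ W = V, the original unknown is recovered as X = W @ U.
```

The SVD is just one convenient way to obtain the factorization guaranteed by item 1 of the theorem; any factorization with orthonormal rows of $U$ would do.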

As a relaxation of the minimization problem (1) in the theorem above, we propose a new method for the recovery of jointly sparse vectors in the form of a non-convex optimization problem on the non-compact Stiefel manifold:
$$\arg\min_{Z \in \mathbb{R}^{N\times r},\, \operatorname{rank}(Z) = r} \big\|Z(Z^TZ)^{-1/2}\big\|_{2,1} \quad \text{s.t.}\quad AZ = V. \tag{2}$$
For the definition of the $\ell_{2,1}$ norm, see (5). Our method generalizes the $\ell_1/\ell_2$ minimization method studied in [12], [17], [21], [29], [36]. We demonstrate in our numerical examples that it outperforms the $\ell_{2,1}$ minimization commonly used to solve the joint sparse recovery problem, in the sense that far fewer measurements are required for accurate sparse reconstructions.

For the numerical examples, we consider the unconstrained version of (2),
$$\arg\min_{Z \in \mathbb{R}^{N\times r},\, \operatorname{rank}(Z) = r} \big\|Z(Z^TZ)^{-1/2}\big\|_{2,1} + \frac{\lambda}{2}\|AZ - V\|_F^2, \tag{3}$$
where $\|\cdot\|_F$ is the Frobenius norm. Note that (3) is the Lagrangian form of the equivalent problem
$$\arg\min_{Z \in \mathbb{R}^{N\times r},\, \operatorname{rank}(Z) = r} \big\|Z(Z^TZ)^{-1/2}\big\|_{2,1} \quad \text{s.t.}\quad \|AZ - V\|_F^2 \le \epsilon$$
for some parameter $\epsilon > 0$ that measures the noise level in the measurements. Like $\epsilon$, the choice of the tuning parameter $\lambda > 0$ is problem dependent, with small values of $\epsilon$ corresponding to large values of $\lambda$, and is often determined empirically (see [9], [23] for a more detailed discussion). Choosing $\lambda$ large enough ensures that (3) gives the same solution as (2). Currently, there are several packages available for manifold optimization (see [18] for a detailed list). We give preference to the Pymanopt package for its automatic differentiation capabilities [31]. It is the Python version of the original Manopt package [4], [34] written for MATLAB. We also smooth the $\ell_{2,1}$ norm with the Huber function in order to apply the conjugate gradient algorithm in Pymanopt. For users of the Manopt package, derivatives must be supplied explicitly, so we also compute the Euclidean gradient of the objective function in (3) as well as the gradient of its Huber-regularized version.
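For concreteness, here is a minimal sketch of the objective in (3), written with autograd-compatible NumPy so that automatic differentiation can be applied; the function name and argument layout are ours, not the paper's code. The polar factor $Z(Z^TZ)^{-1/2}$ is formed from an eigendecomposition of the small $r\times r$ Gram matrix.

```python
import autograd.numpy as anp

def cost_unconstrained(Z, A, V, lam):
    """Objective in (3): ||Z (Z^T Z)^{-1/2}||_{2,1} + (lam/2) ||A Z - V||_F^2."""
    w, Q = anp.linalg.eigh(anp.dot(Z.T, Z))           # r x r Gram matrix Z^T Z
    inv_sqrt = anp.dot(Q * (1.0 / anp.sqrt(w)), Q.T)  # (Z^T Z)^{-1/2}
    P = anp.dot(Z, inv_sqrt)                          # orthonormal polar factor of Z
    row_norms = anp.sqrt(anp.sum(P**2, axis=1))       # ||P[n]||_2 for each row n, cf. (5)
    return anp.sum(row_norms) + 0.5 * lam * anp.sum((anp.dot(A, Z) - V)**2)
```

Note that the first term is non-smooth wherever a row of the polar factor vanishes; the Huber regularization discussed below replaces the row norms with a smooth surrogate before the conjugate gradient method is run.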

The paper is organized as follows. In the rest of Section 1, we review the single-vector sparse and joint sparse recovery problems. Section 2 provides background results on the rank awareness of the $\ell_0$ minimization problem for matrices. Section 3 discusses a simple strategy to reduce rank deficient problems to the full-rank setting. We relax the $\ell_0$ minimization to a manifold optimization problem in Section 4. Section 5 discusses the Huber regularization. The calculation of the derivatives of our objective functions is presented in Section 6. Finally, numerical examples are given in Section 7.

Many signals of interest have (almost) sparse representations in some basis. For instance, images typically have sparse representations in wavelet bases, digital audio signals have sparse representations in Gabor systems, etc. Let $x \in \mathbb{R}^N$ be the coefficients of the signal in such a basis and $y \in \mathbb{R}^M$ be a vector of linear measurements of the signal. Via the Riesz representation theorem, the relationship between $x$ and $y$ can be written as $y = Ax$, where the matrix $A$ can be explicitly computed.

In the classical sparse sampling and recovery problem, the goal is to find the vector $x \in \mathbb{R}^N$ from the system $Ax = y$, where $A \in \mathbb{R}^{M\times N}$, under the assumption that the vector $x$ is sparse. Even though the number $s$ of non-zero entries in $x$ can be much smaller than the dimension $N$, the indices of these entries are unknown, and directly solving the system by first finding the indices of the non-zero entries of $x$ is an NP-hard problem [27]. Other methods have been suggested for solving it; one of the most popular is the $\ell_1$ minimization problem
$$\arg\min_{z \in \mathbb{R}^N} \|z\|_1 \quad \text{s.t.}\quad Az = y,$$
or its unconstrained version
$$\arg\min_{z \in \mathbb{R}^N} \|z\|_1 + \frac{\lambda}{2}\|Az - y\|_2^2.$$
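As a point of reference, the equality-constrained $\ell_1$ problem above can be handed to a generic convex solver. A minimal CVXPY sketch (purely illustrative, not the authors' code):

```python
import cvxpy as cp

def l1_minimization(A, y):
    """Basis pursuit: minimize ||z||_1 subject to A z = y."""
    N = A.shape[1]
    z = cp.Variable(N)
    problem = cp.Problem(cp.Minimize(cp.norm1(z)), [A @ z == y])
    problem.solve()
    return z.value
```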

It has been proved that under some conditions on the matrix A, specifically the null space property (NSP) and the restricted isometry property (RIP), these methods are able to recover the vector x exactly and stably.

In the joint sparse recovery problem we want to solve a matrix equation of the form
$$AX = Y, \qquad A \in \mathbb{R}^{M\times N},\ X \in \mathbb{R}^{N\times K},\ Y \in \mathbb{R}^{M\times K}, \tag{4}$$
given that $X$ is an $s$-row sparse matrix, i.e., if $X = (X[1], \ldots, X[N])^T$ then the rows $X[i]$ are all zero for indices $i$ outside some subset of $\{1, \ldots, N\}$ of size $s$.
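To make the row-sparsity structure concrete, here is a small helper that builds an $s$-row sparse $X$ and the measurements $Y = AX$ with a Gaussian matrix $A$, mirroring the setup used in the experiments section; the function name and parameter choices are ours.

```python
import numpy as np

def make_joint_sparse_problem(M, N, K, s, rng=np.random.default_rng(0)):
    """Random Gaussian A, an s-row sparse X (only s non-zero rows), and Y = A X."""
    A = rng.standard_normal((M, N))
    support = rng.choice(N, size=s, replace=False)   # indices of the non-zero rows
    X = np.zeros((N, K))
    X[support] = rng.standard_normal((s, K))
    Y = A @ X
    return A, X, Y, support
```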

Our goal is to find a method that exactly recovers $X$ from (4), where the matrix $A \in \mathbb{R}^{M\times N}$ is chosen with $M$ small. One expects that, when the number $K$ of unknown jointly sparse vectors is increased, a high rank of $X$ will allow a reduction of $M$, the number of linear measurements performed on the vectors. Define the $\ell_{2,1}$ norm of $X$ by
$$\|X\|_{2,1} = \sum_{n=1}^{N} \|X[n]\|_2. \tag{5}$$
Similar to the case of single vector recovery, the following convex minimization problem has been widely used in place of directly solving for the sparsest solution of (4):
$$\arg\min_{Z \in \mathbb{R}^{N\times K}} \|Z\|_{2,1} \quad \text{s.t.}\quad AZ = Y. \tag{6}$$
In the worst case (uniform recovery) scenario, however, this method does not allow a reduction of measurements when the number of vectors is increased. As the next theorem shows, the null space property is a necessary and sufficient condition for both single sparse recovery via $\ell_1$ minimization and joint sparse recovery via $\ell_{2,1}$ minimization. Specifically, if $A$ fails to recover an $s$-sparse vector with $\ell_1$ minimization, then it will fail to recover some $s$-row sparse matrix via (6), no matter how high the rank of the unknown matrix is. As such, the $\ell_{2,1}$ minimization is known to be a rank blind algorithm.
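The $\ell_{2,1}$ baseline (6) can likewise be written in a few lines with a convex solver. A hedged CVXPY sketch (illustrative only, not the comparison code used in the paper):

```python
import cvxpy as cp

def l21_minimization(A, Y):
    """Solve (6): minimize the sum of row l2 norms of Z subject to A Z = Y."""
    N, K = A.shape[1], Y.shape[1]
    Z = cp.Variable((N, K))
    row_norm_sum = cp.sum(cp.norm(Z, 2, axis=1))     # the l_{2,1} norm of Z, cf. (5)
    problem = cp.Problem(cp.Minimize(row_norm_sum), [A @ Z == Y])
    problem.solve()
    return Z.value
```

Combined with the data helper above, `l21_minimization(A, Y)` should return the true $X$ (up to solver tolerance) exactly when the NSP condition of Theorem 2 below holds.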

Theorem 2 [22], [10]

Given a matrix $A \in \mathbb{R}^{M\times N}$, every $s$-row sparse matrix $X \in \mathbb{R}^{N\times K}$ can be exactly recovered from (6) if and only if $A$ satisfies the NSP:
$$\|x_I\|_1 < \|x_{I^c}\|_1, \quad \forall x \in \ker(A)\setminus\{0\},\ \forall I \subset \{1, \ldots, N\} \text{ with } |I| \le s,$$
where $x_I$ is the vector obtained by nullifying all the entries of $x$ except the ones with indices in $I$. Furthermore, if there exists an $x \in \ker(A)$ such that
$$\|x_I\|_1 > \|x_{I^c}\|_1 \quad \text{for some } I \text{ with } |I| \le s,$$
then, for any $1 \le r \le \min\{s, K\}$, there is an $s$-row sparse matrix $X \in \mathbb{R}^{N\times K}$ with $\operatorname{rank}(X) = r$ which cannot be recovered from (6).

There have been several efforts to exploit the rank of the unknown matrix and design rank aware algorithms to improve the joint sparse recovery results. Examples of such algorithms include those based on MUSIC [15], [24] and orthogonal matching pursuit [10]. On the other hand, we are not aware of available rank aware algorithms based on (convex) functional optimization, CoSaMP or thresholding approaches.

One may think of resolving the rank blindness of $\ell_{2,1}$ minimization by replacing the $\ell_{2,1}$ norm with a different functional, i.e., by considering the problem
$$\arg\min_{Z \in \mathbb{R}^{N\times K}} \phi(Z) \quad \text{s.t.}\quad AZ = Y, \tag{7}$$
where $\phi$ is a convex functional on $\mathbb{R}^{N\times K}$. However, as we point out below, no such convex functional, like $\ell_{2,1}$, will be rank aware, and we may need to change the geometry or look into non-convex optimization to obtain this desirable property.

First, if (7) has a unique solution for every $Y = AX$ with $X \in \Sigma_{s,r}$, then it also has a unique solution for any $X \in \Sigma_{s,r'}$ with $r \le r' \le s$. Hence (7) is rank aware if it allows unique recovery in the class $\Sigma_{s,r}$ ($r \le s$) and fails to recover a matrix $X_0$ in $\Sigma_{s,r'}$ for some $r' < r$. Observe that every $X \in \Sigma_{s,r}$ is the unique solution of (7) if and only if
$$\phi(X + W) > \phi(X), \quad \forall X \in \Sigma_{s,r},\ \forall W \in (\ker(A))^K \setminus \{0\}. \tag{8}$$
And (7) strongly fails to recover an $X_0 \in \Sigma_{s,r'}$ if and only if there exists a $W_0 \in (\ker(A))^K \setminus \{0\}$ such that $\phi(X_0 + W_0) < \phi(X_0)$. However, by the continuity of $\phi$ and the fact that $\Sigma_{s,r}$ is dense in $\Sigma_{s,r'}$ when $r' \le r$, (8) implies that $\phi(X_0 + W_0) \ge \phi(X_0)$, a contradiction. As a result, (7) cannot uniquely recover every matrix in $\Sigma_{s,r}$ while strongly failing to recover some matrix in the class $\Sigma_{s,r'}$ for $r' \le r$.

In this paper, we utilize the rank of $Y$ to extract information about $X$ and reduce the problem to a smaller-dimensional problem in the full-rank setting. As a first step towards a rank aware optimization approach for uniform recovery, we develop a new non-convex optimization method on a manifold, which achieves a significant reduction in the number of measurements in joint sparse recovery compared to $\ell_{2,1}$ minimization.

Section snippets

Preliminary results

This section revisits $\ell_0$ minimization for matrices, based on results in [6], [10]. We present a uniqueness condition for sparse recovery on $\Sigma_{s,r}$, that is, a condition ensuring that the map $Z \mapsto AZ$ is injective on $\Sigma_{s,r}$. Before we characterize this condition, let us make a few observations.

Definition 1

The spark of the matrix $A$, denoted by $\operatorname{spark}(A)$, is the smallest number $j$ such that there exist $j$ linearly dependent columns in $A$.

Note that any $\operatorname{spark}(A) - 1$ columns of $A$ are linearly independent.
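Computing the spark exactly is combinatorial, but for small matrices it can be checked directly from the definition. A brute-force sketch (our own helper, only practical for small $N$):

```python
import numpy as np
from itertools import combinations

def spark(A):
    """Smallest number j such that some j columns of A are linearly dependent."""
    M, N = A.shape
    for j in range(1, N + 1):
        for cols in combinations(range(N), j):
            if np.linalg.matrix_rank(A[:, list(cols)]) < j:
                return j
    return N + 1   # all columns independent; spark is conventionally N + 1 (or infinity)
```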

Proposition 1

If $A : \Sigma_{s,r} \to \mathbb{R}^{M\times K}$ is injective then for

Reduction to full-rank setting

We present a simple way to transform the matrix equation to be solved into one where the unknown is a full-rank matrix. At the expense of an additional decomposition of the output data, the benefit of this practice is twofold. First, we reduce the dimension of the matrix equation and now find solutions in a potentially much lower dimensional space $\mathbb{R}^{N\times r}$, compared to the original one in $\mathbb{R}^{N\times K}$. More importantly, this step allows us to apply and exploit the strength of several joint sparse recovery

Non-convex manifold optimization for joint sparse recovery

In this section, we present a novel relaxation of $\ell_0$ minimization in the form of an optimization problem on a manifold.

Consider problem (10). Since we know from the previous section that its solution must have full rank, we can restrict this problem to the space of all full-rank matrices in $\mathbb{R}^{N\times r}$. This adds a rank constraint to the joint sparsity recovery, and our idea is to recast the optimization problem with the full-rank condition in Euclidean space into one on a manifold that encodes this constraint

Huber regularization

We consider the following unconstrained problem, which produces the same solution as (11) for large $\lambda$:
$$\arg\min_{Z \in S(N,r)} \big\|Z(Z^TZ)^{-1/2}\big\|_{2,1} + \frac{\lambda}{2}\|AZ - V\|_F^2. \tag{12}$$
Satisfactory choices of $\lambda$ are problem dependent and often determined empirically, as has been done here. Since the $\ell_{2,1}$ norm is non-smooth, we further relax it with a Huber regularization in order to apply manifold optimization methods, which have mostly been developed for smooth optimization. In particular, we replace $\|X[n]\|_2$ with $H_\delta($
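The excerpt cuts off before the precise definition of $H_\delta$ used in the paper. As an illustration, one common Huber convention and the resulting smoothed row penalty could look as follows; the exact form of $H_\delta$ below is our assumption, not necessarily the one in the paper.

```python
import autograd.numpy as anp

def huber(t, delta):
    """A common Huber function: quadratic near 0, linear for |t| >= delta.
    (Assumed convention; the paper's exact definition is not shown in this excerpt.)"""
    return anp.where(anp.abs(t) <= delta,
                     t**2 / (2.0 * delta),
                     anp.abs(t) - delta / 2.0)

def smoothed_l21(X, delta):
    """Huber-smoothed l_{2,1} penalty: sum over rows of H_delta(||X[n]||_2)."""
    row_norms = anp.sqrt(anp.sum(X**2, axis=1) + 1e-16)   # small shift avoids 0/0 in gradients
    return anp.sum(huber(row_norms, delta))
```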

Explicit formulae for the Euclidean gradients of cost functions

Euclidean gradients are the fundamental building block for the optimization of cost functions on manifolds. Such derivatives are often taxing to calculate, posing a major hurdle to implementing manifold optimization methods for many problems, especially those involving large matrices or vectors. Interestingly, for our objective functions (12) and (13), it is possible to derive exact, explicit formulae for their gradients. As expected, these formulae save the users a considerable
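The explicit formulae themselves are not reproduced in this excerpt. In practice, any such derivation can be sanity-checked numerically against finite differences; a small generic checker (our own utility, not from the paper):

```python
import numpy as np

def check_gradient(f, grad_f, Z, h=1e-6, n_probes=5, rng=np.random.default_rng(1)):
    """Compare an analytic Euclidean gradient with central finite differences
    along a few random directions; returns the largest relative error observed."""
    G = grad_f(Z)
    worst = 0.0
    for _ in range(n_probes):
        D = rng.standard_normal(Z.shape)
        D /= np.linalg.norm(D)
        fd = (f(Z + h * D) - f(Z - h * D)) / (2.0 * h)   # directional derivative
        an = np.sum(G * D)                               # <grad f(Z), D>
        worst = max(worst, abs(fd - an) / max(1.0, abs(an)))
    return worst
```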

Numerical experiments with Gaussian data and measurements

We use the Pymanopt package [31], which is the Python version of the original Manopt package [4], [34] for MATLAB, in these numerical tests. The conjugate gradient method with backtracking Armijo line-search provided in Pymanopt is used to solve (13). For automatic differentiation, we employ the Autograd package. In each iteration, an $r \times r$ SVD is computed. For sparse recovery, since $r \le s$, this step is not numerically expensive. The initial value is chosen randomly.
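A minimal end-to-end sketch of this setup is given below, with two caveats. First, the decorator and optimizer names have changed across Pymanopt releases, so the snippet follows recent versions and may need adjusting. Second, the paper works on the non-compact Stiefel manifold of full-rank $N\times r$ matrices; lacking a built-in class for it, we use the Euclidean manifold of $N\times r$ matrices as a stand-in here, relying on the cost being well defined only at full-rank points. This is our simplification, not necessarily the authors' implementation, and the values of `lam`, `delta`, and the seed are arbitrary placeholders.

```python
import autograd.numpy as anp
import numpy as np
import pymanopt
from pymanopt.manifolds import Euclidean
from pymanopt.optimizers import ConjugateGradient

def solve_reduced_problem(A, V, lam=1e3, delta=1e-3, seed=0):
    """Minimize a Huber-smoothed analogue of (3)/(12) over N x r matrices."""
    N, r = A.shape[1], V.shape[1]
    manifold = Euclidean(N, r)   # stand-in for the non-compact Stiefel manifold

    @pymanopt.function.autograd(manifold)
    def cost(Z):
        w, Q = anp.linalg.eigh(anp.dot(Z.T, Z))
        inv_sqrt = anp.dot(Q * (1.0 / anp.sqrt(w)), Q.T)     # (Z^T Z)^{-1/2}
        P = anp.dot(Z, inv_sqrt)                             # polar factor of Z
        t = anp.sqrt(anp.sum(P**2, axis=1) + 1e-16)          # smoothed row norms
        hub = anp.where(t <= delta, t**2 / (2.0 * delta), t - delta / 2.0)
        return anp.sum(hub) + 0.5 * lam * anp.sum((anp.dot(A, Z) - V)**2)

    problem = pymanopt.Problem(manifold, cost)
    result = ConjugateGradient().run(
        problem,
        initial_point=np.random.default_rng(seed).standard_normal((N, r)))
    return result.point
```

The sparse rows of the returned matrix indicate the recovered support, and the full unknown is then assembled as $X = WU$ with the factor $U$ from the reduction step.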

In this test we compare the

Conclusion and future work

We study the joint sparse recovery problem. Our result here allows us to reduce the problem to one of potentially much smaller dimension, in which the unknown matrix has full column rank. We also offer a new functional minimization method on the manifold of full-rank matrices for the recovery of the unknown matrix. As pointed out in Section 1.2, we cannot expect to have a rank sensitive convex optimization method on the Euclidean space, so changing the geometry and looking into manifold

Declaration of Competing Interest

None.

Acknowledgements

We want to thank Richard Barnard (Western Washington University) for suggesting the use of Huber regularization and Carola-Bibiane Schönlieb (University of Cambridge) for pointing out the Γ-convergence argument for Huber regularization to us. We also thank the anonymous referee for valuable suggestions.

This work is supported by the National Science Foundation, Division of Mathematical Sciences, Computational Mathematics program under contract number DMS1620280; the U.S. Department of Energy,

References (37)

  • M.J. Lai et al., The null space property for sparse recovery from multiple measurement vectors, Appl. Comput. Harmon. Anal. (2011)
  • Z. Wang et al., Joint sparse model-based discriminative K-SVD for hyperspectral image classification, Signal Process. (2017)
  • F. Zhu et al., Structured sparse method for hyperspectral unmixing, ISPRS J. Photogramm. Remote Sens. (2014)
  • P.A. Absil et al., Optimization Algorithms on Matrix Manifolds (2009)
  • E. van den Berg et al., Sparse optimization with least-squares constraints, SIAM J. Optim. (2011)
  • E. van den Berg et al., Probing the Pareto frontier for basis pursuit solutions, SIAM J. Sci. Comput. (2008)
  • N. Boumal et al., Manopt, a Matlab toolbox for optimization on manifolds, J. Mach. Learn. Res. (2014)
  • E. Candès et al., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory (2006)
  • J. Chen et al., Theoretical results on sparse representations of multiple-measurement vectors, IEEE Trans. Signal Process. (2006)
  • Y. Chen et al., Hyperspectral image classification using dictionary-based sparse representation, IEEE Trans. Geosci. Remote Sens. (2011)
  • G. Dal Maso, An Introduction to Γ-Convergence, vol. 8 (2012)
  • A.S. Dalalyan et al., On the prediction performance of the lasso, Bernoulli (2017)
  • M.E. Davies et al., Rank awareness in joint sparse recovery, IEEE Trans. Inf. Theory (2012)
  • D.L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory (2006)
  • E. Esser et al., A method for finding structured sparse solutions to non-negative least squares problems with applications, SIAM J. Imaging Sci. (2013)
  • L. Fang et al., Spectral–spatial hyperspectral image classification via multiscale adaptive sparse representation, IEEE Trans. Geosci. Remote Sens. (2014)
  • P. Feng, Universal Minimum-Rate Sampling and Spectrum-Blind Reconstruction for Multiband Signals (1998)
  • P. Feng et al., Spectrum-blind minimum-rate sampling and reconstruction of multiband signals, Proc. ICASSP (1996)