Signal Processing
Volume 153, December 2018, Pages 311–320

The adaptive block sparse PCA and its application to multi-subject FMRI data analysis using sparse mCCA

https://doi.org/10.1016/j.sigpro.2018.07.021

Abstract

Motivated by the problem of analyzing multi-subject functional magnetic resonance imaging (fMRI) data sets using multiple-set canonical correlation analysis (mCCA), in this paper we propose a new variant of the principal component analysis (PCA) method, namely the adaptive block sparse PCA. It has the advantage of producing modified principal components with block sparse loadings. It is derived using a penalized rank-one matrix approximation in which the penalty is introduced into the minimization problem to promote block sparsity of the loading vectors. An efficient algorithm is proposed for its computation. The effectiveness of the proposed method is illustrated on the problem of analyzing multi-subject fMRI data sets using mCCA, a generalization of canonical correlation analysis (CCA) to three or more sets of variables. This application is obtained by deriving the connection between mCCA and the singular value decomposition (SVD).

Introduction

Principal component analysis (PCA), also known as the Karhunen–Loève expansion, is a classical feature extraction and data dimension reduction technique widely used in signal and image processing, medical imaging and pattern recognition. It has, for example, been used in atmospheric radar signal processing to estimate Doppler frequencies [1] and in other areas such as automatic speech recognition [2], wireless sensor networks [3], face recognition [4], dimension reduction for canonical correlation analysis [5], fault detection [6], hyperspectral imaging [7], speech emotion recognition [8] and functional magnetic resonance imaging (fMRI) data analysis [9]. As a linear statistical technique, PCA cannot accurately describe all types of structure in a given dataset, especially nonlinear structure. To this end, kernel principal component analysis (KPCA) [10] has been proposed as a nonlinear extension of PCA. Autoencoders (also called autoassociators in the literature) [11], which belong to a special family of dimensionality reduction methods implemented using artificial neural networks, can also be used for nonlinear dimensionality reduction. When using an affine encoder and decoder without any nonlinearity and a squared error loss, the autoencoder is essentially a PCA, as shown in [12]. These latter approaches are, however, beyond the scope of this paper and will not be discussed further. PCA achieves its goal by constructing a sequence of orthogonal linear combinations of the original variables, called the principal components (PCs), that have maximum variance. The associated vectors of coefficients of the linear combinations are the PC loadings. The idea of dimension reduction using PCA rests on the fact that the first few PCs often retain most of the total variability in the data. The number of principal components to retain can be determined by studying their contribution to the total variability or by using a model selection criterion [13], [14].
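
As a concrete illustration of the PCA construction just described, the following minimal numpy sketch (on synthetic data, not data from the paper) computes the loadings, the principal components and the variance explained by each component via the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # n = 100 samples, p = 20 variables
X = X - X.mean(axis=0)               # center the columns

# X = U S V^T: the columns of V are the PC loadings,
# and the scores X V are the principal components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt.T                      # p x q loading vectors
scores = X @ loadings                # n x q principal components

# Proportion of total variability captured by each PC; the first
# few typically dominate, which is what justifies dimension reduction.
explained = S**2 / np.sum(S**2)
print(explained[:5])
```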

Despite its efficacy and popularity in image processing applications, PCA as a data-driven technique is known to suffer from two major limitations. First, the PCs are usually linear combinations of all the original variables, with nonzero loadings or coefficients. This not only incorporates unnecessary noise but also makes the obtained PCs difficult to interpret without subjective judgement, especially when the number of variables is large. Second, PCA treats all the variables equally, and thus may not be well suited to problems in which some variables or groups of variables are more important than others. In recent years, variants of PCA that facilitate interpretation have been introduced. These variants produce sparse loading vectors and include SCoTLASS [15], which imposes an ℓ1 penalty on the ordinary PCA loadings; the sparse PCA method of [16], obtained by reformulating PCA as a regression problem and imposing a lasso-type penalty on the regression coefficients; and the sparse PCA methods obtained via penalized matrix decomposition [17], [18]. All these variants, however, consider interpretation at the variable level only. In a number of application areas, the variables form natural, known blocks or groups, and interpretability is sought at the group level. The PCA variants mentioned above are not suited to such cases because they ignore the group structure of the variables, so the resulting principal components do not provide good interpretability.
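
The ℓ1-penalized variants cited above can be sketched through the penalized rank-one approximation viewpoint of [17], [18]: alternate between a soft-thresholded update of the loading vector and a normalization of the score vector. The sketch below is a simplified illustration of that idea, not the exact algorithms of those papers (in particular, the selection of the penalty level lam is omitted).

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_rank_one(X, lam, n_iter=100):
    """l1-penalized rank-one approximation X ~ d u v^T with sparse v,
    alternating between updates of u and v."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]                  # warm start at the leading SVD pair
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam)   # sparse loading update
        nv = np.linalg.norm(v)
        if nv == 0:                        # lam too large: everything shrunk away
            break
        v = v / nv
        u = X @ v
        u = u / np.linalg.norm(u)          # score update (unit norm)
    return u, v
```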

In this paper we make three contributions. First, we propose a new variant of PCA geared towards exploiting a known block structure among the variables and producing sparse modified principal components whose loadings are group sparse, thereby significantly aiding interpretation of the results. We refer to this method as block sparse PCA (BSPCA). It is obtained using the link between PCA and low rank matrix approximation via the singular value decomposition (SVD), where a block sparse penalty [19], [20] is introduced in the minimization problem to promote block sparsity of the PC loadings. Furthermore, instead of penalizing the different loading blocks equally as in [20], we apply an adaptive penalization approach and allow different amounts of shrinkage for the different blocks of the loading vector [21], [22]. Second, an efficient iterative algorithm involving simple linear regression and block thresholding steps is proposed for its computation.
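
The building block that distinguishes a block sparse penalty from the ℓ1 penalty above is groupwise shrinkage: an entire block of loadings is either kept (shrunk) or zeroed together, and in the adaptive variant each block k receives its own penalty level λ_k. A hedged sketch of such an operator follows; the exact penalty and weighting scheme used in the paper may differ. The adaptive weights shown, inversely proportional to the norm of an initial block estimate, follow the usual adaptive-shrinkage heuristic of [21], [22].

```python
import numpy as np

def block_soft_threshold(z, blocks, lams):
    """Groupwise shrinkage: block z_k is scaled by max(1 - lam_k/||z_k||, 0),
    so entire blocks are zeroed at once rather than single coefficients."""
    v = np.zeros_like(z)
    for (start, stop), lam in zip(blocks, lams):
        zk = z[start:stop]
        nrm = np.linalg.norm(zk)
        if nrm > lam:
            v[start:stop] = (1.0 - lam / nrm) * zk
    return v

def adaptive_weights(v0, blocks, lam):
    """One common adaptive choice (an assumption here, not the paper's exact
    rule): penalize block k by lam / ||v0_k||, so blocks that look weak in an
    initial estimate v0 are shrunk harder."""
    eps = 1e-12
    return [lam / (np.linalg.norm(v0[s:e]) + eps) for (s, e) in blocks]
```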

Data-driven methods have been successfully applied to fMRI data analysis. Among the justifications for their suitability is that they make minimal assumptions about the underlying structure of the problem. These methods mainly try to decompose the observed data based on a factor model and a specific constraint. Maximum correlation is one such constraint and leads to canonical correlation analysis (CCA) [23]. CCA has been used successfully for fMRI data analysis, for example to find latent sources in single-subject fMRI data by taking advantage of the spatial or temporal autocorrelation in the data [24], or to improve the specificity and sensitivity of dictionary learning methods for fMRI by accounting for the autocorrelation in the fMRI signals [25], [26]. Its extension to multiple data sets, termed multiset canonical correlation analysis (mCCA) [27], has also been used successfully in association with other methods for the analysis of multiple fMRI data sets, for example in conjunction with dictionary learning for multi-subject fMRI data analysis in [28] and in conjunction with ICA to maintain the correspondence of the source estimates across different subjects in [29], [30], [31].

The new PCA variant introduced above is motivated by the problem of multi-subject fMRI data analysis. When working with multi-subject fMRI data sets, the general canonical components obtained using standard mCCA involve all the individual subject data sets, as mCCA performs no group variable or subject selection. As in the single variable case, this makes it difficult to interpret these components without subjective judgement. Third, to ease this drawback of mCCA, a new method called block sparse mCCA is also introduced in this paper [32]. This method is a direct application of the proposed block sparse PCA algorithm. It is derived by considering mCCA under specific constraints and rewriting it as a PCA problem [27], [33], [34]. Instead of treating all blocks of variables, i.e. subjects, equally, the proposed mCCA algorithm ignores irrelevant or useless subjects and finds linear combinations of block variables (blocks of loadings) such that the blocks assumed to be connected are highly correlated.

The rest of the paper is organized as follows. In the next section we review the PCA method and some sparse PCA methods. The proposed block sparse PCA (BSPCA) method is introduced in Section 3 together with some theoretical properties. Its application to the problem of block sparse mCCA is described in Section 4 after a review of both CCA and mCCA. Section 5 is dedicated to both simulation and real fMRI data experiments. Concluding remarks are given in Section 6. All technical details are presented in the Appendices.


PCA and sparse PCA

Let $X = (x_1, \ldots, x_n)^\top$ denote an $n \times p$ data matrix of rank $q \le \min(n, p)$, where $n$ is the number of data samples, $p$ the number of variables and $x_i$ is a $p \times 1$ vector of variables associated with the $i$th data sample. For the rest of the paper, the columns of $X$ are assumed centered and $\operatorname{cov}(x_i) = \Sigma$, a positive definite matrix of size $p \times p$, which can be decomposed as
$$\Sigma = \sum_{i=1}^{q} \lambda_i a_i a_i^\top,$$
where $\lambda_i$ is the $i$th largest eigenvalue of $\Sigma$ and $a_i = (a_{i1}, \ldots, a_{ip})^\top$ is the associated eigenvector. As indicated above, PCA reduces the
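
For a concrete check of the decomposition of $\Sigma$ above, the following snippet (toy data) verifies numerically that the sample covariance is recovered as the sum of its rank-one eigenterms $\lambda_i a_i a_i^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
X = X - X.mean(axis=0)
Sigma = (X.T @ X) / (X.shape[0] - 1)     # p x p sample covariance

# Sigma = sum_i lambda_i a_i a_i^T, with lambda_1 >= ... >= lambda_p
lam, A = np.linalg.eigh(Sigma)           # eigh returns ascending eigenvalues
lam, A = lam[::-1], A[:, ::-1]           # reorder to descending
recon = sum(l * np.outer(a, a) for l, a in zip(lam, A.T))
print(np.allclose(recon, Sigma))         # True
```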

Block sparse PCA

This section describes the proposed adaptive block sparse PCA method. The focus is on the extraction of the first block sparse loading vector and principal component; subsequent vectors are obtained by applying the same method to the deflated matrix.

The key strength of the ℓ1-norm used in the sparse PCA methods [15], [16], [17], [18] lies in its ability to simultaneously select and estimate the significant elements of the loading vectors. As in our motivating application, there are
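
One plausible reading of the alternating scheme this section describes is sketched below: a block-thresholding step for the loading vector v and a regression-type (least-squares plus normalization) step for the score vector u, with deflation producing subsequent pairs. This is an illustrative sketch under the stated assumptions, not the paper's exact algorithm; in particular the warm start, normalization and stopping rule are choices made here.

```python
import numpy as np

def block_soft_threshold(z, blocks, lams):
    """max(1 - lam_k/||z_k||, 0) * z_k for each block k."""
    v = np.zeros_like(z)
    for (s, e), lam in zip(blocks, lams):
        nrm = np.linalg.norm(z[s:e])
        if nrm > lam:
            v[s:e] = (1.0 - lam / nrm) * z[s:e]
    return v

def bspca_first_pair(X, blocks, lams, n_iter=100):
    """First block sparse loading v and score u via alternating updates."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    u = U[:, 0]                               # warm start at leading SVD pair
    v = Vt[0]
    for _ in range(n_iter):
        v = block_soft_threshold(X.T @ u, blocks, lams)
        nv = np.linalg.norm(v)
        if nv == 0:                           # penalties too strong
            return u, v, 0.0
        v = v / nv
        u = X @ v                             # least-squares update of u
        u = u / np.linalg.norm(u)
    d = float(u @ X @ v)                      # scale of the rank-one term
    return u, v, d

# Subsequent pairs come from the same routine applied to the
# deflated matrix: X_deflated = X - d * np.outer(u, v)
```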

Application to mCCA: block sparse mCCA

CCA is a standard approach for studying the relationships between two sets of random variables, and a number of variants have been proposed to deal with the problem of interpretability in CCA, particularly when the number of variables exceeds the number of observations [18], [40]. mCCA is a generalization of CCA to three or more sets of variables [27] and reduces to CCA when the number of sets is two. Its aim is to extract the linear relationships between several sets of
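
Since this section exploits the link between (m)CCA and the SVD, it may help to recall the two-set case in code: after whitening each data set, the canonical correlations and weight vectors fall out of a single SVD of the cross-product of the whitened scores. The sketch below uses synthetic data and illustrates plain CCA only, not the paper's block sparse extension.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.standard_normal((n, 1))                   # shared latent source
X = z @ rng.standard_normal((1, 5)) + 0.5 * rng.standard_normal((n, 5))
Y = z @ rng.standard_normal((1, 4)) + 0.5 * rng.standard_normal((n, 4))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

def whiten(M):
    """Return orthonormal scores U and a map W with M @ W = U."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U, Vt.T / S

Ux, Wx = whiten(X)
Uy, Wy = whiten(Y)
P, rho, Qt = np.linalg.svd(Ux.T @ Uy)             # canonical structure
a, b = Wx @ P[:, 0], Wy @ Qt[0]                   # leading canonical weights
print(rho[0])                                     # leading canonical correlation
```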

Experimental results

In this section, the effectiveness of the proposed block sparse mCCA (BSmCCA) compared with the generic mCCA method is demonstrated using simulated as well as real multi-subject fMRI data sets. Consider an fMRI data set $Y \in \mathbb{R}^{n \times N}$ with $n$ time points per voxel and $N$ voxels. When a data-driven method is used to decompose $Y$ using a factor model and some constraints, it decomposes the data set into the product of two matrices, i.e. $Y = GH$. Here the columns of the matrix $G$ contain the most dominant neural temporal dynamics
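
As a concrete instance of the factor model $Y = GH$, the sketch below uses a truncated SVD on synthetic data. Data-driven methods differ precisely in the constraint that replaces the SVD's orthogonality, but the bookkeeping of G (temporal dynamics) and H (spatial maps) is the same.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, k = 120, 1000, 5                   # time points, voxels, factors
Y = rng.standard_normal((n, k)) @ rng.standard_normal((k, N)) \
    + 0.1 * rng.standard_normal((n, N))  # low-rank signal plus noise

# One instance of Y = GH: a truncated SVD. Columns of G hold temporal
# dynamics; the matching rows of H hold the spatial maps.
U, S, Vt = np.linalg.svd(Y, full_matrices=False)
G = U[:, :k]                             # n x k temporal components
H = S[:k, None] * Vt[:k]                 # k x N spatial maps
print(np.linalg.norm(Y - G @ H) / np.linalg.norm(Y))   # small residual
```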

Conclusion

A new PCA variant, named the adaptive block sparse PCA, is proposed in this paper. It is based on a penalized SVD in which the penalty is introduced into the minimization problem to promote block sparsity of the loading vectors. Furthermore, it reduces to PCA when the block sparsity constraint is removed. The efficient alternating algorithm proposed for its computation is obtained by exploiting the biconvex nature of the proposed penalized rank-one approximation criterion. Besides its

Acknowledgment

This work was supported by the Australian Research Council through Grant FT130101394.

References (44)

  • D.M. Witten et al., A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics (2009)
  • H. Hotelling, Relations between two sets of variates, Biometrika (1936)
  • M.U. Khalid et al., Improving functional connectivity detection in fMRI by combining sparse dictionary learning and canonical correlation analysis, Proceedings of the IEEE International Symposium on Biomedical Imaging (2013)
  • M.A. Lindquist, The statistical analysis of fMRI data, Stat. Sci. (2008)
  • D.U.M. Rao et al., Atmospheric radar signal processing using principal component analysis, Digit. Signal Process. (2014)
  • B. Schölkopf et al., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. (1998)
  • D. Rumelhart et al., Learning internal representations by error propagation, in: Parallel Distributed Processing (1986)
  • P. Baldi et al., Neural networks and principal component analysis: learning from examples without local minima, Neural Netw. (1989)
  • I.T. Jolliffe et al., A modified principal component technique based on the LASSO, J. Comput. Graph. Stat. (2003)
  • H. Zou et al., Sparse principal component analysis, J. Comput. Graph. Stat. (2006)
  • H. Shen et al., Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal. (2008)
  • S. Bakin, Adaptive regression and model selection in data mining problems, Ph.D. thesis (1999)