The adaptive block sparse PCA and its application to multi-subject fMRI data analysis using sparse mCCA
Introduction
Principal component analysis (PCA), also known as the Karhunen–Loève expansion, is a classical feature extraction and data dimension reduction technique widely used in the areas of signal and image processing, medical imaging and pattern recognition. It has, for example, been used in atmospheric radar signal processing to estimate Doppler frequencies [1] and in other areas such as automatic speech recognition [2], wireless sensor networks [3], face recognition [4], dimension reduction for canonical correlation analysis [5], fault detection [6], hyperspectral imaging [7], speech emotion recognition [8] and functional magnetic resonance imaging (fMRI) data analysis [9]. As a linear statistical technique, PCA cannot accurately describe all types of structures in a given dataset, especially nonlinear structures. To this end, kernel principal component analysis (KPCA) [10] has been proposed as a nonlinear extension of PCA. Autoencoders (also called autoassociators in the literature) [11], which belong to a special family of dimensionality reduction methods implemented using artificial neural networks, can also be used for nonlinear dimensionality reduction. When using an affine encoder and decoder without any nonlinearity and a squared error loss, the autoencoder essentially performs PCA, as shown in [12]. These latter approaches are, however, beyond the scope of this paper and will not be considered further. PCA achieves its goal by constructing a sequence of orthogonal linear combinations of the original variables, called the principal components (PCs), that have maximum variance. The associated vectors of coefficients of the linear combinations are the PC loadings. The idea of dimension reduction using PCA rests on the fact that the first few PCs often retain most of the total variability present in the data.
The number of important principal components to retain can be determined by studying their contribution to the total explained variability or by using a model selection criterion [13], [14].
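As a concrete illustration of the two steps above (computing the PCs and deciding how many to keep), the following Python sketch computes PCA through the SVD of the centered data matrix and retains the smallest number of components explaining a given fraction of the variance. The 90% cutoff, the helper name `pca_svd` and the synthetic data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def pca_svd(X, n_components=None):
    """PCA of an (n x p) data matrix via the SVD of its centered version.

    Returns the principal components (scores), the loading vectors and the
    fraction of the total variance explained by each component.
    """
    Xc = X - X.mean(axis=0)                 # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)         # explained-variance ratio
    if n_components is not None:
        U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    scores = U * s                          # principal components
    loadings = Vt.T                         # loading vectors as columns
    return scores, loadings, explained

# Keep the smallest number of PCs explaining at least 90% of the variance
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 10))
scores, loadings, ratio = pca_svd(X)
k = int(np.searchsorted(np.cumsum(ratio), 0.90)) + 1
```

A model selection criterion as in [13], [14] would replace the fixed 90% threshold with a data-driven choice of k.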
Despite its efficacy and popularity in image processing applications, PCA, as a data-driven technique, is known to suffer from two major limitations. First, the PCs are usually linear combinations of all the original variables, with nonzero loadings or coefficients. This not only incorporates unnecessary noise but also makes it difficult to interpret the obtained PCs without resorting to subjective judgement, especially when the number of variables is large. Second, PCA treats all the variables equally, and thus may not be well suited to problems in which some variables, or groups of variables, are more important than others. In recent years, variants of PCA that facilitate interpretation have been introduced. Such variants are obtained by generating a sparse loading vector and include SCoTLASS [15], which imposes an ℓ1 penalty on the ordinary PCA loadings, the sparse PCA method [16] obtained by reformulating PCA as a regression problem and imposing a lasso-type penalty on the regression coefficients, and the sparse PCA methods obtained via penalized matrix decomposition [17], [18]. All these variants, however, consider interpretation at the variable level only. In a number of application areas, the variables form natural known blocks or groups and interpretability is sought at the group level. The PCA variants mentioned above are not suited to such cases because they ignore the group structure of the variables, and the resulting principal components do not provide good interpretability.
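The penalized matrix decomposition idea of [17] can be sketched as follows: alternate between a least squares update of the left factor and an ℓ1 soft-thresholding of the loading vector. The penalty level `lam`, the iteration count and the planted-signal demo below are assumptions made for illustration, not details from the paper.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_pc_rank1(X, lam, n_iter=100):
    """First sparse loading vector via an l1-penalized rank-one approximation
    of X, in the spirit of the penalized matrix decomposition of [17]."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]  # warm start: leading right singular vector
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)              # least squares update of the left vector
        v = soft(X.T @ u, lam)              # l1 shrinkage of the loadings
        nv = np.linalg.norm(v)
        if nv == 0:                         # lam too large: every loading is zeroed
            break
        v /= nv
    return v

# Demo: a planted 3-sparse direction is recovered with exact zeros elsewhere
rng = np.random.default_rng(1)
u0 = rng.standard_normal(50)
u0 /= np.linalg.norm(u0)
v_true = np.zeros(8)
v_true[:3] = 1.0 / np.sqrt(3.0)
X = 5.0 * np.outer(u0, v_true) + 0.01 * rng.standard_normal((50, 8))
v_hat = sparse_pc_rank1(X, lam=0.5)
```

Increasing `lam` drives more loadings to exactly zero, which is what makes the resulting components easier to interpret at the variable level.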
In this paper we make three contributions. First, we propose a new variant of PCA geared towards exploiting a known block structure of the variables and producing modified principal components whose loadings are group sparse, thus significantly aiding the interpretation of the results. We refer to this method as block sparse PCA (BSPCA). It is obtained using the link between PCA and low rank matrix approximation via the singular value decomposition (SVD), where a block sparse type penalty [19], [20] is introduced in the minimization problem to promote block sparsity of the PC loadings. Furthermore, instead of penalizing the different loading blocks equally as in [20], we propose an adaptive penalization approach that allows different amounts of shrinkage for the different blocks of the loading vector [21], [22]. Second, an efficient iterative algorithm involving simple linear regression and block thresholding steps is proposed for its computation.
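A minimal sketch of such an alternating scheme is given below: the regression step updates the left vector, and a blockwise (group) soft-thresholding with data-dependent weights shrinks whole blocks of loadings at once, so blocks with little initial energy are penalized harder. The specific weight rule, penalty level and demo data here are illustrative assumptions, not the exact criterion and updates of the proposed BSPCA.

```python
import numpy as np

def block_soft(z, blocks, lam, w):
    """Blockwise soft-thresholding: proximal operator of a weighted group penalty."""
    v = np.zeros_like(z)
    for g, idx in enumerate(blocks):
        nrm = np.linalg.norm(z[idx])
        if nrm > lam * w[g]:                # the whole block survives or dies
            v[idx] = (1.0 - lam * w[g] / nrm) * z[idx]
    return v

def block_sparse_pc(X, blocks, lam, n_iter=100):
    """First block sparse loading vector via alternating regression and
    block thresholding steps (an illustrative sketch, not the paper's updates)."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]
    # adaptive weights: blocks with little initial energy are shrunk harder
    w = np.array([1.0 / (np.linalg.norm(v[idx]) + 1e-12) for idx in blocks])
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)              # regression step for the left vector
        v = block_soft(X.T @ u, blocks, lam, w)
        nv = np.linalg.norm(v)
        if nv == 0:
            break
        v /= nv
    return v

# Demo: 3 blocks of 4 variables; only the first block carries signal
rng = np.random.default_rng(2)
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
u0 = rng.standard_normal(50)
u0 /= np.linalg.norm(u0)
v_true = np.zeros(12)
v_true[:4] = 0.5
X = 5.0 * np.outer(u0, v_true) + 0.01 * rng.standard_normal((50, 12))
v_hat = block_sparse_pc(X, blocks, lam=0.5)
```

Subsequent loading vectors would be obtained by running the same procedure on the deflated matrix X − d·uvᵀ with d = uᵀXv.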
Data-driven methods have been successfully applied to fMRI data analysis. Among the justifications for their suitability is that they minimize the assumptions made about the underlying structure of the problem. These methods mainly try to decompose the observed data based on a factor model and a specific constraint. Maximum correlation is one such constraint, and it leads to canonical correlation analysis (CCA) [23]. CCA has been successfully used for fMRI data analysis. It has, for example, been used to find latent sources in single subject fMRI data by taking advantage of the spatial or temporal autocorrelation in the data [24], and to improve the specificity and sensitivity of dictionary learning methods for fMRI by accounting for the autocorrelation in the fMRI signals [25], [26]. Its extension to multiple data sets, termed multiset canonical correlation analysis (mCCA) [27], has also been used successfully in association with other methods for the analysis of multiple fMRI data sets. It has, for example, been used in conjunction with dictionary learning for multi-subject fMRI data analysis in [28] and in conjunction with ICA to maintain the correspondence of the source estimates across different subjects in [29], [30], [31].
The new PCA variant introduced above is motivated by the problem of multi-subject fMRI data analysis. When working with multi-subject fMRI data sets, the general canonical components obtained using standard mCCA involve all the individual subject data sets, as mCCA does not perform group variable (subject) selection. As in the single variable case, this makes it difficult to interpret these components without resorting to subjective judgment. Third, to ease this drawback of mCCA, a new method called block sparse mCCA is also introduced in this paper [32]. This method is a direct application of the proposed block sparse PCA algorithm. It is derived by considering mCCA under specific constraints and rewriting it as a PCA problem [27], [33], [34]. Instead of treating all blocks of variables (subjects) equally, the proposed mCCA algorithm ignores the irrelevant or useless subjects and finds linear combinations of the block variables (blocks of loadings) such that the blocks that are assumed to be connected are highly correlated.
The rest of the paper is organized as follows. In the next section we review the PCA method and some sparse PCA methods. The proposed block sparse PCA (BSPCA) method is introduced in Section 3 along with some theoretical properties. Its application to the problem of block sparse mCCA is described in Section 4 after a review of both CCA and mCCA. Section 5 is dedicated to both simulation and real fMRI data experiments. Concluding remarks are given in Section 6. All technical details are presented in the Appendices.
PCA and sparse PCA
Let X denote an n × p data matrix of rank q ≤ min(n, p), where n is the number of data samples, p is the number of variables and xi is the p × 1 vector of variables associated with the ith data sample. For the rest of the paper, the columns of X are assumed centered and Σ = n⁻¹XᵀX is a positive definite matrix of size p × p, which admits the eigendecomposition Σ = VΛVᵀ, where λi is the ith largest eigenvalue of Σ and vi, the ith column of V, is the associated eigenvector. As indicated above, PCA reduces the data dimension by projecting the observations onto the subspace spanned by the first few eigenvectors.
Block sparse PCA
This section describes the proposed adaptive block sparse PCA method. The focus is on the extraction of the first block sparse loading vector and principal component. The subsequent vectors are obtained by applying the same method to the deflated matrix.
The key strength of the ℓ1-norm used in the sparse PCA methods [15], [16], [17], [18] lies in its ability to simultaneously select and estimate the significant elements of the loading vectors. As in our motivating application, however, there are settings in which the variables form natural known blocks and sparsity is desired at the block rather than the individual variable level.
Application to mCCA: block sparse mCCA
CCA is a standard approach for studying the relationships between two sets of random variables, and a number of variations have been proposed to deal with the problem of interpretability in CCA, particularly when the number of variables exceeds the number of observations [18], [40]. mCCA is a generalization of CCA to three or more sets of variables [27] and reduces to CCA when the number of sets is two. Its aim is to extract the linear relationships between several sets of variables.
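Under MAXVAR-type constraints, mCCA can be recast as a PCA/SVD problem on the concatenation of whitened data sets, which is the kind of rewriting exploited in this paper. The sketch below makes this concrete under the assumption that each data set is whitened through its SVD; the function name `mcca_maxvar` and the synthetic data sets sharing one sinusoidal source are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def mcca_maxvar(datasets):
    """First-stage MAXVAR-type mCCA: PCA of the concatenated whitened data sets.
    Returns the shared component z and one canonical variate per data set."""
    bases = []
    for X in datasets:
        Xc = X - X.mean(axis=0)
        U = np.linalg.svd(Xc, full_matrices=False)[0]   # orthonormal basis (whitened data)
        bases.append(U)
    Z = np.hstack(bases)
    z = np.linalg.svd(Z, full_matrices=False)[0][:, 0]  # direction closest to all subspaces
    variates = [U @ (U.T @ z) for U in bases]           # per-set canonical variates
    return z, variates

# Demo: three data sets driven by one common sinusoidal source
rng = np.random.default_rng(3)
t = np.linspace(0.0, 4.0 * np.pi, 200)
source = np.sin(t)
datasets = [np.outer(source, rng.standard_normal(5))
            + 0.05 * rng.standard_normal((200, 5)) for _ in range(3)]
z, variates = mcca_maxvar(datasets)
```

Because all three data sets share the sinusoidal source, the recovered canonical variates are strongly correlated across the sets; block sparse mCCA would additionally zero out the contribution of data sets that do not share the source.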
Experimental results
In this section, the effectiveness of the proposed block sparse mCCA (BSmCCA), as compared to the generic mCCA method, is highlighted using simulated as well as real multi-subject fMRI datasets. Consider an fMRI dataset Y with n time points per voxel and N voxels. When a data-driven method is used to decompose Y using a factor model and some constraints, it splits the dataset into two sub-matrices, i.e. Y ≈ GS. Here the columns of the matrix G contain the most dominant neural temporal dynamics
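The factor-model split described above can be illustrated with a plain truncated SVD, which separates a temporal factor G from a second spatial factor (here named S for illustration); actual fMRI decomposition methods add constraints such as independence, sparsity or inter-subject correlation on top of this factorization. The rank-3 synthetic data below are an assumption for the demo.

```python
import numpy as np

def factor_decompose(Y, k):
    """Rank-k factorization Y ~ G @ S: the columns of G hold temporal dynamics
    and the rows of S the corresponding spatial maps (plain truncated SVD)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    G = U[:, :k] * s[:k]        # n x k temporal factors (time courses)
    S = Vt[:k]                  # k x N spatial maps
    return G, S

# Demo: exactly rank-3 synthetic "fMRI-like" data are recovered perfectly
rng = np.random.default_rng(4)
Y = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 100))
G, S = factor_decompose(Y, 3)
```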
Conclusion
A new PCA variant, named adaptive block sparse PCA, is proposed in this paper. It is based on a penalized SVD where the penalty is introduced in the minimization problem to promote block sparsity of the loading vectors. Furthermore, it has the property of reducing to PCA when the block sparsity constraint is removed. The efficient alternating algorithm proposed for its computation is obtained by exploiting the biconvex nature of the proposed penalized rank one approximation criterion. Besides its simplicity, the proposed algorithm forms the basis of the block sparse mCCA method used for multi-subject fMRI data analysis.
Acknowledgment
This work was supported by the Australian Research Council through Grant FT130101394.
References (44)
- et al., The integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognition, Digit. Signal Process. (2011)
- et al., Distributed adaptive estimation of covariance matrix eigenvectors in wireless sensor networks with application to distributed PCA, Signal Process. (2014)
- et al., Curvelet based face recognition via dimension reduction, Signal Process. (2009)
- et al., Canonical correlation analysis of high-dimensional data with very small sample support, Signal Process. (2016)
- et al., Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: part I, Signal Process. (2014)
- et al., A super-resolution reconstruction algorithm for hyperspectral images, Signal Process. (2012)
- et al., Speech emotion recognition: features and classification models, Digit. Signal Process. (2012)
- et al., Principal component analysis of the dynamic response measured by fMRI: a generalized linear system framework, Magn. Reson. Imaging (1999)
- et al., Bayesian estimation of the number of principal components, Signal Process. (2007)
- et al., Dimension estimation in noisy PCA with SURE and random matrix theory, IEEE Trans. Signal Process. (2008)
- A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics
- Relations between two sets of variates, Biometrika
- Improving functional connectivity detection in fMRI by combining sparse dictionary learning and canonical correlation analysis, Proceedings of the IEEE International Symposium on Biomedical Imaging
- The statistical analysis of fMRI data, Stat. Sci.
- Atmospheric radar signal processing using principal component analysis, Digit. Signal Process.
- Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput.
- Learning internal representations by error propagation, Parallel Distributed Processing
- Neural networks and principal component analysis: learning from examples without local minima, Neural Netw.
- A modified principal component technique based on the LASSO, J. Comput. Graph. Stat.
- Sparse principal component analysis, J. Comput. Graph. Stat.
- Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal.
- Adaptive regression and model selection in data mining problems (Ph.D. thesis)