Modeling Differences in the Dimensionality of Multiblock Data by Means of Clusterwise Simultaneous Component Analysis

De Roover, Kim; Ceulemans, Eva; Timmerman, Marieke E.; Nezlek, John B.; Onghena, Patrick

doi:10.1007/s11336-013-9318-4

Published: 25 January 2013

Psychometrika, Volume 78, pages 648–668 (2013)

Kim De Roover¹,
Eva Ceulemans¹,
Marieke E. Timmerman²,
John B. Nezlek³,⁴ &
…
Patrick Onghena¹

Abstract

Given multivariate multiblock data (e.g., subjects nested in groups are measured on multiple variables), one may be interested in the nature and number of dimensions that underlie the variables, and in differences in dimensional structure across data blocks. To this end, clusterwise simultaneous component analysis (SCA) was proposed, which simultaneously clusters blocks with a similar structure and performs an SCA per cluster. However, the number of components was restricted to be the same across clusters, which is often unrealistic. In this paper, this restriction is removed. The resulting challenges with respect to model estimation and selection are resolved.

Notes

This algorithm is implemented in an easy-to-use software program that can be downloaded at http://ppw.kuleuven.be/okp/software/MBCA/ (De Roover et al. 2012a).
For the simulation study reported below, it was confirmed that multiplying the second term of the loss function (and of the partition criterion) by two, as in the AIC, gives an optimal cluster recovery for 99.6 % of the simulated data sets, as opposed to using another factor. In particular, multiplying $\mathit{fp}$ by log(N), as in the Bayesian information criterion (BIC; Schwarz 1978), appeared to lead to too high a penalty, in that too few data blocks were assigned to the higher-dimensional clusters.
The adapted procedure will be added to the above-mentioned software program in the near future, and the updated program will be made available at http://ppw.kuleuven.be/okp/software/MBCA/.
In Step 2 of the ALS_AIC procedure, the estimation of the SCA-ECP model per cluster is also based on the least squares estimates for the $\mathbf{F}_{i}^{(k)}$ and $\mathbf{B}^{(k)}$ matrices described by Timmerman and Kiers (2003), which implies that this step minimizes the SSE objective function. This is equivalent to minimizing the AIC objective function, because the number of free parameters is fixed within Step 2 and the minimal SSE corresponds to the minimal log(SSE).
We also assessed the sensitivity to local minima and the recovery of the within-cluster component structures. A sufficiently low sensitivity to local minima was established for both procedures (i.e., 5.17 % and 0.29 % local minima over all conditions for ALS_SSE and ALS_AIC, respectively), and the recovery of the cluster loading matrices was found to be very good for the ALS_AIC procedure (i.e., a mean congruence coefficient of 0.9968 (SD=0.02) between estimated and simulated loadings across all conditions). Note that previous studies on Clusterwise SCA (De Roover et al. 2012c; De Roover, Ceulemans, Timmerman, & Onghena, 2012b) have already indicated that the within-cluster component loadings are recovered very well in cases where the data blocks are clustered correctly.
The mean values for the modified RV-coefficient (Smilde, Kiers, Bijlsma, Rubingh, & van Erk, 2009) are 0.02 (SD=0.09) and 0.59 (SD=0.08), respectively.
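
The note on loading recovery reports a mean congruence coefficient between estimated and simulated loadings. As a minimal sketch, assuming this refers to Tucker's congruence coefficient and that the estimated columns have already been matched to the simulated ones in order and sign, it could be computed as follows. The function names are illustrative and are not taken from the article's software.

```python
import numpy as np

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cosine of the angle between x and y, computed without centering.
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def mean_congruence(B_est, B_true):
    """Mean congruence over components.

    Assumption: column q of B_est corresponds to column q of B_true
    (any rotation, permutation, or reflection was resolved beforehand).
    """
    B_est = np.asarray(B_est, dtype=float)
    B_true = np.asarray(B_true, dtype=float)
    phis = [tucker_congruence(B_est[:, q], B_true[:, q])
            for q in range(B_true.shape[1])]
    return float(np.mean(phis))
```

Because the coefficient is insensitive to the scale of each column, a value close to 1 indicates proportional loading patterns rather than identical loadings.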

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Barrett, L.F. (1998). Discrete emotions or dimensions? The role of valence focus and arousal focus. Cognition and Emotion, 12, 579–599.
Brusco, M.J., & Cradit, J.D. (2001). A variable selection heuristic for K-means clustering. Psychometrika, 66, 249–270.
Brusco, M.J., & Cradit, J.D. (2005). ConPar: a method for identifying groups of concordant subject proximity matrices for subsequent multidimensional scaling analyses. Journal of Mathematical Psychology, 49, 142–154.
Cattell, R.B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.
Ceulemans, E., & Kiers, H.A.L. (2006). Selecting among three-mode principal component models of different types and complexities: a numerical convex hull based method. British Journal of Mathematical & Statistical Psychology, 59, 133–150.
Ceulemans, E., & Kiers, H.A.L. (2009). Discriminating between strong and weak structures in three-mode principal component analysis. British Journal of Mathematical & Statistical Psychology, 62, 601–620.
Ceulemans, E., Timmerman, M.E., & Kiers, H.A.L. (2011). The CHULL procedure for selecting among multilevel component solutions. Chemometrics and Intelligent Laboratory Systems, 106, 12–20.
Ceulemans, E., & Van Mechelen, I. (2005). Hierarchical classes models for three-way three-mode binary data: interrelations and model selection. Psychometrika, 70, 461–480.
Cohen, J. (1973). Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and Psychological Measurement, 33, 107–112.
De Roover, K., Ceulemans, E., & Timmerman, M.E. (2012a). How to perform multiblock component analysis in practice. Behavior Research Methods, 44, 41–56.
De Roover, K., Ceulemans, E., Timmerman, M.E., & Onghena, P. (2012b). A clusterwise simultaneous component method for capturing within-cluster differences in component variances and correlations. British Journal of Mathematical & Statistical Psychology. doi:10.1111/j.2044-8317.2012.02040.x. Advance online publication.
De Roover, K., Ceulemans, E., Timmerman, M.E., Vansteelandt, K., Stouten, J., & Onghena, P. (2012c). Clusterwise simultaneous component analysis for the analysis of structural differences in multivariate multiblock data. Psychological Methods, 17, 100–119.
Diaz-Loving, R. (1998). Contributions of Mexican ethnopsychology to the resolution of the etic-emic dilemma in personality. Journal of Cross-Cultural Psychology, 29, 104–118.
Fenigstein, A., Scheier, M.F., & Buss, A. (1975). Public and private self-consciousness. Journal of Consulting and Clinical Psychology, 43, 522–527.
Goldberg, L.R. (1990). An alternative “description of personality”: the Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.
Hands, S., & Everitt, B. (1987). A Monte Carlo study of the recovery of cluster structure in binary data by hierarchical clustering techniques. Multivariate Behavioral Research, 22, 235–243.
Hoerl, A.E. (1962). Application of ridge analysis to regression problems. Chemical Engineering Progress, 58, 54–59.
Hofmans, J., Ceulemans, E., Steinley, D., & Van Mechelen, I. (2012). On the added value of bootstrap analysis for K-means clustering. Manuscript conditionally accepted.
Jolliffe, I.T. (1986). Principal component analysis. New York: Springer.
Kaiser, H.F. (1958). The Varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kiers, H.A.L. (1990). SCA. A program for simultaneous components analysis of variables measured in two or more populations. Groningen: iec ProGAMMA.
Kiers, H.A.L., & ten Berge, J.M.F. (1994). Hierarchical relations between methods for Simultaneous Components Analysis and a technique for rotation to a simple simultaneous structure. British Journal of Mathematical & Statistical Psychology, 47, 109–126.
McLachlan, G.J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Meredith, W., & Millsap, R.E. (1985). On component analyses. Psychometrika, 50, 495–507.
Milligan, G.W., Soon, S.C., & Sokol, L.M. (1983). The effect of cluster size, dimensionality, and the number of clusters on recovery of true cluster structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5, 40–47.
Nezlek, J.B. (2005). Distinguishing affective and non-affective reactions to daily events. Journal of Personality, 73, 1539–1568.
Nezlek, J.B. (2012). Diary methods for social and personality psychology. In J.B. Nezlek (Ed.), The SAGE library in social and personality psychology methods. London: Sage Publications.
Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2, 559–572.
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: the RV-coefficient. Applied Statistics, 25, 257–265.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.
Selim, S.Z., & Ismail, M.A. (1984). K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 81–87.
Smilde, A.K., Kiers, H.A.L., Bijlsma, S., Rubingh, C.M., & van Erk, M.J. (2009). Matrix correlations for high-dimensional data: the modified RV-coefficient. Bioinformatics, 25, 401–405.
Steinley, D. (2003). Local optima in K-means clustering: what you don’t know may hurt you. Psychological Methods, 8, 294–304.
ten Berge, J.M.F. (1993). Least squares optimization in multivariate analysis. Leiden: DSWO Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58, 267–288.
Timmerman, M.E., Ceulemans, E., Kiers, H.A.L., & Vichi, M. (2010). Factorial and reduced K-means reconsidered. Computational Statistics & Data Analysis, 54, 1858–1871.
Timmerman, M.E., & Kiers, H.A.L. (2000). Three-mode principal component analysis: choosing the numbers of components and sensitivity to local optima. British Journal of Mathematical & Statistical Psychology, 53, 1–16.
Timmerman, M.E., & Kiers, H.A.L. (2003). Four simultaneous component models of multivariate time series from more than one subject to model intraindividual and interindividual differences. Psychometrika, 68, 105–122.
Timmerman, M.E., Kiers, H.A.L., Smilde, A.K., Ceulemans, E., & Stouten, J. (2009). Bootstrap confidence intervals in multi-level simultaneous component analysis. British Journal of Mathematical & Statistical Psychology, 62, 299–318.
Trapnell, P.D., & Campbell, J.D. (1999). Private self-consciousness and the five factor model of personality: distinguishing rumination from reflection. Journal of Personality and Social Psychology, 76, 284–304.
Tugade, M.M., Fredrickson, B.L., & Barrett, L.F. (2004). Psychological resilience and positive emotional granularity: examining the benefits of positive emotions on coping and health. Journal of Personality, 72, 1161–1190.
Van Deun, K., Wilderjans, T.F., van den Berg, R.A., Antoniadis, A., & Van Mechelen, I. (2011). A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics, 12, 448.
Van Mechelen, I., & Smilde, A.K. (2010). A generic linked-mode decomposition model for data fusion. Chemometrics and Intelligent Laboratory Systems, 104, 83–94. doi:10.1016/j.chemolab.2010.04.012.
Wilderjans, T.F., Ceulemans, E., Van Mechelen, I., & van den Berg, R.A. (2011). Simultaneous analysis of coupled data matrices subject to different amounts of noise. British Journal of Mathematical & Statistical Psychology, 64, 277–290.
Yung, Y.F. (1997). Finite mixtures in confirmatory factor-analysis models. Psychometrika, 62, 297–330.

Acknowledgements

The research reported in this paper was partially supported by the Fund for Scientific Research-Flanders (Belgium), Project No. G.0477.09, awarded to Eva Ceulemans, Marieke Timmerman, and Patrick Onghena, and by the Research Council of KU Leuven (GOA/2010/02).

Author information

Authors and Affiliations

Methodology of Educational Sciences Research Unit, Faculty of Psychology and Educational Sciences, KU Leuven, Andreas Vesaliusstraat 2, 3000, Leuven, Belgium
Kim De Roover, Eva Ceulemans & Patrick Onghena
University of Groningen, Groningen, The Netherlands
Marieke E. Timmerman
College of William & Mary, Williamsburg, US
John B. Nezlek
Faculty in Poznań, University of Social Sciences and Humanities, Poznań, Poland
John B. Nezlek

Corresponding author

Correspondence to Kim De Roover.

Appendix: Derivation of an AIC-Based Partition Criterion

Conditional upon a specific Clusterwise SCA-ECP model M, the log-likelihood of data block $\mathbf{X}_{i}$ when assigned to cluster k (and thus modeled by $\mathbf{M}_{i}^{(k)}$) amounts to

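$$ \operatorname{loglik}\bigl(\mathbf{X}_{i}| \mathbf{M}_{i}^{(k)}\bigr) = - \frac{N_{i} J}{2}\log\bigl( 2\pi\sigma^{2} \bigr) - \frac{\mathit{SSE}_{i}^{(k)}}{2\sigma^{2}}, $$

(A.1)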

which is the block-specific counterpart of Equation (7), given $\mathit{SSE}_{i}^{(k)}$ as defined in Equation (5). When inserting $\hat{\sigma}^{2} = \frac{\mathit{SSE}_{i}^{(k)}}{N_{i} J}$ as a post-hoc estimator of the error variance $\sigma^{2}$ (Wilderjans et al. 2011), the log-likelihood can be rewritten as

$$ \operatorname{loglik}\bigl(\mathbf{X}_{i}| \mathbf{M}_{i}^{(k)}\bigr) = - \frac{N_{i} J}{2}\bigl[ 1 + \log( 2\pi) - \log( N_{i }J ) + \log\bigl( \mathit{SSE}_{i}^{(k)} \bigr) \bigr], $$

(A.2)

where the first three terms are not influenced by the cluster assignment and can thus be discarded. The number of free parameters for data block i, when it is tentatively assigned to cluster k, is denoted by $\mathit{fp}_{i}^{(k)}$ and can be computed as follows:

$$ \mathit{fp}_{i}^{(k)} = N_{i}Q^{(k)}. $$

(A.3)

It corresponds to the size of the component score matrix $\mathbf{F}_{i}^{(k)}$ that is computed to evaluate the fit of data block i in cluster k. When combining Equations (A.2) (omitting the invariant terms) and (A.3) as in the AIC (Akaike 1974), we obtain the AIC-based partition criterion in Equation (11).
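
The combination described in this appendix can be illustrated with a short Python sketch. Equation (11) itself is not reproduced in this text, so the criterion below is an assumption: it is the form obtained by taking $-2$ times Equation (A.2) without the invariant terms and adding $2\,\mathit{fp}_{i}^{(k)}$, that is, $N_{i} J \log(\mathit{SSE}_{i}^{(k)}) + 2 N_{i} Q^{(k)}$. The sketch also assumes that the component scores of a block are obtained by unconstrained least squares given the cluster loadings. All function names are hypothetical and do not come from the authors' software.

```python
import numpy as np

def block_sse(X_i, B_k):
    """SSE of block X_i (N_i x J) given cluster loadings B_k (J x Q_k).

    Assumption: scores F are the unconstrained least-squares solution
    of X_i ~ F @ B_k.T.
    """
    # Solve B_k @ F.T ~ X_i.T for F.T (shape Q_k x N_i).
    F_t, *_ = np.linalg.lstsq(B_k, X_i.T, rcond=None)
    resid = X_i - F_t.T @ B_k.T
    return float(np.sum(resid ** 2))

def aic_partition_criterion(X_i, B_k):
    """Assumed AIC-type criterion: N_i*J*log(SSE) + 2*N_i*Q_k."""
    N_i, J = X_i.shape
    Q_k = B_k.shape[1]
    fp = N_i * Q_k  # free parameters, as in Equation (A.3)
    return float(N_i * J * np.log(block_sse(X_i, B_k)) + 2 * fp)

def assign_block(X_i, loadings):
    """Return the index of the cluster with the lowest criterion value,
    together with the criterion values for all clusters."""
    values = [aic_partition_criterion(X_i, B_k) for B_k in loadings]
    return int(np.argmin(values)), values
```

In this form, a cluster with more components always fits a block at least as well in terms of SSE when its loadings span those of a lower-dimensional cluster, so the penalty term is what keeps low-dimensional blocks from drifting into high-dimensional clusters.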

About this article

Cite this article

De Roover, K., Ceulemans, E., Timmerman, M.E. et al. Modeling Differences in the Dimensionality of Multiblock Data by Means of Clusterwise Simultaneous Component Analysis. Psychometrika 78, 648–668 (2013). https://doi.org/10.1007/s11336-013-9318-4

Received: 17 January 2012
Revised: 04 July 2012
Published: 25 January 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s11336-013-9318-4
