Variable selection and the interpretation of principal subspaces

Cadima, Jorge F. C. L.; Jolliffe, Ian T.

doi:10.1198/108571101300325256

Jorge F. C. L. Cadima¹ &
Ian T. Jolliffe²


Abstract

Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.
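As a rough illustration of the kind of stepwise search the abstract describes, the sketch below greedily adds the variable that most improves how well a subset reproduces the first k PCs. The criterion used here (the share of the sum of squares of the first k PC scores recovered by least-squares regression on the subset) is an assumption chosen for illustration; it is not claimed to be either of the two criteria defined in the paper. The function names `subspace_fit` and `forward_select` are hypothetical.

```python
import numpy as np

def subspace_fit(X, subset, k):
    """Illustrative criterion (an assumption, not the paper's definition):
    fraction of the sum of squares of the first k PC scores that is
    reproduced by regressing those scores on the columns in `subset`."""
    Xc = X - X.mean(axis=0)                      # column-centre the data
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                    # scores on the first k PCs
    Xs = Xc[:, subset]
    coef, *_ = np.linalg.lstsq(Xs, scores, rcond=None)
    fitted = Xs @ coef                           # projection onto span of subset
    return float((fitted ** 2).sum() / (scores ** 2).sum())

def forward_select(X, k, q):
    """Greedy forward selection of q variables under `subspace_fit`."""
    chosen = []
    remaining = list(range(X.shape[1]))
    while len(chosen) < q:
        best = max(remaining, key=lambda j: subspace_fit(X, chosen + [j], k))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Usage on synthetic data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
selected = forward_select(X, k=2, q=3)
```

A greedy search of this kind is not guaranteed to find the best subset of a given size. Note also that the criterion scores a subset by how well it reproduces the PCs as a whole and does not look at the size of individual PC loadings.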


Author information

Authors and Affiliations

Departamento de Matemática, Instituto Superior de Agronomia, Tapada da Ajuda, 1399, Lisboa Codex, Portugal
Jorge F. C. L. Cadima (Professor Auxiliar)
Department of Mathematical Sciences, University of Aberdeen, King’s College, AB24 3UE, Aberdeen, UK
Ian T. Jolliffe (Professor of Statistics)

Corresponding author

Correspondence to Jorge F. C. L. Cadima.

About this article

Cite this article

Cadima, J.F.C.L., Jolliffe, I.T. Variable selection and the interpretation of principal subspaces. JABES 6, 62 (2001). https://doi.org/10.1198/108571101300325256

Received: 15 March 1999
Accepted: 15 June 2000
DOI: https://doi.org/10.1198/108571101300325256
