
Variable selection and the interpretation of principal subspaces

Journal of Agricultural, Biological, and Environmental Statistics

Abstract

Principal component analysis is widely used in the analysis of multivariate data in the agricultural, biological, and environmental sciences. The first few principal components (PCs) of a set of variables are derived variables with optimal properties in terms of approximating the original variables. This paper considers the problem of identifying subsets of variables that best approximate the full set of variables or their first few PCs, thus stressing dimensionality reduction in terms of the original variables rather than in terms of derived variables (PCs) whose definition requires all the original variables. Criteria for selecting variables are often ill defined and may produce inappropriate subsets. Indicators of the performance of different subsets of the variables are discussed and two criteria are defined. These criteria are used in stepwise selection-type algorithms to choose good subsets. Examples are given that show, among other things, that the selection of variable subsets should not be based only on the PC loadings of the variables.
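The idea of scoring a subset of variables against the first few PCs and then searching for good subsets stepwise can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not name its two criteria, so the score used here (the cosine, under the Frobenius inner product, between the orthogonal projectors onto the span of the first k PCs and onto the span of the chosen variables) is an assumed stand-in, and the function names `subspace_similarity` and `forward_select` are hypothetical.

```python
import numpy as np


def subspace_similarity(X, subset, k):
    """Assumed illustrative criterion (not taken from the paper).

    Returns trace(P_pc @ P_sub) / sqrt(k * q), where P_pc projects onto the
    span of the first k PCs of X and P_sub projects onto the span of the q
    chosen (centred) variables. The value lies in [0, 1].
    Assumes the chosen columns are linearly independent.
    """
    Xc = X - X.mean(axis=0)                      # centre each variable
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Q_pc = U[:, :k]                              # orthonormal basis of the PC subspace
    Q_sub, _ = np.linalg.qr(Xc[:, subset])       # orthonormal basis of the subset span
    overlap = np.linalg.norm(Q_pc.T @ Q_sub, "fro") ** 2  # equals trace(P_pc P_sub)
    return float(overlap / np.sqrt(k * len(subset)))


def forward_select(X, q, k):
    """Greedy forward selection: add, one at a time, the variable that
    most increases the similarity with the first k PCs."""
    chosen = []
    remaining = list(range(X.shape[1]))
    for _ in range(q):
        best = max(
            remaining,
            key=lambda j: subspace_similarity(X, chosen + [j], k),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Calling `forward_select(X, q, k)` on an n-by-p data matrix returns q column indices. Because the search is greedy, it may miss the best subset of size q; it only illustrates the stepwise pattern. Note also that the score depends on the whole subspace spanned by the subset, not on the size of individual PC loadings.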

Author information

Corresponding author

Correspondence to Jorge F. C. L. Cadima.

About this article

Cite this article

Cadima, J.F.C.L., Jolliffe, I.T. Variable selection and the interpretation of principal subspaces. JABES 6, 62 (2001). https://doi.org/10.1198/108571101300325256

  • DOI: https://doi.org/10.1198/108571101300325256
