Advances in Principal Balances for Compositional Data

Mathematical Geosciences

Abstract

Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data-driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, which hinders their interpretation. For interpretative purposes, it would be better to have each basis component as a ratio, or balance, of the geometric means of two groups of parts, leaving irrelevant parts with a zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances that successively maximize the explained variance in a data set. The new algorithm to compute principal balances requires an exhaustive search over all possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other suboptimal, but feasible, algorithms are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) the hierarchical cluster analysis of variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
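
To make the notions of balance and variation matrix concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard definitions from the compositional data literature: a balance between a group of r parts and a group of s parts is sqrt(rs/(r+s)) times the log-ratio of their geometric means, and entry (i, j) of the variation matrix is the variance of ln(x_i/x_j). The function names `balance`, `variation_matrix` and `total_variance` are hypothetical, and the sketch does not perform the search for principal balances itself.

```python
import numpy as np

def balance(X, num, den):
    """Balance coordinate between part groups `num` and `den` (column indices).

    Assumes the standard definition: sqrt(r*s/(r+s)) * ln(g(x_num) / g(x_den)),
    where g is the geometric mean and r, s are the group sizes.
    """
    logX = np.log(np.asarray(X, dtype=float))
    r, s = len(num), len(den)
    # A difference of mean logs equals the log-ratio of geometric means.
    log_ratio = logX[:, num].mean(axis=1) - logX[:, den].mean(axis=1)
    return np.sqrt(r * s / (r + s)) * log_ratio

def variation_matrix(X):
    """Matrix whose (i, j) entry is the sample variance of ln(x_i / x_j)."""
    logX = np.log(np.asarray(X, dtype=float))
    diffs = logX[:, :, None] - logX[:, None, :]  # shape (n, D, D)
    return diffs.var(axis=0, ddof=1)

def total_variance(X):
    """Total variance of the data set: sum of variation matrix entries / (2D)."""
    D = np.asarray(X).shape[1]
    return variation_matrix(X).sum() / (2 * D)

if __name__ == "__main__":
    # Made-up toy data with 4 strictly positive parts (illustration only).
    rng = np.random.default_rng(0)
    X = rng.uniform(0.1, 1.0, size=(30, 4))
    b = balance(X, num=[0, 1], den=[2])  # part 3 gets zero weight
    share = b.var(ddof=1) / total_variance(X)
    print(f"share of total variance explained by this balance: {share:.3f}")
```

Under these assumptions, a first principal balance would be the split of parts whose balance attains the largest such share; the exhaustive algorithm described above searches over all admissible sets of orthonormal balances rather than a single hand-picked one.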

Acknowledgements

This research has been supported by the Spanish Ministry of Economy and Competitiveness under the project CODA-RETOS (Ref: MTM2015-65016-C2-1(2)-R), and by the Agència de Gestió d’Ajuts Universitaris i de Recerca of the Generalitat de Catalunya under the project COSDA (Ref: 2014SGR551). The authors gratefully acknowledge the constructive comments of the anonymous referees, which have undoubtedly helped to significantly improve the quality of the paper.

Author information

Corresponding author

Correspondence to J. A. Martín-Fernández.

About this article

Cite this article

Martín-Fernández, J.A., Pawlowsky-Glahn, V., Egozcue, J.J. et al. Advances in Principal Balances for Compositional Data. Math Geosci 50, 273–298 (2018). https://doi.org/10.1007/s11004-017-9712-z

  • DOI: https://doi.org/10.1007/s11004-017-9712-z
