Dealing with Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation

Mathematical Geology

Abstract

The statistical analysis of compositional data based on logratios of parts cannot be applied directly when a data set contains zeros, because the logarithm of zero is undefined. Nevertheless, for those who want to use this modeling approach, the specialized literature offers several strategies. In particular, substitution or imputation strategies are available for rounded zeros. In this paper, existing nonparametric imputation methods, both additive and multiplicative, are reviewed, and essential properties of the multiplicative method are given. For missing values, a generalization of the multiplicative approach is proposed.
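
As an illustration of the multiplicative approach to rounded zeros, the following is a minimal sketch, not the authors' code. It assumes the commonly stated form of multiplicative replacement: each zero part is set to a small imputed value `delta`, and each nonzero part is multiplied by `1 - (sum of imputed values) / c`, where `c` is the closure constant (the total of the composition). The function name `multiplicative_replacement` and the choice of `delta` are hypothetical and chosen only for this example.

```python
import numpy as np


def multiplicative_replacement(x, delta):
    """Impute rounded zeros in one composition (illustrative sketch).

    x     : 1-D sequence of nonnegative parts summing to the closure constant c
    delta : scalar or per-part array of small positive imputed values (assumed)
    """
    x = np.asarray(x, dtype=float)
    c = x.sum()                      # closure constant, e.g. 1 or 100
    zero = x == 0                    # mask of parts recorded as zero
    delta = np.broadcast_to(np.asarray(delta, dtype=float), x.shape)

    imputed_total = delta[zero].sum()
    if imputed_total >= c:
        raise ValueError("imputed values must sum to less than the total")

    # Zeros take their imputed value; nonzero parts shrink by a common factor,
    # so the total stays c and ratios between nonzero parts are unchanged.
    return np.where(zero, delta, x * (1.0 - imputed_total / c))


if __name__ == "__main__":
    r = multiplicative_replacement([0.0, 0.2, 0.3, 0.5], delta=0.01)
    print(r, r.sum())
```

Because every nonzero part is scaled by the same factor, the ratios between nonzero parts, and therefore their logratios, are the same before and after replacement.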



Cite this article

Martín-Fernández, J.A., Barceló-Vidal, C. & Pawlowsky-Glahn, V. Dealing with Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation. Mathematical Geology 35, 253–278 (2003). https://doi.org/10.1023/A:1023866030544

  • DOI: https://doi.org/10.1023/A:1023866030544
