Abstract
Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations that are persistence diagrams are limited. Specifically, there is a need for tests that can assess the strength of evidence against a claim that two samples arise from the same population or process. This expository paper provides an introduction to randomization-style null hypothesis significance tests (NHST) and shows how they can be used with sets of persistence diagrams. The hypothesis test is based on a loss function that comprises pairwise distances between the elements of each sample and all the elements in the other sample. We use this method to analyze a range of simulated and experimental data. Through these examples we experimentally explore the power of the test. Our results show that the randomization-style NHST based on pairwise distances can distinguish between samples from different processes, which suggests that its use for hypothesis tests on persistence diagrams is reasonable. We demonstrate its application on a real fMRI dataset of patients with ADHD.
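The randomization test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the loss is taken to be the mean distance between elements of one sample and elements of the other, the pairwise distances are assumed to be precomputed, and the function names (`cross_sample_loss`, `randomization_test`) are hypothetical. The toy data are scalar stand-ins for persistence diagrams, with absolute difference standing in for a diagram distance. The p-value uses the (count + 1)/(permutations + 1) form recommended by Phipson and Smyth (2010), so it is never zero.

```python
import numpy as np


def cross_sample_loss(dist, labels):
    """Mean distance between elements labelled 0 and elements labelled 1.

    Assumption: this is one plausible reading of a loss built from
    distances between each sample and the other sample.
    """
    a = np.flatnonzero(labels == 0)
    b = np.flatnonzero(labels == 1)
    return dist[np.ix_(a, b)].mean()


def randomization_test(dist, labels, n_perm=999, seed=0):
    """Return the observed loss and a randomization p-value.

    Labels are shuffled n_perm times; a shuffled loss at least as large
    as the observed one counts as evidence consistent with the null.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    observed = cross_sample_loss(dist, labels)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if cross_sample_loss(dist, shuffled) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (count + 1) / (n_perm + 1)


# Toy stand-in data (hypothetical): two well-separated groups of scalars.
x = np.concatenate([np.arange(0, 6), np.arange(10, 16)]).astype(float)
dist = np.abs(x[:, None] - x[None, :])  # precomputed pairwise distances
labels = np.array([0] * 6 + [1] * 6)

observed, p_value = randomization_test(dist, labels)
print(observed, p_value)
```

In practice the matrix `dist` would hold distances between persistence diagrams computed with whatever metric the analysis calls for. Working from a precomputed matrix keeps the relabelling step cheap, since each shuffle only re-indexes distances that are already known.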
Notes
The reader should note that in Turner et al. (2014b) the focus is on the L1 distance.
References
Baddeley, A., Silverman, B.: A cautionary example on the use of second-order methods for analyzing point patterns. Biometrics 40(4), 1089–1093 (1984)
Baddeley, A., Turner, R., et al.: Spatstat: an R package for analyzing spatial point patterns. J. Stat. Softw. 12(6), 1–42 (2005)
Balakrishnan, S., Fasy, B., Lecci, F., Rinaldo, A., Singh, A., Wasserman, L.: Statistical inference for persistent homology (2013). arXiv:1303.7117
Bendich, P., Edelsbrunner, H., Kerber, M.: Computing robustness and persistence for images. Vis. Comput. Graph. IEEE Trans. 16(6), 1251–1260 (2010)
Berger, J.: Could Fisher, Jeffreys and Neyman have agreed on testing? Stat. Sci. 18(1), 1–32 (2003)
Biscio, C., Møller, J.: The accumulated persistence function, a new useful functional summary statistic for topological data analysis, with a view to brain artery trees and spatial point process applications (2016). arXiv:1611.00630
Bubenik, P.: Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16(1), 77–102 (2015)
Bubenik, P., Kim, P.T.: A statistical approach to persistent homology. Homol. Homotopy Appl. 9(2), 337–362 (2007)
Casella, G., Berger, R.L.: Statistical Inference. Duxbury Press, Belmont (1990)
Cericola, C., Johnson, I., Kiers, J., Krock, M., Purdy, J., Torrence, J.: Extending hypothesis testing with persistence homology to three or more groups (2016). arXiv:1602.03760
Cerri, A., Ferri, M., Giorgi, D.: Retrieval of trademark images by means of size functions. Graph. Models 68(5), 451–471 (2006)
Chazal, F., Glisse, M., Labruère, C., Michel, B.: Optimal rates of convergence for persistence diagrams in topological data analysis (2013). arXiv:1305.6239
Edgington, E.S., Onghena, P.: Randomization Tests, 4th edn. Chapman & Hall/CRC, Boca Raton (2007)
Ellis, S.P., Klein, A.: Describing high-order statistical dependence using “concurrence topology”, with application to functional MRI brain data (2012). arXiv:1212.1642
Gamble, J., Heo, G.: Exploring uses of persistent homology for statistical analysis of landmark-based shape data. J. Multivariate Anal. 101(9), 2184–2199 (2010)
Gao, J.X.: Visionlab. WWW (2004). http://visionlab.uta.edu/shape_data.htm
Hatcher, A.: Algebraic topology. Cambridge University Press (2002)
Latecki, L.J., Lakamper, R., Eckhardt, T.: Shape descriptors for non-rigid shapes with a single closed contour. In: Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, vol. 1, pp. 424–429. IEEE (2000)
Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Probl. 27(12), 124007 (2011)
Pawitan, Y.: In All Likelihood: Statistical Modelling and Inference Using Likelihood. Clarendon Press, Oxford (2001)
Phipson, B., Smyth, G.K.: Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn. Stat. Appl. Genet. Mol. Biol. 9(1) (2010)
Robins, V., Turner, K.: Principal component analysis of persistent homology rank functions with case studies of spatial point patterns, sphere packing and colloids (2015). arXiv:1507.01454
Sikora, T.: The MPEG-7 visual standard for content description-an overview. Circuits Syst. Video Technol. IEEE Trans. 11(6), 696–702 (2001)
Turner, K.: Means and medians of sets of persistence diagrams (2013). arXiv:1307.8300
Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discret. Comput. Geom. 52(1), 44–70 (2014a)
Turner, K., Mukherjee, S., Boyer, D.M.: Persistent homology transform for modeling shapes and surfaces. Inf. Inference 3(4), 310–344 (2014b)
Welsh, A.H.: Aspects of Statistical Inference. Wiley, New York (1996)
Acknowledgements
We thank Steve Ellis and Arno Klein for providing us with the persistence diagrams produced in their work. The authors would like to acknowledge the assistance of the Defence Science Institute in facilitating this work.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Robinson, A., Turner, K. Hypothesis testing for topological data analysis. J Appl. and Comput. Topology 1, 241–261 (2017). https://doi.org/10.1007/s41468-017-0008-7
DOI: https://doi.org/10.1007/s41468-017-0008-7