Understanding the topology and the geometry of the space of persistence diagrams via optimal partial transport

Journal of Applied and Computational Topology

Abstract

Despite the obvious similarities between the metrics used in topological data analysis and those of optimal transport, an optimal-transport based formalism to study persistence diagrams and similar topological descriptors has yet to come. In this article, by considering the space of persistence diagrams as a space of discrete measures, and by observing that its metrics can be expressed as optimal partial transport problems, we introduce a generalization of persistence diagrams, namely Radon measures supported on the upper half plane. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the space of persistence diagrams. We explore topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, a geometric description of barycenters (Fréchet means) for any distribution of diagrams, and an exhaustive description of continuous linear representations of persistence diagrams. We also showcase the strength of this framework to study random persistence diagrams by providing several statistical results made meaningful thanks to this new formalism.

Notes

  1. A Radon measure supported on \(\varOmega \) is a (Borel) measure that gives a finite mass to any compact subset \(K \subset \varOmega \). See Appendix A for a short reminder about measure theory.

References

  • Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., Ziegelmeier, L.: Persistence images: a stable vector representation of persistent homology. J. Mach. Learn. Res. 18(8), 1–35 (2017)

  • Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)

  • Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows: In Metric Spaces and in the Space of Probability Measures. Springer, Berlin (2008)

  • Billingsley, P.: Convergence of Probability Measures. Wiley Series in Probability and Statistics. Wiley, New York (2013)

  • Blumberg, A.J., Gal, I., Mandell, M.A., Pancia, M.: Robust statistics, hypothesis testing, and confidence intervals for persistent homology on metric measure spaces. Found. Comput. Math. 14(4), 745–789 (2014)

  • Bobrowski, O., Kahle, M., Skraba, P., et al.: Maximally persistent cycles in random geometric complexes. Ann. Appl. Probab. 27(4), 2032–2060 (2017)

  • Bochner, S.: Integration von Funktionen, deren Werte die Elemente eines Vektorraumes sind. Fundam. Math. 20(1), 262–276 (1933)

  • Bogachev, V.: Measure Theory. No. v. 1 in Measure Theory. Springer Berlin Heidelberg, Berlin (2007)

  • Bubenik, P., Dłotko, P.: A persistence landscapes toolbox for topological statistics. J. Symb. Comput. 78, 91–114 (2017)

  • Bubenik, P., Kim, P.T., et al.: A statistical approach to persistent homology. Homol. Homotopy Appl. 9(2), 337–362 (2007)

  • Bubenik, P., Vergili, T.: Topological spaces of persistence modules and their properties. J. Appl. Comput. Topol. 2018, 1–37 (2018)

  • Carlier, G., Ekeland, I.: Matching for teams. Econ. Theor. 42(2), 397–418 (2010)

  • Carlier, G., Oberman, A., Oudet, E.: Numerical methods for matching for teams and Wasserstein barycenters. ESAIM Math. Model. Numer. Anal. 49(6), 1621–1642 (2015)

  • Carrière, M., Cuturi, M., Oudot, S.: Sliced Wasserstein kernel for persistence diagrams. In: 34th International Conference on Machine Learning (2017)

  • Carrière, M., Oudot, S.Y., Ovsjanikov, M.: Stable topological signatures for points on 3d shapes. Comput. Graph. Forum 34(5), 1–12 (2015). https://doi.org/10.1111/cgf.12692

  • Cascales, B., Raja, M.: Measurable selectors for the metric projection. Math. Nachr. 254(1), 27–34 (2003)

  • Champion, T., De Pascale, L., Juutinen, P.: The \(\infty \)-Wasserstein distance: local solutions and existence of optimal transport maps. SIAM J. Math. Anal. 40(1), 1–20 (2008)

  • Chazal, F., De Silva, V., Glisse, M., Oudot, S.: The Structure and Stability of Persistence Modules. Springer, Berlin (2016)

  • Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A., Wasserman, L.: Subsampling methods for persistent homology. In: International Conference on Machine Learning, pp. 2143–2151 (2015)

  • Chazal, F., Fasy, B.T., Lecci, F., Rinaldo, A., Wasserman, L.A.: Stochastic convergence of persistence landscapes and silhouettes. JoCG 6(2), 140–161 (2015). https://doi.org/10.20382/jocg.v6i2a8

  • Chen, Y.C., Wang, D., Rinaldo, A., Wasserman, L.: Statistical analysis of persistence intensity functions (2015). arXiv preprint arXiv:1510.02502

  • Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.X.: Unbalanced optimal transport: geometry and Kantorovich formulation (2015). arXiv preprint arXiv:1508.05216

  • Cohen-Steiner, D., Edelsbrunner, H., Harer, J.: Stability of persistence diagrams. Discrete Comput. Geom. 37(1), 103–120 (2007)

  • Cohen-Steiner, D., Edelsbrunner, H., Harer, J., Mileyko, Y.: Lipschitz functions have Lp-stable persistence. Found. Comput. Math. 10(2), 127–139 (2010)

  • Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)

  • Divol, V., Chazal, F.: The density of expected persistence diagrams and its kernel based estimation. JoCG 10(2), 127–153 (2019). https://doi.org/10.20382/jocg.v10i2a7

  • Divol, V., Polonik, W.: On the choice of weight functions for linear representations of persistence diagrams. J. Appl. Comput. Topol. 3(3), 249–283 (2019)

  • Edelsbrunner, H., Harer, J.: Computational topology: an introduction. American Mathematical Society, Providence (2010)

  • Figalli, A.: The optimal partial transport problem. Arch. Ration. Mech. Anal. 195(2), 533–560 (2010)

  • Figalli, A., Gigli, N.: A new transportation distance between non-negative measures, with applications to gradients flows with Dirichlet boundary conditions. J. Math. Pures Appl. 94(2), 107–130 (2010)

  • Flamary, R., Courty, N.: POT python optimal transport library (2017). https://github.com/rflamary/POT

  • Folland, G.: Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, New York (2013)

  • Genevay, A., Peyré, G., Cuturi, M.: Learning generative models with Sinkhorn divergences. In: International Conference on Artificial Intelligence and Statistics, pp. 1608–1617 (2018)

  • Goel, A., Trinh, K.D., Tsunoda, K.: Asymptotic behavior of Betti numbers of random geometric complexes (2018). arXiv preprint arXiv:1805.05032

  • Hall, M.: Combinatorial Theory, 2nd edn. Wiley, New York (1986)

  • Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E.G., Matsue, K., Nishiura, Y.: Hierarchical structures of amorphous solids characterized by persistent homology. Proc. Natl. Acad. Sci. (2016). https://doi.org/10.1073/pnas.1520877113

  • Hiraoka, Y., Shirai, T., Trinh, K.D., et al.: Limit theorems for persistence diagrams. Ann. Appl. Probab. 28(5), 2740–2780 (2018)

  • Hofer, C.D., Kwitt, R., Niethammer, M.: Learning representations of persistence barcodes. J. Mach. Learn. Res. 20(126), 1–45 (2019)

  • Kallenberg, O.: Random Measures. Elsevier, Amsterdam (1983)

  • Kechris, A.: Classical Descriptive Set Theory. Graduate Texts in Mathematics. Springer, Berlin (1995)

  • Kerber, M., Morozov, D., Nigmetov, A.: Geometry helps to compare persistence diagrams. J. Exp. Algorithmics 22(1), 1–4 (2017)

  • Kondratyev, S., Monsaingeon, L., Vorotnikov, D., et al.: A new optimal transport distance on the space of finite Radon measures. Adv. Differ. Equ. 21(11/12), 1117–1164 (2016)

  • Kramar, M., Goullet, A., Kondic, L., Mischaikow, K.: Persistence of force networks in compressed granular media. Phys. Rev. E 87, 042207 (2013). https://doi.org/10.1103/PhysRevE.87.042207

  • Kusano, G., Fukumizu, K., Hiraoka, Y.: Kernel method for persistence diagrams via kernel embedding and weight factor. J. Mach. Learn. Res. 18(1), 6947–6987 (2017)

  • Kusano, G., Hiraoka, Y., Fukumizu, K.: Persistence weighted Gaussian kernel for topological data analysis. In: International Conference on Machine Learning, pp. 2004–2013 (2016)

  • Kwitt, R., Huber, S., Niethammer, M., Lin, W., Bauer, U.: Statistical topological data analysis - a kernel perspective. In: Advances in neural information processing systems, pp. 3070–3078 (2015)

  • Lacombe, T., Cuturi, M., Oudot, S.: Large scale computation of means and clusters for persistence diagrams using optimal transport. In: Advances in Neural Information Processing Systems (2018)

  • Le Gouic, T., Loubes, J.M.: Existence and consistency of Wasserstein barycenters. Probability Theory and Related Fields 1–17 (2016)

  • Li, C., Ovsjanikov, M., Chazal, F.: Persistence-based structural recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

  • Mileyko, Y., Mukherjee, S., Harer, J.: Probability measures on the space of persistence diagrams. Inverse Prob. 27(12), 124007 (2011)

  • Nielsen, L.: Weak convergence and Banach space-valued functions: improving the stability theory of Feynman’s operational calculi. Math. Phys. Anal. Geom. 14(4), 279–294 (2011)

  • Oudot, S.Y.: Persistence Theory: From Quiver Representations to Data Analysis, vol. 209. American Mathematical Society, Providence (2015)

  • Perlman, M.D.: Jensen’s inequality for a convex vector-valued function on an infinite-dimensional space. J. Multivar. Anal. 4(1), 52–65 (1974)

  • Peyré, G., Cuturi, M.: Computational optimal transport. 2017–86 (2017)

  • Reininghaus, J., Huber, S., Bauer, U., Kwitt, R.: A stable multi-scale kernel for topological machine learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4741–4748 (2015)

  • Santambrogio, F.: Optimal Transport for Applied Mathematicians. Birkhäuser, New York (2015)

  • Santambrogio, F.: Euclidean, metric, and Wasserstein gradient flows: an overview. Bull. Math. Sci. 7(1), 87–154 (2017)

  • Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency, vol. 24. Springer, Berlin (2003)

  • Schweinhart, B.: Weighted persistent homology sums of random Čech complexes (2018). arXiv preprint arXiv:1807.07054

  • Som, A., Thopalli, K., Natesan Ramamurthy, K., Venkataraman, V., Shukla, A., Turaga, P.: Perturbation robust representations of topological persistence diagrams. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 617–635 (2018)

  • Trillos, N.G., Slepčev, D.: On the rate of convergence of empirical measures in \(\infty \)-transportation distance. Can. J. Math. 67(6), 1358–1383 (2015)

  • Turner, K.: Means and medians of sets of persistence diagrams (2013). arXiv preprint arXiv:1307.8300

  • Turner, K., Mileyko, Y., Mukherjee, S., Harer, J.: Fréchet means for distributions of persistence diagrams. Discrete Comput. Geom. 52(1), 44–70 (2014)

  • Turner, K., Mukherjee, S., Boyer, D.M.: Persistent homology transform for modeling shapes and surfaces. Inf. Inference J. IMA 3(4), 310–344 (2014). https://doi.org/10.1093/imaiai/iau011

  • Umeda, Y.: Time series classification via topological data analysis. Inf. Media Technol. 12, 228–239 (2017)

  • Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Society, Providence (2003)

  • Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2008)

Acknowledgements

This work was partially supported by the advanced Grant of the European Research Council GUDHI (Geometric Understanding in Higher Dimensions). TL is supported by the AMX grant, École polytechnique. The authors thank Frédéric Chazal for fruitful discussions and the anonymous reviewers for their thoughtful comments and efforts towards improving this paper.

Author information

Corresponding author

Correspondence to Vincent Divol.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Elements of measure theory

In the following, \(\varOmega \) denotes a locally compact Polish metric space (i.e. a Polish space equipped with a distinguished Polish metric).

Definition A.1

The space \({\mathcal {M}}(\varOmega )\) of Radon measures supported on \(\varOmega \) is the space of Borel measures which give finite mass to every compact set of \(\varOmega \). The vague topology on \({\mathcal {M}}(\varOmega )\) is the coarsest topology such that the maps \(\mu \mapsto \mu (f):=\int f\mathrm {d}\mu \) are continuous for every \(f \in C_c(\varOmega )\), the space of continuous functions with compact support in \(\varOmega \).

Radon measures on a general space are also required to be regular (i.e. well approximated from above by open sets and from below by compact sets). However, on a locally compact Polish metric space (such as \(\varOmega \)), regularity is implied by the above definition (see Folland 2013, Section 7.1 and Theorem 7.8 for details).

Definition A.2

Denote by \({\mathcal {M}}_f(\varOmega )\) the space of finite Borel measures on \(\varOmega \). The weak topology on \({\mathcal {M}}_f(\varOmega )\) is the coarsest topology such that the maps \(\mu \mapsto \mu (f)\) are continuous for every \(f \in C_b(\varOmega )\), the space of continuous bounded functions in \(\varOmega \).

See Bogachev (2007, Chapter 8) for more details on the weak topology on the set of finite Borel measures (which coincides with the set of Baire measures for \(\varOmega \) a metrizable space). We denote by \(\xrightarrow {v}\) the vague convergence and \(\xrightarrow {w}\) the weak convergence.

Definition A.3

A set \(F \subset {\mathcal {M}}(\varOmega )\) is said to be tight if, for every \(\varepsilon >0\), there exists a compact set K with \(\mu (\varOmega \backslash K)\le \varepsilon \) for every \(\mu \in F\).

The following propositions are standard results. Corresponding proofs can be found for instance in Kallenberg (1983, Section 15.7).

Proposition A.4

A set \(F \subset {\mathcal {M}}(\varOmega )\) is relatively compact for the vague topology if and only if for every compact set K included in \(\varOmega \),

$$\begin{aligned} \sup \{\mu (K),\ \mu \in F\} < \infty . \end{aligned}$$

Proposition A.5

(Prokhorov’s theorem) A set \(F \subset {\mathcal {M}}_f(\varOmega )\) is relatively compact for the weak topology if and only if F is tight and \(\sup _{\mu \in F} \mu (\varOmega ) < \infty \).

Proposition A.6

Let \(\mu ,\mu _1,\mu _2,\dots \) be measures in \({\mathcal {M}}_f(\varOmega )\). Then, \(\mu _n \xrightarrow {w}\mu \) if and only if \(\mu _n(\varOmega ) \rightarrow \mu (\varOmega )\) and \(\mu _n \xrightarrow {v}\mu \).
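
For instance (an illustrative example added here, not taken from the original text), take \(\varOmega = {\mathbb {R}}\) and \(\mu _n = \delta _n\). Any \(f \in C_c({\mathbb {R}})\) vanishes outside a compact set, so

$$\begin{aligned} \mu _n(f) = f(n) \xrightarrow [n\rightarrow \infty ]{} 0, \quad {\text {i.e.}}\quad \mu _n \xrightarrow {v}0, \quad {\text {whereas}}\quad \mu _n({\mathbb {R}}) = 1 \not \rightarrow 0. \end{aligned}$$

The mass condition of Proposition A.6 fails, so \((\mu _n)_n\) does not converge weakly to 0. This is consistent with Proposition A.5: the family \(\{\delta _n\}_n\) is not tight, since \(\delta _n({\mathbb {R}}\backslash K) = 1\) as soon as \(n \notin K\).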

Proposition A.7

(The Portmanteau theorem) Let \(\mu ,\mu _1,\mu _2,\dots \) be measures in \({\mathcal {M}}(\varOmega )\). Then, \(\mu _n \xrightarrow {v}\mu \) if and only if either of the following equivalent conditions holds:

  • for all open sets \(U\subset \varOmega \) and all bounded closed sets \(F\subset \varOmega \),

    $$\begin{aligned} \limsup _{n \rightarrow \infty } \mu _n(F) \le \mu (F) \quad {\text {and}}\quad \liminf _{n\rightarrow \infty } \mu _n(U) \ge \mu (U). \end{aligned}$$
  • for all bounded Borel sets A with \(\mu (\partial A)= 0\), \(\displaystyle \lim _{n\rightarrow \infty } \mu _n(A) = \mu (A)\).

Definition A.8

The set of point measures on \(\varOmega \) is the subset \({\mathcal {D}}(\varOmega ) \subset {\mathcal {M}}(\varOmega )\) of Radon measures with discrete support and integer mass on each point, that is of the form

$$\begin{aligned} \sum _{x \in X} n_x \delta _{x} \end{aligned}$$

where \(n_x \in {\mathbb {N}}\) and \(X \subset \varOmega \) is some locally finite set.

Proposition A.9

The set \({\mathcal {D}}(\varOmega )\) is closed in \({\mathcal {M}}(\varOmega )\) for the vague topology.
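
For point measures with finitely many points, i.e. ordinary persistence diagrams, the partial transport distance reduces to a finite matching problem in which each point may also be sent to the diagonal. The following is a minimal sketch under stated assumptions: \(\varOmega \) is the open half plane above the diagonal, the ground metric is Euclidean, and every point has multiplicity one. The names `ot_p` and `diag_dist`, as well as the brute-force search over permutations, are illustrative choices suitable only for tiny diagrams; they are not the method of the paper.

```python
from itertools import permutations
from math import hypot, inf, sqrt


def diag_dist(pt):
    """Euclidean distance from pt = (birth, death) to the diagonal {b = d}."""
    birth, death = pt
    return abs(death - birth) / sqrt(2.0)


def ot_p(diag_a, diag_b, p=2.0):
    """Brute-force partial transport distance between two small diagrams.

    Each diagram is a list of (birth, death) pairs. Both lists are padded
    with "diagonal slots" (None) so that every point can either be matched
    to a point of the other diagram or be sent to the diagonal.
    """
    n, m = len(diag_a), len(diag_b)
    src = list(diag_a) + [None] * m
    tgt = list(diag_b) + [None] * n

    def cost(x, y):
        if x is None and y is None:
            return 0.0                      # diagonal to diagonal is free
        if x is None:
            return diag_dist(y) ** p        # y is sent to the diagonal
        if y is None:
            return diag_dist(x) ** p        # x is sent to the diagonal
        return hypot(x[0] - y[0], x[1] - y[1]) ** p

    best = inf
    for perm in permutations(range(n + m)):
        total = sum(cost(src[i], tgt[j]) for i, j in enumerate(perm))
        best = min(best, total)
    return best ** (1.0 / p)
```

With an empty second diagram, `ot_p(diag, [], p)` returns \(\mathrm {Pers}_p(\mu )^{1/p}\), in line with the identity \(\mathrm {Pers}_p(\mu ) = \mathrm {OT}_p^p(\mu ,0)\) used in the proofs below. The search is factorial in the total number of points, so an assignment solver would replace the loop for diagrams of realistic size.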

Delayed proofs of Section 3

For the sake of completeness, we present in this section proofs which either require very few adaptations from corresponding proofs in Figalli and Gigli (2010) or which are close to standard proofs in optimal transport theory.

Proofs of Proposition 3.2 and Proposition 3.21

  • For \(\pi \in \mathrm {Adm}(\mu ,\nu )\) supported on \(E_\varOmega \), and for any compact sets \(K,\ K' \subset \varOmega \), one has \(\pi ((K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K')) \le \mu (K) + \nu (K') < \infty \). As any compact subset of \(E_\varOmega \) is included in a set of the form \((K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K')\), Proposition A.4 implies that \(\mathrm {Adm}(\mu ,\nu )\) is relatively compact for the vague convergence on \(E_\varOmega \). Also, if a sequence \((\pi _n)_n\) in \(\mathrm {Adm}(\mu ,\nu )\) converges vaguely to some \(\pi \in {\mathcal {M}}(E_\varOmega )\), then the marginals of \(\pi \) are still \(\mu \) and \(\nu \). Indeed, if f is a continuous function with compact support on \(\varOmega \), then

    $$\begin{aligned} \int _{E_{\varOmega }} f(x)\mathrm {d}\pi (x,y) = \lim _n \int _{E_{\varOmega }} f(x)\mathrm {d}\pi _n(x,y) = \int _{\varOmega } f(x)\mathrm {d}\mu (x), \end{aligned}$$

    and we show likewise that the second marginal of \(\pi \) is \(\nu \). Hence, \(\mathrm {Adm}(\mu ,\nu )\) is closed and relatively compact in \({\mathcal {M}}(E_\varOmega )\): it is therefore sequentially compact.

  • To prove the second point of Proposition 3.2, consider \(\pi , \pi _1, \pi _2, \dots \) such that \(\pi _n \xrightarrow {v}\pi \), and introduce \(\pi '_n : A \mapsto \iint _A d(x,y)^p \mathrm {d}\pi _n\). The sequence \((\pi '_n)_n\) still converges vaguely to \(\pi ' : A \mapsto \iint _{A} d(x,y)^p \mathrm {d}\pi \). The Portmanteau theorem (Proposition A.7) applied with the open set \(E_\varOmega \) to the measures \(\pi '_n \xrightarrow {v}\pi '\) implies that

    $$\begin{aligned} C_p(\pi ) =\pi '(E_\varOmega ) \le \liminf _n \pi '_n (E_\varOmega ) = \liminf _n C_p(\pi _n), \end{aligned}$$

    i.e. \(C_p\) is lower semi-continuous.

  • We now prove the lower semi-continuity of \(C_\infty \). Let \((\pi _n)_n\) be a sequence converging vaguely to \(\pi \) on \(E_\varOmega \) and let \(\displaystyle r >\liminf _{n\rightarrow \infty } C_\infty (\pi _n)\). The set \(U_r = \{(x,y) \in E_\varOmega , \ d(x,y) > r \}\) is open. By the Portmanteau theorem (Proposition A.7), we have

    $$\begin{aligned} 0=\liminf _{n\rightarrow \infty } \pi _n(U_r) \ge \pi (U_r). \end{aligned}$$

    Therefore, \(\mathrm {spt}(\pi ) \subset U_r^c\) and \(C_\infty (\pi ) \le r\). As this holds for any \(\displaystyle r>\liminf _{n\rightarrow \infty } C_\infty (\pi _n)\), we have \(\displaystyle \liminf _{n\rightarrow \infty } C_\infty (\pi _n) \ge C_\infty (\pi )\).

  • We show that for any \(1\le p \le \infty \), the lower semi-continuity of \(C_p\) and the sequential compactness of \(\mathrm {Adm}(\mu ,\nu )\) imply that 1. \(\mathrm {Opt}_p(\mu ,\nu )\) is a non-empty compact set for the vague topology on \(E_\varOmega \) and that 2. \(\mathrm {OT}_p\) is lower semi-continuous.

    1. Let \((\pi _n)_n\) be a minimizing sequence of (2.3) or (3.10) in \(\mathrm {Adm}(\mu ,\nu )\). As \( \mathrm {Adm}(\mu ,\nu )\) is sequentially compact, the sequence has an accumulation point \(\pi \), and the lower semi-continuity implies that \(C_p(\pi ) \le \liminf _{n\rightarrow \infty } C_p(\pi _n) = \mathrm {OT}_p^p(\mu ,\nu )\), so that \(\mathrm {Opt}_p(\mu ,\nu )\) is non-empty. Using once again the lower semi-continuity of \(C_p\), if a sequence in \(\mathrm {Opt}_p(\mu ,\nu )\) converges to some limit, then the cost of the limit is less than or equal to (and thus equal to) \(\mathrm {OT}_p^p(\mu ,\nu )\), i.e. the limit is in \(\mathrm {Opt}_p(\mu ,\nu )\). The set \(\mathrm {Opt}_p(\mu ,\nu )\) being closed in the sequentially compact set \(\mathrm {Adm}(\mu ,\nu )\), it is also sequentially compact.

    2. Let \(\mu _n \xrightarrow {v}\mu \) and \(\nu _n \xrightarrow {v}\nu \). One has \(\liminf _n \mathrm {OT}_p(\mu _n,\nu _n)=\lim _k \mathrm {OT}_p(\mu _{n_k},\nu _{n_k})\) for some subsequence \((n_k)_k\). For ease of notation, we will still use the index n to denote this subsequence. If the limit is infinite, there is nothing to prove. Otherwise, consider \(\pi _n \in \mathrm {Opt}_p(\mu _n,\nu _n)\). For any compact sets \(K,\ K' \subset \varOmega \), one has \(\pi _n((K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K')) \le \sup _n \mu _n(K) + \sup _n \nu _n(K') < \infty \). Therefore, by Proposition A.4, there exists a subsequence \((\pi _{n_k})_k\) which converges vaguely to some measure \(\pi \). Note that the first (resp. second) marginal of \(\pi \) is equal to the limit \(\mu \) (resp. \(\nu \)) of the first (resp. second) marginal of \((\pi _{n_k})\), so that \(\pi \) is in \(\mathrm {Adm}(\mu ,\nu )\). Therefore,

      $$\begin{aligned} \mathrm {OT}_p^p(\mu ,\nu ) \le C_p(\pi ) \le \liminf _{n\rightarrow \infty } C_p(\pi _n) =\liminf _{n\rightarrow \infty } \mathrm {OT}_p^p(\mu _n,\nu _n). \end{aligned}$$
  • Finally, we prove that \(\mathrm {OT}_p\) is a metric on \({\mathcal {M}}^p\). Let \(\mu ,\nu ,\lambda \in {\mathcal {M}}^p\). The symmetry of \(\mathrm {OT}_p\) is clear. If \(\mathrm {OT}_p(\mu ,\nu ) = 0\), then there exists \(\pi \in \mathrm {Adm}(\mu ,\nu )\) supported on \(\{(x,x),\ x\in \varOmega \}\). Therefore, for a Borel set \(A \subset \varOmega \), \(\mu (A) = \pi (A \times {\overline{\varOmega }}) = \pi (A \times A)=\pi ({\overline{\varOmega }}\times A)=\nu (A)\), and \(\mu = \nu \). To prove the triangle inequality, we need a variant on the gluing lemma, stated in Figalli and Gigli (2010, Lemma 2.1): for \(\pi _{12} \in \mathrm {Opt}(\mu ,\nu )\) and \(\pi _{23} \in \mathrm {Opt}(\nu ,\lambda )\) there exists a measure \(\gamma \in {\mathcal {M}}({\overline{\varOmega }}^3)\) such that the marginal corresponding to the first two entries (resp. two last entries), when restricted to \(E_\varOmega \), is equal to \(\pi _{12}\) (resp. \(\pi _{23}\)), and induces a zero cost on \({\partial \varOmega }\times {\partial \varOmega }\). Therefore, by the triangle inequality and the Minkowski inequality,

    $$\begin{aligned} \mathrm {OT}_p(\mu ,\lambda )&\le \left( \int _{{\overline{\varOmega }}^3} d(x,z)^p\mathrm {d}\gamma (x,y,z) \right) ^{1/p} \\&\le \left( \int _{{\overline{\varOmega }}^3} d(x,y)^p\mathrm {d}\gamma (x,y,z) \right) ^{1/p} + \left( \int _{{\overline{\varOmega }}^3} d(y,z)^p\mathrm {d}\gamma (x,y,z) \right) ^{1/p} \\&= \left( \int _{{\overline{\varOmega }}^2} d(x,y)^p\mathrm {d}\pi _{12}(x,y) \right) ^{1/p} + \left( \int _{{\overline{\varOmega }}^2} d(y,z)^p\mathrm {d}\pi _{23}(y,z) \right) ^{1/p} \\&= \mathrm {OT}_p(\mu ,\nu ) + \mathrm {OT}_p(\nu ,\lambda ). \end{aligned}$$

    The proof is similar for \(p= \infty \).

\(\square \)
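
As an illustration of why only lower semi-continuity can be expected (an example added here, assuming \(\varOmega = \{(b,d) \in {\mathbb {R}}^2,\ d > b\}\) equipped with the Euclidean distance), let \(x_n = (0,1/n)\) and \(\mu _n = n^p \delta _{x_n}\) for some \(1 \le p < \infty \). Every compact set \(K \subset \varOmega \) lies at positive distance from the diagonal, so \(\mu _n(K) = 0\) for n large enough and \(\mu _n \xrightarrow {v}0\). However, since \(d(x_n,{\partial \varOmega }) = 1/(n\sqrt{2})\),

$$\begin{aligned} \mathrm {OT}_p^p(\mu _n,0) = n^p\, d(x_n,{\partial \varOmega })^p = 2^{-p/2}, \quad {\text {so that}}\quad \mathrm {OT}_p(0,0) = 0 < 2^{-1/2} = \liminf _{n\rightarrow \infty } \mathrm {OT}_p(\mu _n,0). \end{aligned}$$

The inequality can thus be strict: mass may escape towards the diagonal at a non-vanishing cost, and vague convergence alone does not imply convergence for \(\mathrm {OT}_p\).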

Proof of Proposition 3.6

We first show the separability. Consider for \(k>0\) a partition of \(\varOmega \) into squares \((C_i^k)\) of side length \(2^{-k}\), centered at points \(x_i^k\). Let F be the set of all measures of the form \(\sum _{i\in I} q_i \delta _{x_i^k}\) for \(q_i\) positive rationals, \(k>0\) and I a finite subset of \({\mathbb {N}}\). Our goal is to show that the countable set F is dense in \({\mathcal {M}}^p\). Fix \(\varepsilon > 0\), and \(\mu \in {\mathcal {M}}^p\). The proof is in three steps.

  1. Since \(\mathrm {Pers}_p(\mu ) < \infty \), there exists a compact \(K \subset \varOmega \) such that \(\mathrm {Pers}_p(\mu ) - \mathrm {Pers}_p(\mu _0) < \varepsilon ^p\), where \(\mu _0\) is the restriction of \(\mu \) to K. By considering the transport plan between \(\mu \) and \(\mu _0\) induced by the identity map on K and the projection onto the diagonal on \({\overline{\varOmega }}\backslash K\), it follows that \(\mathrm {OT}_p^p(\mu ,\mu _0) \le \mathrm {Pers}_p(\mu ) - \mathrm {Pers}_p(\mu _0) \le \varepsilon ^p\).

  2. Consider k such that \(2^{-k} \le \varepsilon / (\sqrt{2}\mu (K)^{1/p})\) and denote by I the indices corresponding to squares \(C_i^k\) intersecting K. Let \(\mu _1 = \sum _{i\in I} \mu _0(C_i^k) \delta _{x_i^k}\). One can create a transport map between \(\mu _0\) and \(\mu _1\) by mapping each square \(C_i^k\) to its center \(x_i^k\), so that

    $$\begin{aligned} \mathrm {OT}_p(\mu _0,\mu _1) \le \left( \sum _{i} \mu _0(C_i^k) (\sqrt{2}\cdot 2^{-k})^p \right) ^{1/p} \le \mu (K)^{1/p} \sqrt{2}\cdot 2^{-k} \le \varepsilon . \end{aligned}$$
  3. Consider, for \(i \in I\), \(q_i\) a rational number satisfying \(q_i \le \mu _0(C_i^k)\) and \(|\mu _0(C_i^k) - q_i| \le \varepsilon ^p/\left( \sum _{i\in I} d(x_i^k,{\partial \varOmega })^p \right) \). Let \(\mu _2 = \sum _{i\in I} q_i\delta _{x_i^k}\). Consider the transport plan between \(\mu _2\) and \(\mu _1\) that fully transports \(\mu _2\) onto \(\mu _1\), and transports the remaining mass in \(\mu _1\) onto the diagonal. Then,

    $$\begin{aligned} \mathrm {OT}_p(\mu _1,\mu _2) \le \left( \sum _{i\in I} |\mu _0(C_i^k) - q_i| d(x_i^k,{\partial \varOmega })^p \right) ^{1/p} \le \varepsilon . \end{aligned}$$

As \(\mu _2 \in F\) and \(\mathrm {OT}_p(\mu ,\mu _2) \le 3 \varepsilon \), the separability is proven.
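
Step 2 of the argument can be checked numerically. The sketch below assumes a finitely supported measure stored as a dictionary from points to masses and a Euclidean ground metric; the helper names `snap_to_grid` and `map_cost` are hypothetical. It sends each point to the center of its dyadic square of side \(2^{-k}\) and compares the cost of that transport map, which is an upper bound on \(\mathrm {OT}_p(\mu _0,\mu _1)\), with the bound \(\mu (K)^{1/p} \sqrt{2}\cdot 2^{-k}\).

```python
from math import floor, hypot, sqrt


def snap_to_grid(measure, k):
    """Push a finite measure {(x, y): mass} onto centers of squares of side 2**-k.

    Returns the snapped measure and, for each original atom, the pair
    (mass, distance travelled) describing the induced transport map.
    """
    side = 2.0 ** (-k)
    snapped, moves = {}, []
    for (x, y), mass in measure.items():
        center = ((floor(x / side) + 0.5) * side,
                  (floor(y / side) + 0.5) * side)
        snapped[center] = snapped.get(center, 0.0) + mass
        moves.append((mass, hypot(x - center[0], y - center[1])))
    return snapped, moves


def map_cost(moves, p):
    """p-th root of the total cost of the transport map described by moves."""
    return sum(mass * dist ** p for mass, dist in moves) ** (1.0 / p)


# Toy measure (made-up values), supported away from the diagonal.
mu0 = {(0.10, 1.30): 2.0, (0.12, 1.31): 0.5, (0.70, 2.00): 1.0}
k, p = 3, 2.0
mu1, moves = snap_to_grid(mu0, k)
cost = map_cost(moves, p)
bound = sum(mu0.values()) ** (1.0 / p) * sqrt(2.0) * 2.0 ** (-k)
```

Here `cost` never exceeds `bound`, because each atom moves by at most half the diagonal of its square. The proof uses the cruder full diagonal \(\sqrt{2}\cdot 2^{-k}\), which is all that is needed once k is chosen large enough.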

To prove that the space is complete, consider a Cauchy sequence \((\mu _n)_n\). As the sequence \((\mathrm {Pers}_p(\mu _n))_n = (\mathrm {OT}_p^p(\mu _n,0))_n\) is a Cauchy sequence, it is bounded. Therefore, for \(K\subset \varOmega \) a compact set, (3.1) implies that \(\sup _n\mu _n(K)<\infty \). Proposition A.4 implies that \((\mu _n)_n\) is relatively compact for the vague topology on \(\varOmega \). Consider \((\mu _{n_k})_k\) a subsequence converging vaguely on \(\varOmega \) to some measure \(\mu \). By the lower semi-continuity of \(\mathrm {OT}_p\),

$$\begin{aligned} \mathrm {Pers}_p(\mu ) = \mathrm {OT}_p^p(\mu ,0) \le \liminf _{k \rightarrow \infty } \mathrm {OT}_p^p(\mu _{n_k},0) < \infty , \end{aligned}$$

so that \(\mu \in {\mathcal {M}}^p\). Using once again the lower semi-continuity of \(\mathrm {OT}_p\),

$$\begin{aligned} \mathrm {OT}_p(\mu _n,\mu )&\le \liminf _{k \rightarrow \infty } \mathrm {OT}_p(\mu _n,\mu _{n_k}) \\ \lim _{n\rightarrow \infty } \mathrm {OT}_p(\mu _n,\mu )&\le \lim _{n\rightarrow \infty } \liminf _{k \rightarrow \infty } \mathrm {OT}_p(\mu _n,\mu _{n_k})=0, \end{aligned}$$

ensuring that \(\mathrm {OT}_p(\mu _n, \mu ) \rightarrow 0\), that is, the space is complete. \(\square \)

Proof of the direct implication of Theorem 3.7

Let \(\mu ,\mu _1,\mu _2,\dots \) be elements of \({\mathcal {M}}^p\) and assume that the sequence \((\mathrm {OT}_p(\mu _n,\mu ))_n\) converges to 0. The triangle inequality implies that \(\mathrm {Pers}_p(\mu _n)=\mathrm {OT}_p^p(\mu _n,0)\) converges to \(\mathrm {Pers}_p(\mu )=\mathrm {OT}_p^p(\mu ,0)\). Let \(f\in C_c(\varOmega )\), whose support is included in some compact set K. For any \(\varepsilon >0\), there exists a Lipschitz function \(f_\varepsilon \), with Lipschitz constant L and whose support is included in K, with the \(\infty \)-norm \(\Vert f-f_\varepsilon \Vert _\infty \) less than or equal to \(\varepsilon \). The convergence of \(\mathrm {Pers}_p(\mu _n)\) and (3.1) imply that \(\sup _k \mu _k(K) < \infty \). We have

$$\begin{aligned} |\mu _n(f)-\mu (f)|&\le |\mu _n(f-f_\varepsilon )| + |\mu (f-f_\varepsilon )| + |\mu _n(f_\varepsilon )-\mu (f_\varepsilon )| \\&\le (\mu _n(K) + \mu (K))\varepsilon + |\mu _n(f_\varepsilon )-\mu (f_\varepsilon )| \\&\le (\sup _k \mu _k(K) + \mu (K))\varepsilon + |\mu _n(f_\varepsilon )-\mu (f_\varepsilon )|. \end{aligned}$$

Also,

$$\begin{aligned} |\mu _n(f_\varepsilon )-\mu (f_\varepsilon )|&\le \iint _{{\overline{\varOmega }}^2} |f_\varepsilon (x)-f_\varepsilon (y)| \mathrm {d}\pi _n(x,y) \quad {\text {where}}\; \pi _n \in \mathrm {Opt}(\mu _n, \mu ) \\&\le L \iint \limits _{(K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K)} d(x,y)\mathrm {d}\pi _n(x,y) \\&\le L \pi _n((K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K))^{1- \frac{1}{p}} \left( \iint \limits _{(K \times {\overline{\varOmega }}) \cup ({\overline{\varOmega }}\times K)} d(x,y)^p \mathrm {d}\pi _n(x,y)\right) ^{\frac{1}{p}} \quad {\text {by Hölder's inequality}} \\&\le L \left( \sup _k \mu _k(K) + \mu (K)\right) ^{1- \frac{1}{p}} \mathrm {OT}_p(\mu _n,\mu )\xrightarrow [n\rightarrow \infty ]{} 0. \end{aligned}$$

Therefore, taking the limsup in n and then letting \(\varepsilon \) go to 0, we obtain that \(\mu _n(f) \rightarrow \mu (f)\). \(\square \)

Proofs of the technical lemmas of Section 4

The following proof is already found in Le Gouic and Loubes (2016). We reproduce it here for the sake of completeness.

Proof of Lemma 4.6

Recall that \(P_n\) is a sequence in \({\mathcal {W}}^p({\mathcal {M}}^p)\) such that each \(P_n\) has a p-Fréchet mean \(\mu _n\) and that \(W_{p,\mathrm {OT}_p}(P_n,P) \rightarrow 0\) for some \(P\in {\mathcal {W}}^p({\mathcal {M}}^p)\). According to the beginning of the proof of Proposition 4.5, the sequence \((\mu _n)_n\) is relatively compact for the vague convergence. Let \(\nu \in {\mathcal {M}}^p\) and let \(\mu \) be the vague limit of some subsequence, which, for ease of notation, will be denoted as the initial sequence. By Skorokhod’s representation theorem (Billingsley 2013, Theorem 6.7), as \(P_n\) converges weakly to \(P\), there exists a probability space on which are defined random variables \({\varvec{\mu }}\sim P\) and \({\varvec{\mu _n}}\sim P_n\) for \(n\ge 0\), such that \({\varvec{\mu _n}}\) converges almost surely with respect to the \(\mathrm {OT}_p\) metric towards \({\varvec{\mu }}\). Using those random variables, we have

$$\begin{aligned} \begin{aligned} {\mathcal {E}}(\nu )&= {\mathbb {E}}\mathrm {OT}_p^p(\nu ,{\varvec{\mu }}) = W_{p,\mathrm {OT}_p}^p(\delta _{\nu },P) \\&= \lim _n W_{p,\mathrm {OT}_p}^p(\delta _{\nu },P_n)\quad {\text { since }}\; W_{p,\mathrm {OT}_p}(P_n,P)\rightarrow 0\\&=\lim _n {\mathbb {E}}\mathrm {OT}_p^p(\nu ,{\varvec{\mu _n}}) \\&\ge \lim _n {\mathbb {E}}\mathrm {OT}_p^p(\mu _n,{\varvec{\mu _n}})\quad {\text { since }}\; \mu _n \; {\text { is a barycenter of }}\; P_n\\&\ge {\mathbb {E}}\liminf _n \mathrm {OT}_p^p(\mu _n,{\varvec{\mu _n}})\quad {\text { by Fatou's lemma}}\\&\ge {\mathbb {E}}\mathrm {OT}_p^p(\mu ,{\varvec{\mu }}) = {\mathcal {E}}(\mu ) \quad {\text { by lower semi-continuity of }} \; \mathrm {OT}_p{\text { (Prop.}}~3.2). \end{aligned} \end{aligned}$$
(C.1)

This implies that \(\mu \) is a barycenter of \(P\). We are now going to show that, almost surely, \(\liminf _n \mathrm {OT}_p(\mu _n,{\varvec{\mu }})=\mathrm {OT}_p(\mu ,{\varvec{\mu }})\). This concludes the proof by letting \(n_k\) be the subsequence attaining the liminf for some fixed realization of \({\varvec{\mu }}\). By plugging in \(\nu =\mu \) in (C.1), all the inequalities become equalities, and in particular,

$$\begin{aligned} \lim _n W_{p,\mathrm {OT}_p}^p(\delta _{\mu _n},P_n)= \lim _n {\mathbb {E}}\mathrm {OT}_p^p(\mu _n,{\varvec{\mu _n}})={\mathbb {E}}\mathrm {OT}_p^p(\mu ,{\varvec{\mu }})=W_{p,\mathrm {OT}_p}^p(\delta _{\mu },P). \end{aligned}$$

This yields

$$\begin{aligned}&0\le W_{p,\mathrm {OT}_p}(\delta _{\mu _n},P)-W_{p,\mathrm {OT}_p}(\delta _{\mu },P)\\&\quad \le W_{p,\mathrm {OT}_p}(\delta _{\mu _n},P_n) + W_{p,\mathrm {OT}_p}(P_n,P) -W_{p,\mathrm {OT}_p}(\delta _{\mu },P)\rightarrow 0 \end{aligned}$$

as n goes to \(+\infty \), i.e. \(\lim _n W_{p,\mathrm {OT}_p}(\delta _{\mu _n},P)=W_{p,\mathrm {OT}_p}(\delta _{\mu },P)\). Therefore,

$$\begin{aligned} {\mathbb {E}}\mathrm {OT}_p^p(\mu ,{\varvec{\mu }})&= W_{p,\mathrm {OT}_p}^p(\delta _{\mu },P)=\lim _n W_{p,\mathrm {OT}_p}^p(\delta _{\mu _n},P)=\lim _n {\mathbb {E}}\mathrm {OT}_p^p(\mu _n,{\varvec{\mu }})\\&\ge {\mathbb {E}}\liminf _n \mathrm {OT}_p^p(\mu _n,{\varvec{\mu }})\quad {\text { by Fatou's lemma}}\\&\ge {\mathbb {E}}\mathrm {OT}_p^p(\mu ,{\varvec{\mu }})\quad {\text { by lower semi-continuity of }}\; \mathrm {OT}_p. \end{aligned}$$

As \(\liminf _n \mathrm {OT}_p^p(\mu _n,{\varvec{\mu }})\ge \mathrm {OT}_p^p(\mu ,{\varvec{\mu }})\) and \({\mathbb {E}}\liminf _n \mathrm {OT}_p^p(\mu _n,{\varvec{\mu }})={\mathbb {E}}\mathrm {OT}_p^p(\mu ,{\varvec{\mu }})\), we actually have \(\liminf _n \mathrm {OT}_p^p(\mu _n,{\varvec{\mu }})= \mathrm {OT}_p^p(\mu ,{\varvec{\mu }})\) almost surely, concluding the proof. \(\square \)

Fig. 7: Partition of \(\varOmega \) used in the proof of Lemma 4.7

Proof of Lemma 4.7

For the direct implication, take \(\nu =0\) and apply Theorem 3.7.

Let us prove the converse implication. Assume that \(\mu _n \xrightarrow {v}\mu \) and \(\mathrm {OT}_p(\mu _n,\nu ) \rightarrow \mathrm {OT}_p(\mu ,\nu )\) for some \(\nu \in {\mathcal {D}}^p\). The vague convergence of \((\mu _n)_n\) implies that \(\mu ^{(p)}\) is the only possible accumulation point for weak convergence of the sequence \((\mu ^{(p)}_n)_n\). Therefore, it is sufficient to show that the sequence \((\mu ^{(p)}_n)_n\) is relatively compact for weak convergence (i.e. tight and bounded in total variation, see Proposition A.5). Indeed, this would mean that \((\mu ^{(p)}_n)\) converges weakly to \(\mu ^{(p)}\), or equivalently by Proposition A.6 that \(\mu _n \xrightarrow {v}\mu \) and \(\mathrm {Pers}_p(\mu _n) \rightarrow \mathrm {Pers}_p(\mu )\). The conclusion is then obtained thanks to Theorem 3.7.

Thus, let \((\mu _n)_n\) be any subsequence and \((\pi _n)_n\) be corresponding optimal transport plans between \(\mu _n\) and \(\nu \). The vague convergence of \((\mu _n)_n\) implies that \((\pi _n)_n\) is relatively compact with respect to the vague convergence on \(E_\varOmega \). Let \(\pi \) be a limit of any converging subsequence of \((\pi _n)_n\), whose indices are still denoted by n. One can prove that \(\pi \in \mathrm {Opt}(\mu ,\nu )\) (see Figalli and Gigli 2010, Prop. 2.3). For \(r>0\), define \(A_r :=\{x \in \varOmega ,\ d(x,{\partial \varOmega })\le r\}\) and write \({\overline{A}}_r\) for \(A_r \cup {\partial \varOmega }\). Consider \(\eta >1\). We can write

$$\begin{aligned}&\int _{A_r} d(x, {\partial \varOmega })^p \mathrm {d}\mu _n(x) \\&\quad = \iint \limits _{A_r \times {\overline{\varOmega }}} d(x, {\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&\quad = \iint \limits _{A_r \times (\varOmega \backslash A_{\eta r})} d(x, {\partial \varOmega })^p \mathrm {d}\pi _n(x,y) + \iint \limits _{{{\overline{A}}}_r\times {\overline{A}}_{\eta r}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&\quad {\mathop {\le }\limits ^{(*)}} \frac{1}{(\eta -1)^p} \iint \limits _{A_r \times (\varOmega \backslash A_{\eta r})} d(x,y)^p \mathrm {d}\pi _n(x,y) + \iint \limits _{{{\overline{A}}}_r\times {\overline{A}}_{\eta r}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&\quad \le \frac{1}{(\eta -1)^p} \mathrm {OT}_p^p(\mu _n,\nu ) \\&\qquad + 2^{p-1}\Bigg (\iint \limits _{{\overline{A}}_r\times {\overline{A}}_{\eta r}} d(x,y)^p \mathrm {d}\pi _n(x,y) + \iint \limits _{{\overline{A}}_r\times {\overline{A}}_{\eta r}} d(y,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y)\Bigg ) \\&\quad \le \frac{1}{(\eta -1)^p} \mathrm {OT}_p^p(\mu _n,\nu ) \\&\qquad + 2^{p-1}\Bigg ( \mathrm {OT}_p^p(\mu _n,\nu ) - \iint \limits _{E_\varOmega \backslash ({\overline{A}}_r\times {\overline{A}}_{\eta r})} d(x,y)^p \mathrm {d}\pi _n(x,y) + \int _{ A_{\eta r}} d(y,{\partial \varOmega })^p \mathrm {d}\nu (y)\Bigg ) \end{aligned}$$

where \((*)\) holds because \(d(x,y)\ge (\eta -1) r \ge (\eta -1) d(x,{\partial \varOmega })\) for \((x,y)\in A_r \times A_{\eta r}^c\). Therefore,

$$\begin{aligned}&\limsup _{n \rightarrow \infty } \int _{A_r} d(x,{\partial \varOmega })^p \mathrm {d}\mu _n(x) \\&\quad \le \frac{1}{(\eta -1)^p} \mathrm {OT}_p^p(\mu ,\nu )+ 2^{p-1}\Bigg (\mathrm {OT}_p^p(\mu ,\nu ) \\&\qquad - \iint \limits _{E_\varOmega \backslash ({\overline{A}}_r\times {\overline{A}}_{\eta r})} d(x,y)^p \mathrm {d}\pi (x,y) + \int _{ A_{\eta r}} d(y,{\partial \varOmega })^p \mathrm {d}\nu (y)\Bigg ) \end{aligned}$$

Note that in the last line, we used the Portmanteau theorem (see Proposition A.7) on the sequence of measures \((d(x,y)^p \mathrm {d}\pi _n(x,y))_n\) for the open set \(E_\varOmega \backslash ({\overline{A}}_r\times {\overline{A}}_{\eta r})\). Letting r go to 0, then \(\eta \) go to infinity, one obtains

$$\begin{aligned} \lim _{r \rightarrow 0} \limsup _{n \rightarrow \infty } \int _{A_r} d(x,{\partial \varOmega })^p \mathrm {d}\mu _n(x) = 0. \end{aligned}$$
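The geometric inequality \((*)\) used above, namely \(d(x,y)\ge (\eta -1) d(x,{\partial \varOmega })\) whenever \(x \in A_r\) and \(y \notin A_{\eta r}\), can be sanity-checked numerically. The sketch below is only an illustration under assumptions that are not part of the proof: the ground metric is taken to be the Euclidean distance on the plane, \({\partial \varOmega }\) is the diagonal, and the helper names `dist_to_diagonal`, `point_at_distance` and `check_star_inequality` are hypothetical.

```python
import math
import random


def dist_to_diagonal(point):
    """Euclidean distance from (b, d) to the diagonal {b = d} (assumed ground metric)."""
    b, d = point
    return (d - b) / math.sqrt(2)


def point_at_distance(s, t):
    """Point at distance t from the diagonal, at position s along the diagonal."""
    return ((s - t) / math.sqrt(2), (s + t) / math.sqrt(2))


def check_star_inequality(r, eta, trials=1000, seed=0):
    """Check d(x, y) >= (eta - 1) * d(x, diagonal) for x in A_r and y outside A_{eta r}."""
    rng = random.Random(seed)
    for _ in range(trials):
        # x lies within distance r of the diagonal.
        x = point_at_distance(rng.uniform(-5, 5), rng.uniform(0, r))
        # y lies at distance at least eta * r from the diagonal.
        y = point_at_distance(rng.uniform(-5, 5), eta * r + rng.uniform(0, 5))
        if math.dist(x, y) < (eta - 1) * dist_to_diagonal(x) - 1e-9:
            return False
    return True
```

The check succeeds because the distance to the diagonal is 1-Lipschitz, so \(d(x,y) \ge d(y,{\partial \varOmega }) - d(x,{\partial \varOmega }) \ge (\eta - 1) r\).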

The second part consists in showing that there cannot be mass escaping “at infinity” in the subsequence \((\mu ^{(p)}_n)_n\). Fix \(r,M>0\). For \(x \in \varOmega \), denote by s(x) the projection of x on \({\partial \varOmega }\). Set

$$\begin{aligned} K_{M,r} :=\{x \in \varOmega \backslash A_r,\ d(x,{\partial \varOmega })< M, d(s(x),0) < M\} \end{aligned}$$

and let \(L_{M,r}\) be the closure of \(\varOmega \backslash (A_r\cup K_{M,r})\) (see Fig. 7). For \(r'>0\),

$$\begin{aligned} \int _{L_{M,r}} d(x,{\partial \varOmega })^p \mathrm {d}\mu _n(x)&= \iint \limits _{L_{M,r} \times {\overline{\varOmega }}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&= \iint \limits _{L_{M,r} \times (L_{M/2,r'}\cup {\overline{A}}_{r'})} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&\quad + \iint \limits _{L_{M,r} \times K_{M/2,r'}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y) \\&\le 2^{p-1} \iint \limits _{L_{M,r} \times (L_{M/2,r'} \cup {\overline{A}}_{r'})} d(x,y)^p \mathrm {d}\pi _n(x,y) \\&\quad + 2^{p-1} \iint \limits _{L_{M,r} \times ( L_{M/2,r'}\cup {\overline{A}}_{r'})} d({\partial \varOmega }, y)^p \mathrm {d}\pi _n(x,y) \\&\quad + \iint \limits _{L_{M,r} \times K_{M/2,r'}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y). \end{aligned}$$

We treat the three parts of the sum separately. As before, taking the \(\limsup \) in n and letting M go to \(\infty \), the first part of the sum converges to 0 (apply the Portmanteau theorem on the open set \(E_\varOmega \backslash (L_{M,r} \times (L_{M/2,r'} \cup {\overline{A}}_{r'}))\)). The second part is less than or equal to

$$\begin{aligned} 2^{p-1} \int _{ L_{M/2,r'}\cup A_{r'}} d(y,{\partial \varOmega })^p \mathrm {d}\nu (y), \end{aligned}$$

which converges to 0 as \(M\rightarrow \infty \) and \(r'\rightarrow 0\). For the third part, notice that if \((x,y)\in L_{M,r} \times K_{M/2,r'}\), then

$$\begin{aligned} d(x,{\partial \varOmega }) \le d(x,s(y)) \le d(x,y) +d(y,s(y)) \le d(x,y) + \frac{M}{2} \le 2d(x,y). \end{aligned}$$

Therefore,

$$\begin{aligned} \iint \limits _{L_{M,r} \times K_{M/2,r'}} d(x,{\partial \varOmega })^p \mathrm {d}\pi _n(x,y)&\le 2^p \iint \limits _{L_{M,r} \times K_{M/2,r'}} d(x,y)^p \mathrm {d}\pi _n(x,y) \\&\le 2^p \iint \limits _{L_{M,r} \times {\overline{\varOmega }}} d(x,y)^p \mathrm {d}\pi _n(x,y). \end{aligned}$$

As before, it is shown that \(\limsup _n \iint _{L_{M,r} \times {\overline{\varOmega }}} d(x,y)^p \mathrm {d}\pi _n(x,y)\) converges to 0 when M goes to infinity by applying the Portmanteau theorem on the open set \(E_\varOmega \backslash (L_{M,r} \times {\overline{\varOmega }})\).

Finally, we have shown that, by taking r small enough and M large enough, one can find a compact set \(\overline{K_{M,r}}\) such that \(\int _{\varOmega \backslash \overline{K_{M,r}}} d(x,{\partial \varOmega })^p \mathrm {d}\mu _n = \mu ^{(p)}_n(\varOmega \backslash \overline{K_{M,r}})\) is uniformly small: \((\mu ^{(p)}_n)_n\) is tight. As we have

$$\begin{aligned} \mu ^{(p)}_n(\varOmega )&= \mathrm {Pers}_p(\mu _n) = \mathrm {OT}_p^p(\mu _n, 0) \\&\le (\mathrm {OT}_p(\mu _n, \nu ) + \mathrm {OT}_p(\nu , 0))^p \rightarrow (\mathrm {OT}_p(\mu , \nu ) + \mathrm {OT}_p(\nu ,0))^p, \end{aligned}$$

it is also bounded in total variation. Hence, \((\mu ^{(p)}_n)_n\) is relatively compact for the weak convergence: this concludes the proof. \(\square \)
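For finite diagrams, the bound \(\mathrm {Pers}_p(\mu ) \le (\mathrm {OT}_p(\mu , \nu ) + \mathrm {OT}_p(\nu ,0))^p\) used in the last step can be illustrated with a small computation. The following is a minimal sketch, assuming a Euclidean ground metric and diagrams small enough for a brute-force search over matchings; the function names `pers_p` and `ot_p` are hypothetical and are not taken from any library.

```python
import itertools
import math


def dist_to_diagonal(point):
    """Euclidean distance from (b, d) to the diagonal (assumed ground metric)."""
    b, d = point
    return (d - b) / math.sqrt(2)


def pers_p(diagram, p):
    """Total persistence: sum of d(x, diagonal)^p over the points of the diagram."""
    return sum(dist_to_diagonal(x) ** p for x in diagram)


def ot_p(mu, nu, p):
    """Brute-force OT_p between two small finite diagrams.

    Each diagram is padded with as many diagonal slots as the other has points,
    then all bijections are enumerated (only viable for a handful of points).
    """
    n, m = len(mu), len(nu)
    size = n + m

    def cost(i, j):
        if i < n and j < m:          # point matched to point
            return math.dist(mu[i], nu[j]) ** p
        if i < n:                    # point of mu sent to the diagonal
            return dist_to_diagonal(mu[i]) ** p
        if j < m:                    # point of nu sent to the diagonal
            return dist_to_diagonal(nu[j]) ** p
        return 0.0                   # diagonal matched to diagonal

    best = min(
        sum(cost(i, sigma[i]) for i in range(size))
        for sigma in itertools.permutations(range(size))
    )
    return best ** (1 / p)
```

With `mu = [(0, 2), (1, 5)]`, `nu = [(0, 1.8)]` and `p = 2`, one can compare `pers_p(mu, p)` with `(ot_p(mu, nu, p) + ot_p(nu, [], p)) ** p`; the empty list plays the role of the zero measure.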

Proof of Lemma 4.9

Let \(P= \sum _{i=1}^N \lambda _i \delta _{a_i}\) be a probability distribution with \(a_i \in {\mathcal {D}}_f\) of mass \(m_i \in {\mathbb {N}}\), and define \(m_{\mathrm {tot}}= \sum _{i=1}^N m_i\). By Proposition 4.4, every p-Fréchet mean a of \(P\) is in correspondence with a p-Fréchet mean for the Wasserstein distance \({{\tilde{a}}}\) of \({\tilde{P}} = \sum _{i=1}^N \lambda _i \delta _{{\tilde{a}}_i}\), where \({\tilde{a}}_i = a_i + (m_{\mathrm {tot}}- m_i)\delta _{{\partial \varOmega }}\), with a being the restriction of \({\tilde{a}}\) to \(\varOmega \).

Let us thus fix \(m \in {\mathbb {N}}\), and let \({\tilde{a}}_1,\dots ,{\tilde{a}}_N\) be point measures of mass m in \({\tilde{\varOmega }}\). Write \({\tilde{a}}_i =\sum _{j=1}^{m} \delta _{x_{i,j}}\), so that \(x_{i,j} \in {\tilde{\varOmega }}\) for \(1 \le i \le N,\ 1 \le j \le m\), with the \(x_{i,j}\)s not necessarily distinct. Define

$$\begin{aligned} T: (x_1, \dots , x_N) \in {\tilde{\varOmega }}^N \mapsto {{\mathrm{arg\,min}}}\left\{ \sum _{i=1}^N \lambda _i \rho (x_i,y)^p,\ y\in {\tilde{\varOmega }} \right\} \in {\tilde{\varOmega }}. \end{aligned}$$
(C.2)

Since we assume \(p > 1\), T is well-defined and is continuous (the minimizer is unique by strict convexity). Using the localization property stated in Carlier et al. (2015, Section 2.2), we know that the support of a p-Fréchet mean of \({{\tilde{P}}}\) is included in the finite set

$$\begin{aligned} S :=\{ T(x_{1,j_1},\dots ,x_{N,j_N}),\ 1\le j_1 ,\dots , j_N \le m \}. \end{aligned}$$
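To make the set S concrete, the sketch below enumerates it in a simplified setting. It assumes \(p=2\) and replaces \(\rho \) by the Euclidean distance on the plane, ignoring the special role of \({\partial \varOmega }\) in \({\tilde{\varOmega }}\); under these assumptions T is the weighted mean of its arguments. The function names `T` and `candidate_support` are chosen for illustration only.

```python
import itertools


def T(points, weights):
    """Weighted mean of the points: the minimizer of sum_i w_i |x_i - y|^2 (case p = 2)."""
    total = sum(weights)
    return tuple(
        sum(w * x[c] for w, x in zip(weights, points)) / total for c in range(2)
    )


def candidate_support(measures, weights):
    """Enumerate S: one candidate T(x_1, ..., x_N) per grouping, with repetitions.

    `measures` is a list of N lists of m points each; the result has m**N entries.
    """
    return [T(grouping, weights) for grouping in itertools.product(*measures)]
```

For two measures with two points each, `candidate_support` returns \(2^2 = 4\) candidates, some of which may coincide.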

Let \(K=m^N\) and let \(z_1,\dots ,z_K\) be an enumeration of the points of S (with potential repetitions). Denote by \(\mathrm {Gr}(z_k)\) the N elements \(x_1,\dots ,x_N\), with \(x_i \in \mathrm {spt}({\tilde{a}}_i)\), such that \(z_k = T(x_1,\dots ,x_N)\). It is explained in Carlier et al. (2015, §2.3), that finding a p-Fréchet mean of \({\tilde{P}}\) is equivalent to finding a minimizer of the problem

$$\begin{aligned} \inf _{(\gamma _1,\dots ,\gamma _N) \in \varPi } \sum _{i=1}^N \lambda _i \iint _{{\tilde{\varOmega }}^2} \rho (x_i,y)^p \mathrm {d}\gamma _i(x_i,y), \end{aligned}$$
(C.3)

where \(\varPi \) is the set of plans \((\gamma _i)_{i=1,\dots ,N}\), with \(\gamma _i\) having for first marginal \({\tilde{a}}_i\), and such that all \(\gamma _i\)s share the same (non-fixed) second marginal. Furthermore, we can assume without loss of generality that \((\gamma _1, \dots , \gamma _N)\) is supported on \((\mathrm {Gr}(z_k), z_k)_k\), i.e. a point \(z_k\) in the p-Fréchet mean is necessarily transported to its corresponding grouping \(\mathrm {Gr}(z_k)\) by (optimal) \(\gamma _1, \dots , \gamma _N\) (Carlier et al. 2015, §2.3). For such a minimizer, the common second marginal is a p-Fréchet mean of \({\tilde{P}}\).

A potential minimizer of (C.3) is described by a vector \(\gamma = (\gamma _{i,j,k})\in {\mathbb {R}}_+^{NmK}\) such that:

$$\begin{aligned} {\left\{ \begin{array}{ll} \text {for}\ 1\le i \le N,\ 1\le j \le m, &{}\quad \sum _{k=1}^K \gamma _{i,j,k} = 1\quad {\text {and}}\\ \text {for}\ 2\le i \le N,\ 1\le k \le K, &{}\quad \sum _{j=1}^m \gamma _{1,j,k} = \sum _{j=1}^m \gamma _{i,j,k}. \end{array}\right. } \end{aligned}$$
(C.4)

Let \(c \in {\mathbb {R}}^{NmK}\) be the vector defined by \(c_{i,j,k} = {\mathbf {1}}\{x_{i,j} \in \mathrm {Gr}(z_k)\} \lambda _i \rho (x_{i,j},z_k)^p \). Then, the problem (C.3) is equivalent to

$$\begin{aligned} \mathop {{\mathrm{minimize}}}\limits _{\gamma \in {\mathbb {R}}_+^{NmK}} \gamma ^T c \quad \text {under the constraints }(\mathrm{C}.4). \end{aligned}$$
(C.5)

The set of p-Fréchet means of P is in bijection with the set of minimizers of this Linear Programming problem (see Schrijver 2003, §5.15), which is given by a face of the polyhedron described by the equations (C.4). Hence, if we show that this polyhedron is integer (i.e. its vertices have integer values), then it would imply that the extreme points of the set of p-Fréchet means of P are point measures, concluding the proof. The constraints (C.4) are described by a matrix A of size \((Nm+(N-1)K) \times NmK\) and a vector \(b= [{\mathbf {1}}_{Nm},{\mathbf {0}}_{(N-1)K}]\), such that \(\gamma \in {\mathbb {R}}^{NmK}\) satisfies (C.4) if and only if \(A\gamma =b\). A sufficient condition for the polyhedron \(\{Ax\le b\}\) to be integer is to satisfy the following property (see Schrijver 2003, Section 5.17): for all \(u \in {\mathbb {Z}}^{NmK}\), the dual problem

$$\begin{aligned} \max \{y^T b,\ y\ge 0 \quad {\text {and}}\quad y^TA=u\} \end{aligned}$$
(C.6)

has either no solution (i.e. there is no \(y \ge 0\) satisfying \(y^T A = u\)), or it has an integer optimal solution y.

For y satisfying \(y^T A = u\), write \(y=[y^0,y^1]\) with \(y^0 \in {\mathbb {R}}^{Nm}\) and \(y^1 \in {\mathbb {R}}^{(N-1)K}\), so that \(y^0\) is indexed on \( 1\le i \le N,\ 1\le j \le m\) and \(y^1\) is indexed on \( 2\le i \le N,\ 1\le k \le K\). One can check that, for \( 2\le i \le N,\ 1\le j \le m,\ 1\le k \le K\):

$$\begin{aligned} u_{1,j,k} = y^0_{1,j} + \sum _{i'=2}^N y^1_{i',k} \quad {\text {and}} \quad u_{i,j,k} = y^0_{i,j} - y^1_{i,k}, \end{aligned}$$
(C.7)

so that,

$$\begin{aligned} y^T b&= \sum _{i=1}^N \sum _{j=1}^m y^0_{i,j} = \sum _{j=1}^m y^0_{1,j} + \sum _{i=2}^N \sum _{j=1}^m y^0_{i,j} \\&= \sum _{j=1}^m \left( u_{1,j,k} - \sum _{i=2}^N y^1_{i,k}\right) + \sum _{i=2}^N \sum _{j=1}^m (u_{i,j,k} + y^1_{i,k}) \\&= \sum _{i=1}^N \sum _{j=1}^m u_{i,j,k}. \end{aligned}$$

Therefore, the function \(y^Tb\) is constant on the set \(P :=\{y\ge 0,\ y^T A=u\}\), and any point of the set is an argmax. We need to check that if the set P is non-empty, then it contains a vector with integer coordinates: this would conclude the proof. A solution of the homogeneous equation \(y^T A=0\) satisfies \(y^0_{i,j}=y^1_{i,k} = \lambda _i\) for \(i\ge 2\) and \(y^0_{1,j} = -\sum _{i=2}^N y^1_{i,k} = -\sum _{i=2}^N \lambda _i\) and conversely, any choice of \(\lambda _i \in {\mathbb {R}}\) gives rise to a solution of the homogeneous equation. For a given u, one can verify that the set of solutions of \(y^TA=u\) is given, for \(\lambda _i \in {\mathbb {R}}\), by

$$\begin{aligned} {\left\{ \begin{array}{ll} y^0_{1,j} = \sum _{i=1}^N u_{i,j,k} - \sum _{i=2}^N \lambda _i \\ y^0_{i,j} = \lambda _i \quad {\text {for}}\quad i\ge 2,\\ y^1_{i,k} = -u_{i,j,k} + \lambda _i \quad {\text {for}}\quad i\ge 2. \end{array}\right. } \end{aligned}$$

Such a solution exists if and only if for all j, \(U_j :=\sum _{i=1}^N u_{i,j,k}\) does not depend on k and for \(i\ge 2\), \(U_{i,k} :=u_{i,j,k}\) does not depend on j. For such a vector u, P corresponds to the \(\lambda _i \ge 0\) with \(\lambda _i \ge \max _k U_{i,k}\) and \(U_j \ge \sum _{i=2}^N \lambda _i\). If this set is non-empty, it contains at least the point corresponding to \(\lambda _i = \max \{0, \max _k U_{i,k}\}\), which is an integer, so that this point is integer valued, concluding the proof. \(\square \)
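The bookkeeping behind (C.4) and (C.7) can be checked mechanically for small values of N, m and K. The sketch below builds the matrix A and the vector b and verifies identity (C.7) for a random integer vector y. It is an illustration only: the row ordering and the sign convention for the second family of constraints (mass of \(\gamma _1\) minus mass of \(\gamma _i\) on \(z_k\)) are assumptions chosen to match (C.7), and the names `build_constraints` and `check_c7` are hypothetical.

```python
import itertools
import random


def build_constraints(N, m, K):
    """Return (A, b, col) for the linear system (C.4), with 1-based indices (i, j, k)."""
    triples = itertools.product(range(1, N + 1), range(1, m + 1), range(1, K + 1))
    col = {t: c for c, t in enumerate(triples)}
    A, b = [], []
    # First family: each point x_{i,j} sends out a total mass of 1.
    for i, j in itertools.product(range(1, N + 1), range(1, m + 1)):
        row = [0] * len(col)
        for k in range(1, K + 1):
            row[col[(i, j, k)]] = 1
        A.append(row)
        b.append(1)
    # Second family: gamma_1 and gamma_i put the same mass on z_k.
    for i, k in itertools.product(range(2, N + 1), range(1, K + 1)):
        row = [0] * len(col)
        for j in range(1, m + 1):
            row[col[(1, j, k)]] += 1
            row[col[(i, j, k)]] -= 1
        A.append(row)
        b.append(0)
    return A, b, col


def check_c7(N, m, K, seed=0):
    """Check identity (C.7) for u = y^T A, with y a random integer vector."""
    rng = random.Random(seed)
    A, b, col = build_constraints(N, m, K)
    y = [rng.randint(-5, 5) for _ in A]
    u = [sum(y[r] * A[r][c] for r in range(len(A))) for c in range(len(col))]
    # Split y = [y0, y1] following the row order used in build_constraints.
    idx0 = itertools.product(range(1, N + 1), range(1, m + 1))
    idx1 = itertools.product(range(2, N + 1), range(1, K + 1))
    y0 = {ij: y[r] for r, ij in enumerate(idx0)}
    y1 = {ik: y[N * m + r] for r, ik in enumerate(idx1)}
    for (i, j, k), c in col.items():
        if i == 1:
            expected = y0[(1, j)] + sum(y1[(i2, k)] for i2 in range(2, N + 1))
        else:
            expected = y0[(i, j)] - y1[(i, k)]
        if u[c] != expected:
            return False
    # y^T b only involves y0, since b vanishes on the second family of rows.
    return sum(yr * br for yr, br in zip(y, b)) == sum(y0.values())
```

For instance, with \(N=3\), \(m=2\) and \(K=m^N=8\), the matrix has \(Nm+(N-1)K = 22\) rows and \(NmK = 48\) columns.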

Technical details regarding Section 5.3

Write \({\mathcal {M}}_f\) for \({\mathcal {M}}_f(\varOmega )\) and define \({\mathcal {M}}_\pm \) the space of finite signed measures on \(\varOmega \), i.e. a measure \(\mu \in {\mathcal {M}}_\pm \) is written \(\mu _+ - \mu _-\) for two finite measures \(\mu _+,\mu _- \in {\mathcal {M}}_f\). The total variation distance \(|\cdot |\) is a norm on \({\mathcal {M}}_\pm \), and \(({\mathcal {M}}_\pm ,|\cdot |)\) is a Banach space. The Bochner integral (Bochner 1933) is a generalization of the Lebesgue integral for functions taking their values in a Banach space. We define the expected persistence measure of \(P\in {\mathcal {W}}^p({\mathcal {M}}^p)\) as the Bochner integral of some pushforward of \(P\). More precisely, recall the definition (3.5) of \(\mu ^{(p)}\) and define

$$\begin{aligned} F: ({\mathcal {M}}^p,\mathrm {OT}_p)&\rightarrow ({\mathcal {M}}_\pm , |\cdot |) \\ \mu&\mapsto \mu ^{(p)}. \end{aligned}$$

Note that F has an inverse G on \({\mathcal {M}}_f\), defined by \(G(\nu )(A) :=\int _A d(x,{\partial \varOmega })^{-p}\mathrm {d}\nu (x)\) for \(A\subset \varOmega \) a Borel set. Theorem 3.7 implies that G is a continuous function from \(({\mathcal {M}}_f, | \cdot |)\) to \(({\mathcal {M}}^p, \mathrm {OT}_p)\). In particular, as \({\mathcal {M}}_f\) and \({\mathcal {M}}^p\) are Polish spaces and G is injective, the map F is measurable (see Kechris 1995, Theorem 15.1). For \(P\in {\mathcal {W}}^p({\mathcal {M}}^p(\varOmega ))\), define for \({\varvec{\mu }} \sim P\), \({\mathbb {E}}[{\varvec{\mu }}]\) the linear expectation of \(P\) by

$$\begin{aligned} {\mathbb {E}}[{\varvec{\mu }}] :=G\left( \int \nu \mathrm {d}(F_\#P)(\nu ) \right) \in {\mathcal {M}}^p, \end{aligned}$$
(D.1)

where the integral is the Bochner integral on the Banach space \(({\mathcal {M}}_\pm ,|\cdot |)\) and \(F_\#P\) is the pushforward of \(P\) by F. It is straightforward to check that \({\mathbb {E}}[{\varvec{\mu }}]\) defined in that way satisfies the relation

$$\begin{aligned} \forall K \subset \varOmega \ \text {compact},\ {\mathbb {E}}[{\varvec{\mu }}](K) = {\mathbb {E}}[{\varvec{\mu }}(K)]. \end{aligned}$$
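When \(P\) is supported on finitely many finite diagrams, the Bochner integral in (D.1) reduces to a finite weighted sum, and the relation above can be checked directly. The sketch below illustrates this special case only; the representation of a measure as a list of weighted points, the use of a box as the compact set K, and the function names are assumptions made for the example.

```python
def expected_measure(diagrams, probs):
    """Weighted point list representing sum_i probs[i] * diagrams[i]."""
    return [(pt, lam) for diag, lam in zip(diagrams, probs) for pt in diag]


def mass(weighted_points, box):
    """Mass given by a weighted point list to the box [b0, b1] x [d0, d1]."""
    (b0, b1), (d0, d1) = box
    return sum(
        w for (b, d), w in weighted_points if b0 <= b <= b1 and d0 <= d <= d1
    )


def expected_mass(diagrams, probs, box):
    """E[mu(K)]: average over the diagrams of the number of points falling in the box."""
    return sum(
        lam * mass([(pt, 1) for pt in diag], box)
        for diag, lam in zip(diagrams, probs)
    )
```

For any box, `mass(expected_measure(diagrams, probs), box)` and `expected_mass(diagrams, probs, box)` agree up to floating-point error, which is the finite counterpart of \({\mathbb {E}}[{\varvec{\mu }}](K) = {\mathbb {E}}[{\varvec{\mu }}(K)]\).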

The proof of Proposition 5.5 consists in applying Jensen’s inequality in an infinite-dimensional setting. We first show that the function \(\mathrm {OT}_p^p\) is convex.

Lemma D.1

For \(1\le p < \infty \), the function \(\mathrm {OT}_p^p : {\mathcal {M}}^p \times {\mathcal {M}}^p \rightarrow {\mathbb {R}}\) is convex.

Proof

Fix \(\mu _1,\mu _2,\nu _1,\nu _2 \in {\mathcal {M}}^p\) and \(t\in [0,1]\). Our goal is to show that

$$\begin{aligned} \mathrm {OT}_p^p(t\mu _1+(1-t)\mu _2,t\nu _1 + (1-t)\nu _2) \le t\mathrm {OT}_p^p(\mu _1,\nu _1)+(1-t)\mathrm {OT}_p^p(\mu _2,\nu _2). \end{aligned}$$

Let \(\pi _{11} \in \mathrm {Opt}_p(\mu _1,\nu _1)\) and \(\pi _{22}\in \mathrm {Opt}_p(\mu _2,\nu _2)\). It is straightforward to check that \(\pi :=t\pi _{11} +(1-t)\pi _{22}\) is an admissible plan between \(t\mu _1+(1-t)\mu _2\) and \(t\nu _1 + (1-t)\nu _2\). The cost of this admissible plan is \(t\mathrm {OT}_p^p(\mu _1,\nu _1) +(1-t)\mathrm {OT}_p^p(\mu _2,\nu _2)\), which is therefore greater than or equal to \(\mathrm {OT}_p^p(t\mu _1+(1-t)\mu _2,t\nu _1 + (1-t)\nu _2)\). \(\square \)

We then use the following result, which is a particular case of Perlman (1974, Theorem 3.10).

Proposition D.2

Let \({\mathcal {X}}\) be a Hausdorff locally convex topological vector space and \(C\subset {\mathcal {X}}\) a closed convex set. Let Q be a probability measure on \({\mathcal {X}}\) endowed with its Borel \(\sigma \)-algebra, which is supported on C. Assume that \(\int \Vert x\Vert \mathrm {d}Q(x) <\infty \). Let \(f:C\rightarrow [0,\infty )\) be a continuous convex function with \(\int f(x) \mathrm {d}Q(x) < \infty \). Then

$$\begin{aligned} f\left( \int x \mathrm {d}Q(x) \right) \le \int f(x) \mathrm {d}Q(x). \end{aligned}$$

Let \({\mathcal {X}}={\mathcal {M}}_\pm \times {\mathcal {M}}_\pm \) which is a Banach space (endowed with the product norm), and thus in particular a Hausdorff locally convex topological vector space. Let \(C = {\mathcal {M}}_f \times {\mathcal {M}}_f\), which is convex and closed (closedness follows immediately from the definition of the total variation \(|\cdot |\)) and let \(f=\mathrm {OT}_p^p \circ (G,G) : {\mathcal {X}}\rightarrow {\mathbb {R}}\). The continuity of G implies that f is continuous and Lemma D.1 implies the convexity of f. Let P, \(P'\) be two probability measures in \({\mathcal {W}}^p({\mathcal {M}}^p)\) and \(\gamma \) be an optimal coupling between P and \(P'\). We let Q be the image measure of \(\gamma \) by \((F,F)\), so that

$$\begin{aligned} \int _{x\in {\mathcal {X}}} \Vert x\Vert \mathrm {d}Q(x)&= \int _{\mu ,\mu '\in {\mathcal {M}}^p} \max (|\mu ^{(p)}|,|(\mu ')^{(p)}|)\mathrm {d}\gamma (\mu ,\mu ') \\&\le \int _{\mu }\mathrm {Pers}_p(\mu )\mathrm {d}P(\mu ) + \int _{\mu '}\mathrm {Pers}_p(\mu ')\mathrm {d}P'(\mu ')<\infty \end{aligned}$$

and that

$$\begin{aligned} \int _{x\in {\mathcal {X}}} f(x)\mathrm {d}Q(x) = \int _{\mu ,\mu '\in {\mathcal {M}}^p} \mathrm {OT}_p^p(\mu ,\mu ')\mathrm {d}\gamma (\mu ,\mu ') = W_{p,\mathrm {OT}_p}^p(P,P') <\infty . \end{aligned}$$

Also, we have

$$\begin{aligned} \int x \mathrm {d}Q(x)&= \int _{\nu ,\nu '\in {\mathcal {M}}^p} (\nu ,\nu ')\mathrm {d}(F,F)_{\#}\gamma (\nu ,\nu ')\\&= \left( \int _{\nu \in {\mathcal {M}}^p} \nu \mathrm {d}F_{\#} P(\nu ), \int _{\nu '\in {\mathcal {M}}^p} \nu '\mathrm {d}F_{\#}P'(\nu ')\right) , \end{aligned}$$

so that by (D.1), \(f\left( \int x \mathrm {d}Q(x) \right) = \mathrm {OT}_p^p({\mathbb {E}}[{\varvec{\mu }}],{\mathbb {E}}[{\varvec{\mu }}'])\), where \({\varvec{\mu }}\sim P\) and \({\varvec{\mu }}' \sim P'\). Proposition D.2 yields the conclusion.

About this article

Cite this article

Divol, V., Lacombe, T. Understanding the topology and the geometry of the space of persistence diagrams via optimal partial transport. J Appl. and Comput. Topology 5, 1–53 (2021). https://doi.org/10.1007/s41468-020-00061-z
