Abstract
Optimal transport is a notoriously difficult problem to solve numerically, with current approaches often remaining intractable for very large-scale applications such as those encountered in machine learning. Wasserstein barycenters—the problem of finding measures in-between given input measures in the optimal transport sense—are even more computationally demanding, as they require solving an optimization problem involving optimal transport distances. By training a deep convolutional neural network, we improve by a factor of 80 the computational speed of Wasserstein barycenters over the fastest state-of-the-art approach on the GPU, resulting in millisecond computation times on \(512\times 512\) regular grids. We show that our network, trained on Wasserstein barycenters of pairs of measures, generalizes well to the problem of finding Wasserstein barycenters of more than two measures. We demonstrate the efficiency of our approach for computing barycenters of sketches and transferring colors between multiple images.
References
Amos, B., Xu, L., Kolter, J.Z.: Input convex neural networks. In: International Conference on Machine Learning, pp. 146–155 (2017)
Andoni, A., Indyk, P., Krauthgamer, R.: Earth mover distance over high-dimensional spaces. SODA 8, 343–352 (2008)
Andoni, A., Naor, A., Neiman, O.: Impossibility of sketching of the 3D transportation metric with quadratic cost. In: 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2016)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. Preprint at arXiv:1701.07875 (2017)
Backhoff-Veraguas, J., Fontbona, J., Rios, G., Tobar, F.: Bayesian learning with Wasserstein barycenters. Preprint at arXiv:1805.10833 (2018)
Benamou, J.D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Bigot, J., Gouet, R., Klein, T., López, A., et al.: Geodesic PCA in the Wasserstein space by convex PCA. In: Annales de l’Institut Henri Poincaré, Probabilités et Statistiques. Institut Henri Poincaré, vol. 53, pp. 1–26 (2017)
Bonneel, N., van de Panne, M., Paris, S., Heidrich, W.: Displacement interpolation using Lagrangian mass transport. In: ACM Transactions on Graphics (SIGGRAPH ASIA 2011) vol 30(6) (2011)
Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vis. 51(1), 22–45 (2015)
Bonneel, N., Peyré, G., Cuturi, M.: Wasserstein barycentric coordinates: histogram regression using optimal transport. ACM Trans. Gr. 35(4), 71 (2016)
Claici, S., Chien, E., Solomon, J.: Stochastic Wasserstein barycenters. Preprint at arXiv:1802.05757 (2018)
Courty, N., Flamary, R., Tuia, D.: Domain adaptation with regularized optimal transport. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, pp. 274–289 (2014)
Courty, N., Flamary, R., Ducoffe, M.: Learning Wasserstein embeddings. Preprint at arXiv:1710.07457 (2017)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: International Conference on Machine Learning, PMLR, pp. 685–693 (2014)
Dognin, P., Melnyk, I., Mroueh, Y., Ross, J., Santos, C.D., Sercu, T.: Wasserstein barycenter model ensembling. Preprint at arXiv:1902.04999 (2019)
Domazakis, G., Drivaliaris, D., Koukoulas, S., Papayiannis, G., Tsekrekos, A., Yannacopoulos, A.: Clustering measure-valued data with Wasserstein barycenters. Preprint at arXiv:1912.11801 (2020)
Ehrlacher, V., Lombardi, D., Mula, O., Vialard, F.X.: Nonlinear model reduction on metric spaces. Application to one-dimensional conservative PDEs in Wasserstein spaces. ESAIM Math. Model. Numer. Anal. (2020). https://doi.org/10.1051/m2an/2020013
Fan, J., Taghvaei, A., Chen, Y.: Scalable computations of Wasserstein barycenter via input convex neural networks. Preprint at arXiv:2007.04462 (2020)
Feydy, J.: Geometric loss functions between sampled measures, images and volumes. https://www.kernel-operations.io/geomloss/ (2019)
Feydy, J.: Geometric data analysis, beyond convolutions. Theses, Université Paris-Saclay, https://tel.archives-ouvertes.fr/tel-02945979 (2020)
Feydy, J., Séjourné, T., Vialard, F.X., Amari, S.I., Trouvé, A., Peyré, G.: Interpolating between optimal transport and MMD using Sinkhorn divergences. Preprint at arXiv:1810.08278 (2018)
Feydy, J., Roussillon, P., Trouvé, A., Gori, P.: Fast and scalable optimal transport for brain tractograms. In: MICCAI 2019, Shenzhen, China, https://hal.telecom-paris.fr/hal-02264177 (2019a)
Feydy, J., Roussillon, P., Trouvé, A., Gori, P.: Fast and scalable optimal transport for brain tractograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 636–644 (2019b)
Frogner, C., Zhang, C., Mobahi, H., Araya-Polo, M., Poggio, T.: Learning with a Wasserstein loss. Preprint at arXiv:1506.05439 (2015)
Frogner, C., Mirzazadeh, F., Solomon, J.: Learning embeddings into entropic Wasserstein spaces. Preprint at arXiv:1905.03329 (2019)
Genevay, A., Peyré, G., Cuturi, M.: Learning generative models with Sinkhorn divergences. Preprint at arXiv:1706.00292 (2017)
Google Inc.: The Quick, Draw! dataset. https://github.com/googlecreativelab/quickdraw-dataset (2020)
Heitz, M., Bonneel, N., Coeurjolly, D., Cuturi, M., Peyré, G.: Ground metric learning on graphs. Preprint at arXiv:1911.03117 (2019)
Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(3), 90–95 (2007). https://doi.org/10.1109/MCSE.2007.55
Janati, H., Cuturi, M., Gramfort, A.: Debiased Sinkhorn barycenters. In: International Conference on Machine Learning, PMLR, pp. 4692–4701 (2020)
Kantorovich, L.: On the transfer of masses (in Russian). Doklady Akademii Nauk 37, 227–229 (1942)
Korotin, A., Li, L., Solomon, J., Burnaev, E.: Continuous Wasserstein-2 barycenter estimation without minimax optimization. Preprint at arXiv:2102.01752 (2021)
Lacombe, T., Cuturi, M., Oudot, S.: Large scale computation of means and clusters for persistence diagrams using optimal transport. In: Bengio S., Wallach H., Larochelle H., Grauman K., Cesa-Bianchi N., Garnett R. (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 31, (2018a) https://proceedings.neurips.cc/paper/2018/file/b58f7d184743106a8a66028b7a28937c-Paper.pdf
Lacombe, T., Cuturi, M., Oudot, S.: Large scale computation of means and clusters for persistence diagrams using optimal transport. Preprint at arXiv:1805.08331 (2018b)
Li, L., Genevay, A., Yurochkin, M., Solomon, J.: Continuous regularized Wasserstein barycenters. Preprint at arXiv:2008.12534 (2020)
Liutkus, A., Simsekli, U., Majewski, S., Durmus, A., Stöter, F.R.: Sliced-Wasserstein flows: nonparametric generative modeling via optimal transport and diffusions. In: International Conference on Machine Learning, PMLR, pp. 4104–4113 (2019)
Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. Preprint at arXiv:1608.03983 (2016)
McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv:1802.03426 (2018)
Mérigot, Q., Delalande, A., Chazal, F.: Quantitative stability of optimal transport maps and linearization of the 2-Wasserstein space. Proc. Mach. Learn. Res. 108, 3186–3196 (2020)
Metelli, A.M., Likmeta, A., Restelli, M.: Propagating uncertainty in reinforcement learning via wasserstein barycenters. In: Advances in Neural Information Processing Systems, pp. 4333–4345 (2019)
Mi, L., Zhang, W., Gu, X., Wang, Y.: Variational Wasserstein clustering. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 322–337 (2018)
Moosmüller, C., Cloninger, A.: Linear optimal transport embedding: provable fast Wasserstein distance computation and classification for nonlinear problems. Preprint at arXiv:2008.09165 (2020)
Nader, G., Guennebaud, G.: Instant transport maps on 2D grids. ACM Trans. Graph. 37(6), 13 (2018)
Nene, S., Nayar, S., Murase, H.: Columbia Object Image Library: COIL-20. Columbia University, New York (1996)
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach H., Larochelle H., Beygelzimer A., d'Alché-Buc F., Fox E., Garnett R. (eds) Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp. 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf (2019)
Peyré, G., Cuturi, M., et al.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)
Rabin, J., Delon, J., Gousseau, Y.: Removing artefacts from color and contrast modifications. IEEE Trans. Image Process. 20(11), 3073–3085 (2011a)
Rabin, J., Peyré, G., Delon, J., Bernot, M.: Wasserstein barycenter and its application to texture mixing. In: International Conference on Scale Space and Variational Methods in Computer Vision, Springer, pp. 435–446 (2011b)
Reinhard, E., Pouli, T.: Colour spaces for colour transfer. In: International Workshop on Computational Color Imaging, Springer, pp. 1–15 (2011)
Rolet, A., Cuturi, M., Peyré, G.: Fast dictionary learning with a smoothed Wasserstein loss. In: Artificial Intelligence and Statistics, pp. 630–638 (2016)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 234–241 (2015)
Schmitz, M.A., Heitz, M., Bonneel, N., Mboula, F.M.N., Coeurjolly, D., Cuturi, M., Peyré, G., Starck, J.L.: Wasserstein dictionary learning: optimal transport-based unsupervised non-linear dictionary learning. SIAM J. Imaging Sci. 11(1), 643–678 (2018)
Schmitzer, B.: Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM J. Sci. Comput. 41(3), A1443–A1481 (2019)
Seguy, V., Cuturi, M.: Principal geodesic analysis for probability measures under the optimal transport metric. In: Cortes C., Lawrence N., Lee D., Sugiyama M., Garnett R. (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol. 28, (2015) https://proceedings.neurips.cc/paper/2015/file/f26dab9bf6a137c3b6782e562794c2f2-Paper.pdf
Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Graph. (TOG) 34(4), 1–11 (2015)
Srivastava, S., Cevher, V., Dinh, Q., Dunson, D.: WASP: scalable Bayes via barycenters of subset posteriors. In: Artificial Intelligence and Statistics, PMLR, pp. 912–920 (2015)
Ulyanov, D., Vedaldi, A., Lempitsky, V.: Instance normalization: the missing ingredient for fast stylization. Preprint at arXiv:1607.08022 (2016)
Wang, W., Slepčev, D., Basu, S., Ozolek, J.A., Rohde, G.K.: A linear optimal transportation framework for quantifying and visualizing variations in sets of images. Int. J. Comput. Vis. 101(2), 254–269 (2013)
Acknowledgements
This work was granted access to the HPC resources of IDRIS under the allocations 2020-AD011011538 and 2020-AD011012218 made by GENCI. We also thank the authors of all the images used in our color transfer figures.
Funding
Partial financial support was received from the ANR ROOT project (RegressiOn with Optimal Transport), ANR-16-CE23-0009, and the ANR AI chair OTTOPIA under reference ANR-20-CHIA-0030.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
Our implementation is publicly available at https://github.com/jlacombe/learning-to-generate-wasserstein-barycenters
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Learning Strategy
Instead of using a fixed or monotonically decreasing learning rate, we choose a learning rate schedule with warm restarts as proposed by [38]. The schedule is shown in Fig. 12: the learning rate decreases and is periodically restarted to its initial value, with the period increasing as the number of epochs grows. This schedule was chosen after comparison with both Adam and SGD with stepwise schedules, and yielded better convergence in practice.
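As a sketch, the warm-restart (SGDR) schedule of [38] can be written in a few lines. The rate values and cycle lengths below are illustrative placeholders, not the hyperparameters actually used for training:

```python
import math

def sgdr_lr(epoch, lr_max=1e-3, lr_min=1e-6, t0=10, mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR).

    lr_max, lr_min, t0 (first cycle length) and mult (cycle-length
    multiplier at each restart) are hypothetical values for illustration.
    """
    # Locate the current position t_cur inside the current cycle of
    # length t_i; each restart multiplies the cycle length by `mult`.
    t_i, t_cur = t0, epoch
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= mult
    # Cosine annealing from lr_max down to lr_min within the cycle.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))
```

In a PyTorch training loop, the same behavior is available off the shelf via `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult, eta_min)`.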
B Test of Equivariance
Wasserstein barycenters are equivariant under rotation, translation and scaling. This amounts to \(\mathrm{Barycenter}(\{(T(\mu _i), \lambda _i)\}_i) = T(\mathrm{Barycenter}(\{(\mu _i, \lambda _i)\}_i))\) for \(T\) a rotation, translation or scaling. We verify this behavior qualitatively on the output of our network on two examples shown in Fig. 13.
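The equivariance can also be checked exactly on the one case with a closed form: the \(W_2\) barycenter of Dirac masses \(\delta_{x_i}\) with weights \(\lambda_i\) is the Dirac at the weighted mean \(\sum_i \lambda_i x_i\), and for an affine map \(T\) (rotation, translation, scaling) the two sides agree term by term. A minimal sketch, with arbitrary illustrative points and weights:

```python
import math

def dirac_barycenter(points, weights):
    # Closed-form W2 barycenter location for Dirac inputs:
    # the weighted mean of the support points.
    return tuple(sum(w * p[k] for p, w in zip(points, weights))
                 for k in range(2))

def affine(p, theta=0.7, scale=1.5, shift=(2.0, -1.0)):
    # Rotation by theta, uniform scaling, then translation.
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * p[0] - s * p[1]) + shift[0],
            scale * (s * p[0] + c * p[1]) + shift[1])

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [0.2, 0.3, 0.5]

lhs = dirac_barycenter([affine(p) for p in points], weights)  # Bar({T(mu_i)})
rhs = affine(dirac_barycenter(points, weights))               # T(Bar({mu_i}))
```

Since the weights sum to one, `lhs` and `rhs` coincide up to floating-point error; our network is only expected to satisfy this property approximately.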
C Additional Results
While the model presented in this paper uses 6 depth levels (following the convention of Fig. 1), we provide additional experiments comparing versions of our model with different numbers of depth levels in Fig. 14. Note that while the performance gap is large between the models with 4 depth levels (DL), 5DL and 6DL, there is much less difference between 6DL and 7DL.
We provide additional experiments showing barycenters of 5 sketches in Fig. 15; the weights evolve linearly inside the pentagon. As a stress test, we also show a barycenter of 100 cats with equal weights in Fig. 16 and compare it with a barycenter computed with GeomLoss. While both results roughly recover the global shape of the cat, details are clearly lost and our result looks much smoother.
In the debiasing approach of Janati et al. [31], the paper uses a single iteration of Bregman projections to minimize the functional with respect to the variable d, whereas their publicly available implementation uses 10 iterations. Figure 17 shows the (minor) difference in quality between these two variants. Our comparisons were made against the original implementation.
Finally, we provide an additional color transfer experiment in Fig. 18 reproducing an experiment from [9] with our model trained with ContoursDS and HistoDS.
D Linearized Barycenters
Figure 19 shows the error introduced by using a linearized version of Wasserstein barycenters [40, 43, 44, 59]. Our predicted barycenters reflect this error.
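The linearization idea behind [40, 43, 44, 59] is to embed each measure by its Monge map to a fixed reference and average the maps instead of solving the barycenter problem. A minimal 1-D sketch, where for equal-size empirical measures the optimal map simply matches sorted samples (in 1-D this coincides with the true \(W_2\) barycenter; in 2-D, averaging maps is only an approximation, which is the source of the error shown in Fig. 19):

```python
def linearized_barycenter_1d(samples_list, weights):
    # Embed each measure by its quantile function (sorted samples),
    # then average these embeddings with the barycentric weights.
    sorted_lists = [sorted(s) for s in samples_list]
    return [sum(w * xs[j] for xs, w in zip(sorted_lists, weights))
            for j in range(len(sorted_lists[0]))]

# Two illustrative empirical measures on the line.
mu = [0.0, 1.0, 2.0]
nu = [10.0, 11.0, 12.0]
bar = linearized_barycenter_1d([mu, nu], [0.5, 0.5])  # midpoint measure
```

The same averaging applied to 2-D Monge maps toward a reference grid yields the linearized barycenters whose error is measured in Fig. 19.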
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lacombe, J., Digne, J., Courty, N. et al. Learning to Generate Wasserstein Barycenters. J Math Imaging Vis 65, 354–370 (2023). https://doi.org/10.1007/s10851-022-01121-y