Abstract
We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to satisfy positive definiteness constraints in closed form is of key importance. We propose an alternative to EM grounded in the Riemannian geometry of positive definite matrices, through which we cast GMM parameter estimation as a Riemannian optimization problem. Surprisingly, such an out-of-the-box Riemannian formulation fails completely and proves much inferior to EM. This motivates us to take a closer look at the problem geometry and derive a better formulation that is much more amenable to Riemannian optimization. We then develop Riemannian batch and stochastic gradient algorithms that outperform EM, often substantially. We provide a non-asymptotic convergence analysis for our stochastic method, which is also (to our knowledge) the first such global analysis for Riemannian stochastic gradient methods. Numerous empirical results are included to demonstrate the effectiveness of our methods.
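To make the reformulation concrete, here is a minimal numpy sketch of the idea in its single-component special case (not the paper's full mixture algorithm). Each sample x is augmented to y = (x; 1), and the mean and covariance are absorbed into one (d+1)×(d+1) symmetric positive definite matrix S whose optimum has the block form S* = [[Σ + μμᵀ, μ], [μᵀ, 1]]. Under the affine-invariant metric the Riemannian gradient of the averaged log-likelihood simplifies to ½(M̄ − S) with M̄ = (1/n)Σᵢ yᵢyᵢᵀ, and steps are taken with the standard SPD retraction R_S(ξ) = S + ξ + ½ξS⁻¹ξ, which remains positive definite for any symmetric ξ. The function name, step size, and initialization below are our own illustrative choices, made under these assumptions:

```python
import numpy as np

def riemannian_fit_gaussian(X, step=0.5, iters=300):
    """Illustrative sketch: fit a single Gaussian to the rows of X by
    Riemannian gradient ascent over (d+1)x(d+1) SPD matrices, using the
    augmented variable S ~ [[Sigma + mu mu^T, mu], [mu^T, 1]]."""
    n, d = X.shape
    Y = np.hstack([X, np.ones((n, 1))])      # augmented samples y_i = (x_i; 1)
    Mbar = (Y.T @ Y) / n                     # (1/n) sum_i y_i y_i^T
    S = np.diag(np.diag(Mbar))               # SPD start at the right scale
    for _ in range(iters):
        # Riemannian (affine-invariant) gradient of the averaged
        # log-likelihood: S(-S^{-1}/2 + S^{-1} Mbar S^{-1}/2)S = (Mbar - S)/2
        xi = step * 0.5 * (Mbar - S)
        # Retraction R_S(xi) = S + xi + (1/2) xi S^{-1} xi, SPD for symmetric xi
        S = S + xi + 0.5 * xi @ np.linalg.solve(S, xi)
        S = 0.5 * (S + S.T)                  # clean up round-off asymmetry
    s = S[-1, -1]                            # -> 1 at the optimum
    mu = S[:d, -1] / s
    Sigma = S[:d, :d] / s - np.outer(mu, mu) # exact recovery when s = 1
    return mu, Sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.3], [0.3, 0.5]], size=5000)
    mu, Sigma = riemannian_fit_gaussian(X)
    print(mu)     # close to the sample mean
    print(Sigma)  # close to the (MLE) sample covariance
```

At the stationary point S = M̄ the corner entry equals 1 and the block decomposition returns exactly the maximum likelihood μ and Σ. The payoff over the naive formulation is that, for a single Gaussian, this augmented objective turns out to be geodesically concave (mixtures are harder, but inherit a far better-behaved geometry), which is what the batch and stochastic methods in the paper exploit.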
Notes
Not to be confused with “manifold learning,” a separate problem altogether.
Available via https://archive.ics.uci.edu/ml/datasets.
Additional information
S. Sra was partially supported by NSF-IIS-1409802.
A preliminary version of this work appeared in Advances in Neural Information Processing Systems (NIPS 2015), wherein this reformulation was originally introduced.
Cite this article
Hosseini, R., Sra, S. An alternative to EM for Gaussian mixture models: batch and stochastic Riemannian optimization. Math. Program. 181, 187–223 (2020). https://doi.org/10.1007/s10107-019-01381-4
Keywords
- Stochastic optimization
- Riemannian optimization
- Gaussian mixture models
- Positive definite matrices
- Retraction
- Non-asymptotic rate of convergence