Abstract
We consider maximum likelihood estimation for Gaussian Mixture Models (GMMs). This task is almost invariably solved (in theory and practice) via the Expectation Maximization (EM) algorithm. EM owes its success to various factors, of which its ability to satisfy positive definiteness constraints in closed form is of key importance. We propose an alternative to EM grounded in the Riemannian geometry of positive definite matrices, through which we cast GMM parameter estimation as a Riemannian optimization problem. Surprisingly, such an out-of-the-box Riemannian formulation fails completely and proves much inferior to EM. This motivates us to take a closer look at the problem geometry and derive a better formulation that is much more amenable to Riemannian optimization. We then develop Riemannian batch and stochastic gradient algorithms that outperform EM, often substantially. We provide a non-asymptotic convergence analysis for our stochastic method, which is also (to our knowledge) the first such global analysis for Riemannian stochastic gradient methods. Numerous empirical results are included to demonstrate the effectiveness of our methods.
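To make the reformulation concrete, here is a minimal numpy sketch of the idea in its single-component special case (not the paper's full mixture algorithm). Each sample x is augmented to y = (x; 1), and the mean and covariance are absorbed into one (d+1)×(d+1) symmetric positive definite matrix S whose optimum has the block form S* = [[Σ + μμᵀ, μ], [μᵀ, 1]]. Under the affine-invariant metric the Riemannian gradient of the averaged log-likelihood simplifies to ½(M̄ − S) with M̄ = (1/n)Σᵢ yᵢyᵢᵀ, and steps are taken with the standard SPD retraction R_S(ξ) = S + ξ + ½ξS⁻¹ξ, which remains positive definite for any symmetric ξ. The function name, step size, and initialization below are our own illustrative choices, made under these assumptions:

```python
import numpy as np

def riemannian_fit_gaussian(X, step=0.5, iters=300):
    """Illustrative sketch: fit a single Gaussian to the rows of X by
    Riemannian gradient ascent over (d+1)x(d+1) SPD matrices, using the
    augmented variable S ~ [[Sigma + mu mu^T, mu], [mu^T, 1]]."""
    n, d = X.shape
    Y = np.hstack([X, np.ones((n, 1))])      # augmented samples y_i = (x_i; 1)
    Mbar = (Y.T @ Y) / n                     # (1/n) sum_i y_i y_i^T
    S = np.diag(np.diag(Mbar))               # SPD start at the right scale
    for _ in range(iters):
        # Riemannian (affine-invariant) gradient of the averaged
        # log-likelihood: S(-S^{-1}/2 + S^{-1} Mbar S^{-1}/2)S = (Mbar - S)/2
        xi = step * 0.5 * (Mbar - S)
        # Retraction R_S(xi) = S + xi + (1/2) xi S^{-1} xi, SPD for symmetric xi
        S = S + xi + 0.5 * xi @ np.linalg.solve(S, xi)
        S = 0.5 * (S + S.T)                  # clean up round-off asymmetry
    s = S[-1, -1]                            # -> 1 at the optimum
    mu = S[:d, -1] / s
    Sigma = S[:d, :d] / s - np.outer(mu, mu) # exact recovery when s = 1
    return mu, Sigma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([1.0, -2.0], [[2.0, 0.3], [0.3, 0.5]], size=5000)
    mu, Sigma = riemannian_fit_gaussian(X)
    print(mu)     # close to the sample mean
    print(Sigma)  # close to the (MLE) sample covariance
```

At the stationary point S = M̄ the corner entry equals 1 and the block decomposition returns exactly the maximum likelihood μ and Σ. The payoff over the naive formulation is that, for a single Gaussian, this augmented objective turns out to be geodesically concave (mixtures are harder, but inherit a far better-behaved geometry), which is what the batch and stochastic methods in the paper exploit.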
Notes
Not to be confused with “manifold learning,” a separate problem altogether.
Available via https://archive.ics.uci.edu/ml/datasets.
Additional information
S. Sra was partially supported by NSF-IIS-1409802.
A preliminary version of this work appeared in Advances in Neural Information Processing Systems (NIPS 2015), wherein this reformulation was originally introduced.
Cite this article
Hosseini, R., Sra, S. An alternative to EM for Gaussian mixture models: batch and stochastic Riemannian optimization. Math. Program. 181, 187–223 (2020). https://doi.org/10.1007/s10107-019-01381-4
Keywords
- Stochastic optimization
- Riemannian optimization
- Gaussian mixture models
- Positive definite matrices
- Retraction
- Non-asymptotic rate of convergence