
Coordinate descent algorithms

  • Full Length Paper
  • Series B

Mathematical Programming

Abstract

Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.
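To make the basic scheme concrete, the following minimal sketch applies randomized coordinate descent with exact coordinate minimization to a strongly convex quadratic. It is an illustration under assumed problem data, not an algorithm taken verbatim from the paper; the function coordinate_descent and all parameters below are chosen for the example.

# Minimal sketch (illustrative, not from the paper): randomized coordinate
# descent with exact coordinate minimization on the strongly convex quadratic
# f(x) = 0.5 * x^T A x - b^T x, whose minimizer solves A x = b.
import numpy as np

def coordinate_descent(A, b, num_iters=5000, seed=0):
    """Minimize 0.5*x'Ax - b'x by successive random coordinate updates."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.integers(n)          # sample a coordinate uniformly at random
        grad_i = A[i] @ x - b[i]     # i-th component of the gradient A x - b
        x[i] -= grad_i / A[i, i]     # exact minimization along direction e_i
    return x

# Assumed test problem: A = M^T M + 0.1 I is symmetric positive definite.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 10))
A = M.T @ M + 0.1 * np.eye(10)
b = rng.standard_normal(10)
x = coordinate_descent(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))

Because each update reads only a single row of A, one coordinate step costs O(n) work versus O(n^2) for a full gradient step; this low per-iteration cost is the main source of coordinate descent's appeal on large-scale problems.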



Acknowledgments

I thank Ji Liu for the pleasure of collaborating with him on this topic over the past two years. I am grateful to the editors and referees of the paper, whose expert and constructive comments led to numerous improvements.

Author information

Correspondence to Stephen J. Wright.

Additional information

The author was supported by NSF Awards DMS-1216318 and IIS-1447449, ONR Award N00014-13-1-0129, AFOSR Award FA9550-13-1-0138, and Subcontract 3F-30222 from Argonne National Laboratory.

About this article

Cite this article

Wright, S.J. Coordinate descent algorithms. Math. Program. 151, 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3

