
Coordinate descent algorithms

  • Full Length Paper
  • Series B

Mathematical Programming

Abstract

Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity continues to grow because of their usefulness in data analysis, machine learning, and other areas of current interest. This paper describes the fundamentals of the coordinate descent approach, together with variants and extensions and their convergence properties, mostly with reference to convex objectives. We pay particular attention to a certain problem structure that arises frequently in machine learning applications, showing that efficient implementations of accelerated coordinate descent algorithms are possible for problems of this type. We also present some parallel variants and discuss their convergence properties under several models of parallel execution.
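To make the basic scheme concrete, the following minimal sketch applies randomized coordinate descent with exact coordinate minimization to a strongly convex quadratic. It is an illustration under assumed problem data, not an algorithm taken verbatim from the paper; the function coordinate_descent and all parameters below are chosen for the example.

# Minimal sketch (illustrative, not from the paper): randomized coordinate
# descent with exact coordinate minimization on the strongly convex quadratic
# f(x) = 0.5 * x^T A x - b^T x, whose minimizer solves A x = b.
import numpy as np

def coordinate_descent(A, b, num_iters=5000, seed=0):
    """Minimize 0.5*x'Ax - b'x by successive random coordinate updates."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    for _ in range(num_iters):
        i = rng.integers(n)          # sample a coordinate uniformly at random
        grad_i = A[i] @ x - b[i]     # i-th component of the gradient A x - b
        x[i] -= grad_i / A[i, i]     # exact minimization along direction e_i
    return x

# Assumed test problem: A = M^T M + 0.1 I is symmetric positive definite.
rng = np.random.default_rng(1)
M = rng.standard_normal((40, 10))
A = M.T @ M + 0.1 * np.eye(10)
b = rng.standard_normal(10)
x = coordinate_descent(A, b)
print("residual norm:", np.linalg.norm(A @ x - b))

Because each update reads only a single row of A, one coordinate step costs O(n) work versus O(n^2) for a full gradient step; this low per-iteration cost is the main source of coordinate descent's appeal on large-scale problems.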



Acknowledgments

I thank Ji Liu for the pleasure of collaborating with him on this topic over the past two years. I am grateful to the editors and referees of the paper, whose expert and constructive comments led to numerous improvements.

Author information

Correspondence to Stephen J. Wright.

Additional information

The author was supported by NSF Awards DMS-1216318 and IIS-1447449, ONR Award N00014-13-1-0129, AFOSR Award FA9550-13-1-0138, and Subcontract 3F-30222 from Argonne National Laboratory.

About this article

Cite this article

Wright, S.J. Coordinate descent algorithms. Math. Program. 151, 3–34 (2015). https://doi.org/10.1007/s10107-015-0892-3

