DOI: 10.1145/1553374.1553434
research-article

An accelerated gradient method for trace norm minimization

Published: 14 June 2009

ABSTRACT

We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such a formulation arises in many machine learning tasks, including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation of this problem is computationally expensive. Moreover, due to the non-smoothness of the trace norm, the optimal first-order black-box method for this class of problems converges only as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm to propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm that achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
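The "special structure" the abstract refers to is that the proximal operator of the trace norm has a closed form: soft-thresholding of the singular values (singular value thresholding). The sketch below shows a FISTA-style accelerated proximal gradient scheme of the kind the abstract describes, in NumPy. It is a minimal illustration, not the paper's exact algorithm: the function names, the fixed step size 1/L, and the squared-loss example are assumptions (the paper, for instance, uses a step-size search rather than a known Lipschitz constant).

```python
import numpy as np

def svt(W, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def accelerated_gradient(grad_f, L, lam, W0, iters=500):
    """Illustrative FISTA-style sketch for min_W f(W) + lam * ||W||_*.

    grad_f : gradient of the smooth loss f
    L      : a Lipschitz constant of grad_f (fixed step size 1/L; an
             assumption -- the paper searches for the step size instead)
    """
    W = W0.copy()   # current iterate
    Z = W0.copy()   # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        # Gradient step at the extrapolated point, then the trace-norm prox.
        W_next = svt(Z - grad_f(Z) / L, lam / L)
        # Nesterov extrapolation, giving the O(1/k^2) rate.
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = W_next + ((t - 1.0) / t_next) * (W_next - W)
        W, t = W_next, t_next
    return W

# Example use (hypothetical data): multi-task least squares,
# f(W) = 0.5 * ||X W - Y||_F^2, with gradient X^T (X W - Y) and
# Lipschitz constant sigma_max(X)^2.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((100, 20)), rng.standard_normal((100, 5))
grad_f = lambda W: X.T @ (X @ W - Y)
L = np.linalg.norm(X, 2) ** 2
W_hat = accelerated_gradient(grad_f, L, lam=1.0, W0=np.zeros((20, 5)))
```

Without the extrapolation step (i.e., setting Z = W_next each iteration), the same loop reduces to the extended gradient algorithm with its O(1/k) rate; the momentum term is the only change needed to reach O(1/k²).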


Published in

ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009, 1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374
Copyright © 2009 by the author(s)/owner(s).
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 140 of 548 submissions, 26%.
