An accelerated gradient method for trace norm minimization

ABSTRACT
We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such a formulation finds applications in many machine learning tasks, including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving this class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
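The special structure being exploited is that, unlike a generic non-smooth term, the trace norm has a closed-form proximal operator given by singular value thresholding; pairing that proximal step with Nesterov-style extrapolation yields the O(1/k²) rate. The sketch below illustrates such an accelerated proximal-gradient scheme, assuming a squared loss f(W) = ½‖XW − Y‖²_F, step size 1/L with L the gradient's Lipschitz constant, and illustrative names (svt, accelerated_trace_norm, X, Y, lam) that are not the paper's notation; it is a minimal sketch, not the authors' exact algorithm.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s = np.maximum(s - tau, 0.0)            # shrink each singular value by tau
    return (U * s) @ Vt                     # rebuild with thresholded spectrum

def accelerated_trace_norm(X, Y, lam, n_iter=200):
    """Accelerated proximal gradient for 0.5*||XW - Y||_F^2 + lam*||W||_*."""
    W = Z = np.zeros((X.shape[1], Y.shape[1]))
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the gradient
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Z - Y)            # gradient of the smooth loss at Z
        W_new = svt(Z - grad / L, lam / L)  # proximal (gradient) step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        Z = W_new + ((t - 1.0) / t_new) * (W_new - W)  # Nesterov extrapolation
        W, t = W_new, t_new
    return W
```

Each iteration costs one SVD of a d×T matrix plus matrix products, which is the step that replaces the far more expensive semidefinite programming solve mentioned in the abstract.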