Abstract
We consider minimization of functions that are compositions of convex or prox-regular functions (possibly extended-valued) with smooth vector functions. A wide variety of important optimization problems fall into this framework. We describe an algorithmic framework based on a subproblem constructed from a linearized approximation to the objective and a regularization term. Properties of local solutions of this subproblem underlie both a global convergence result and an identification property of the active manifold containing the solution of the original problem. Preliminary computational results on both convex and nonconvex examples are promising.
References
Bolte, J., Daniilidis, A., Lewis, A.S.: Generic optimality conditions for semialgebraic convex problems. Math. Oper. Res. 36, 55–70 (2011)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer Series in Operations Research. Springer, Berlin (2000)
Burke, J.V.: Descent methods for composite nondifferentiable optimization problems. Math. Program. Ser. A 33, 260–279 (1985)
Burke, J.V.: On the identification of active constraints II: the nonconvex case. SIAM J. Numer. Anal. 27, 1081–1102 (1990)
Burke, J.V., Moré, J.J.: On the identification of active constraints. SIAM J. Numer. Anal. 25, 1197–1211 (1988)
Byrd, R., Gould, N.I.M., Nocedal, J., Waltz, R.A.: On the convergence of successive linear-quadratic programming algorithms. SIAM J. Optim. 16, 471–489 (2005)
Cai, J.-F., Candès, E., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)
Candès, E., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)
Candès, E.J.: Compressive sampling. In: Proceedings of the International Congress of Mathematicians, Madrid (2006)
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998)
Combettes, P., Pennanen, T.: Proximal methods for cohypomonotone operators. SIAM J. Control Optim. 43, 731–742 (2004)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
Daniilidis, A., Hare, W., Malick, J.: Geometrical interpretation of the predictor-corrector type algorithms in structured optimization problems. Optimization 55, 481–503 (2006)
Dmitruk, A.V., Kruger, A.Y.: Metric regularity and systems of generalized equations. J. Math. Anal. Appl. 342, 864–873 (2008)
Dontchev, A.L., Lewis, A.S., Rockafellar, R.T.: The radius of metric regularity. Trans. Am. Math. Soc. 355, 493–517 (2003)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–499 (2004)
Fan, J., Li, R.: Variable selection via nonconvex penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1361 (2001)
Fletcher, R., Sainz de la Maza, E.: Nonlinear programming and nonsmooth optimization by successive linear programming. Math. Program. 43, 235–256 (1989)
Friedlander, M.P., Gould, N.I.M., Leyffer, S., Munson, T.S.: A filter active-set trust-region method. Preprint ANL/MCS-P1456-0907, Mathematics and Computer Science Division, Argonne National Laboratory (2007)
Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain nonconvex minimization problems. Int. J. Syst. Sci. 12, 989–1000 (1981)
Hale, E.T., Yin, W., Zhang, Y.: A fixed-point continuation method for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19, 1107–1130 (2008)
Hare, W., Lewis, A.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11, 251–266 (2004)
Iusem, A., Pennanen, T., Svaiter, B.: Inexact variants of the proximal point algorithm without monotonicity. SIAM J. Optim. 13, 1080–1097 (2003)
Jokar, S., Pfetsch, M.E.: Exact and approximate sparse solutions of underdetermined linear equations. SIAM J. Sci. Comput. 31, 23–44 (2008)
Kaplan, A., Tichatschke, R.: Proximal point methods and nonconvex optimization. J. Glob. Optim. 13, 389–406 (1998)
Kim, T., Wright, S.J.: An \(\text{ S }\ell _1\text{ LP }\)-active set approach for feasibility restoration in power systems. Technical report, Computer Science Department, University of Wisconsin-Madison (2014). arXiv:1405.0322
Lan, G.: Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Math. Program. Ser. A 149, 1–45 (2015)
Lemaréchal, C., Oustry, F., Sagastizábal, C.: The \({\cal {U}}\)-Lagrangian of a convex function. Trans. Am. Math. Soc. 352, 711–729 (2000)
Levy, A.: Lipschitzian multifunctions and a Lipschitzian inverse mapping theorem. Math. Oper. Res. 26, 105–118 (2001)
Lewis, A.: Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13, 702–725 (2003)
Mangasarian, O.L.: Minimum-support solutions of polyhedral concave programs. Optimization 45, 149–162 (1999)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4, 154–158 (1970)
Mifflin, R., Sagastizábal, C.: A VU-algorithm for convex minimization. Math. Program. Ser. B 104, 583–608 (2005)
Miller, S.A., Malick, J.: Newton methods for nonsmooth convex minimization: connections among \({\cal {U}}\)-Lagrangian, Riemannian Newton, and SQP methods. Math. Program. Ser. B 104, 609–633 (2005)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation, I: Basic Theory; II: Applications. Springer, New York (2006)
Pennanen, T.: Local convergence of the proximal point algorithm and multiplier methods without monotonicity. Math. Oper. Res. 27, 170–191 (2002)
Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)
Rockafellar, R.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.: Variational Analysis. Springer, Berlin (1998)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268 (1992)
Sagastizábal, C.: Composite proximal bundle method. Math. Program. Ser. B 140, 189–233 (2013)
Sagastizábal, C., Mifflin, R.: Proximal points are on the fast track. J. Convex Anal. 9, 563–579 (2002)
Shapiro, A.: On a class of nonsmooth composite functions. Math. Oper. Res. 28, 677–692 (2003)
Shi, W., Wahba, G., Wright, S.J., Lee, K., Klein, R., Klein, B.: LASSO-Patternsearch algorithm with application to ophthalmology data. Stat. Interface 1, 137–153 (2008)
Spingarn, J.: Submonotone mappings and the proximal point algorithm. Numer. Funct. Anal. Optim. 4, 123–150 (1981/82)
Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. B 58, 267–288 (1996)
Wen, Z., Yin, W., Zhang, H., Goldfarb, D.: On the convergence of an active set method for \(\ell _1\) minimization. SIAM J. Sci. Comput. 32, 1832–1857 (2010)
Wright, S.J.: Convergence of an inexact algorithm for composite nonsmooth optimization. IMA J. Numer. Anal. 9, 299–321 (1990)
Wright, S.J.: Identifiable surfaces in constrained optimization. SIAM J. Control Optim. 31, 1063–1079 (1993)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Yuan, Y.: Conditions for convergence of a trust-region method for nonsmooth optimization. Math. Program. 31, 220–228 (1985)
Yuan, Y.: On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Math. Program. 31, 269–285 (1985)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zimmerman, R.D., Murillo-Sánchez, C.E., Thomas, R.J.: MATPOWER: steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 26, 12–19 (2011)
Acknowledgments
We acknowledge the support of NSF Grants 0430504 and DMS-0806057. We are grateful for the comments of two referees, which were most helpful in revising earlier versions. We thank Mr. Taedong Kim for obtaining computational results for the formulation (6.4).
Additional information
A.S. Lewis’s research supported in part by NSF Award DMS-1208338.
S.J. Wright’s research supported in part by NSF Awards DMS-1216318 and IIS-1447449, ONR Award N00014-13-1-0129, AFOSR Award FA9550-13-1-0138, and Subcontract 3F-30222 from Argonne National Laboratory.
Appendix
The basic building block for variational analysis (see Rockafellar and Wets [40] or Mordukhovich [35]) is the normal cone to a (locally) closed set S at a point \(s \in S\), denoted by \(N_S(s)\). It consists of all normal vectors: limits of sequences of vectors of the form \(\lambda (u-v)\) for points \(u,v \in \mathfrak {R}^m\) approaching s such that v is a closest point to u in S, and scalars \(\lambda > 0\). On the other hand, tangent vectors are limits of sequences of vectors of the form \(\lambda (u-s)\) for points \(u \in S\) approaching s and scalars \(\lambda > 0\). The set S is Clarke regular at s when the inner product of any normal vector with any tangent vector is always nonpositive. Closed convex sets and smooth manifolds are everywhere Clarke regular.
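As a concrete illustration of these definitions (our example, not part of the original text), take \(S = \mathfrak {R}_+ \subset \mathfrak {R}\) and \(s = 0\):

```latex
% Normals at s = 0: for u < 0 the closest point in S is v = 0, so vectors
% \lambda(u - v) and their limits fill the left half-line.
% Tangents at s = 0: vectors \lambda(u - 0) with u \in S approaching 0 fill
% the right half-line.
\[
  N_S(0) = (-\infty, 0], \qquad
  T_S(0) = [0, \infty).
\]
% Every product of a normal with a tangent is nonpositive, so S is Clarke
% regular at 0 (as expected for a closed convex set).
```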
The epigraph of a function \(h:\mathfrak {R}^m \rightarrow {\bar{\mathfrak {R}}}\) is the set
\[ \text{epi}\, h = \big \{ (c,r) \in \mathfrak {R}^m \times \mathfrak {R} : r \ge h(c) \big \}. \]
If the value of h is finite at some point \(\bar{c} \in \mathfrak {R}^m\), then h is lower semicontinuous nearby if and only if its epigraph is locally closed around the point \(\big (\bar{c}, h(\bar{c})\big )\). Henceforth we focus on that case.
The subdifferential of h at \(\bar{c}\) is the set
\[ \partial h(\bar{c}) = \big \{ v \in \mathfrak {R}^m : (v,-1) \in N_{\text{epi}\, h}\big (\bar{c}, h(\bar{c})\big ) \big \}, \]
and the horizon subdifferential is
\[ \partial ^{\infty } h(\bar{c}) = \big \{ v \in \mathfrak {R}^m : (v,0) \in N_{\text{epi}\, h}\big (\bar{c}, h(\bar{c})\big ) \big \} \]
(see [40, Theorem 8.9]). The function h is subdifferentially regular at \(\bar{c}\) if its epigraph is Clarke regular at \(\big (\bar{c}, h(\bar{c})\big )\) (as holds in particular if h is convex lower semicontinuous, or smooth). Subdifferential regularity implies that \(\partial h(\bar{c})\) is a closed and convex set in \(\mathfrak {R}^m\), and its recession cone is exactly \(\partial ^{\infty } h(\bar{c})\) (see [40, Corollary 8.11]). In the case when h is locally Lipschitz, it is almost everywhere differentiable: h is then subdifferentially regular at \(\bar{c}\) if and only if its directional derivative for every direction \(d \in \mathfrak {R}^m\) equals
\[ \limsup _{c \rightarrow \bar{c}} \, \langle \nabla h(c), d \rangle , \]
where the \(\limsup \) is taken over points c where h is differentiable.
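For example (our illustration, using the standard epigraphical characterizations of the subdifferentials), the convex function \(h(c) = |c|\) on \(\mathfrak {R}\) is subdifferentially regular at 0:

```latex
% The normal cone to epi |.| at the origin is {(v,s) : |v| <= -s}, so
\[
  \partial h(0) = \{ v : (v,-1) \in N_{\text{epi}\, h}(0,0) \} = [-1,1],
  \qquad
  \partial ^{\infty } h(0) = \{0\},
\]
% consistent with h being Lipschitz.  Off the origin h is differentiable
% with \nabla h(c) = \text{sign}(c), and for any direction d,
\[
  \limsup _{c \rightarrow 0} \, \langle \nabla h(c), d \rangle
  = \max \{ d, -d \} = |d| = h'(0; d),
\]
% confirming subdifferential regularity at 0.
```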
Consider a subgradient \(\bar{v} \in \partial h(\bar{c})\), and a localization of the subdifferential mapping \(\partial h\) around the point \((\bar{c},\bar{v})\), by which we mean a set-valued mapping \(T:\mathfrak {R}^m \rightrightarrows \mathfrak {R}^m\) defined by
\[ T(c) = {\left\{ \begin{array}{ll} \big \{ v \in \partial h(c) : |v - \bar{v}| < \epsilon \big \} &{} \text{if } |c - \bar{c}| < \epsilon \text{ and } |h(c) - h(\bar{c})| < \epsilon , \\ \emptyset &{} \text{otherwise,} \end{array}\right. } \]
for some constant \(\epsilon >0\). The function h is prox-regular at \(\bar{c}\) for \(\bar{v}\) if some such localization is hypomonotone: that is, for some constant \(\rho > 0\), we have
\[ \langle v_1 - v_2, \, c_1 - c_2 \rangle \ge -\rho \, |c_1 - c_2|^2 \quad \text{whenever } v_i \in T(c_i), \; i = 1,2. \]
This definition is equivalent to Definition 1.1 (with the same constant \(\rho \)) [40, Example 12.28 and Theorem 13.36]. Prox-regularity at \(\bar{c}\) (for all subgradients v) implies subdifferential regularity.
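As a small numerical sanity check (our own illustration, not part of the paper), the smooth but nonconvex function \(h(c) = -c^2\) has \(\partial h(c) = \{-2c\}\), and any localization of its subdifferential is hypomonotone with constant \(\rho = 2\):

```python
import random

def subgrad(c):
    """Subdifferential of h(c) = -c**2: the singleton gradient {-2c}."""
    return -2.0 * c

# Sample random pairs and verify hypomonotonicity with rho = 2:
#   <v1 - v2, c1 - c2> = -2(c1 - c2)^2 >= -rho |c1 - c2|^2.
random.seed(0)
rho = 2.0
for _ in range(1000):
    c1, c2 = random.uniform(-1, 1), random.uniform(-1, 1)
    v1, v2 = subgrad(c1), subgrad(c2)
    assert (v1 - v2) * (c1 - c2) >= -rho * (c1 - c2) ** 2 - 1e-12
```

Here the inequality holds with equality, reflecting that \(h + \frac{\rho }{2}|\cdot |^2\) is exactly convex for \(\rho = 2\).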
A general class of prox-regular functions common in engineering applications is “lower \({\mathcal {C}}^2\)” functions [40, Definition 10.29]. A function \(h:\mathfrak {R}^m \rightarrow \mathfrak {R}\) is lower \({\mathcal {C}}^2\) around a point \(\bar{c} \in \mathfrak {R}^m\) if h has the local representation
\[ h(c) = \max _{t \in T} f(c,t) \quad \text{for all } c \text{ near } \bar{c}, \]
for some function \(f:\mathfrak {R}^m \times T \rightarrow \mathfrak {R}\), where the space T is compact and the quantities f(c, t), \(\nabla _c f(c,t)\), and \(\nabla ^2_{cc} f(c,t)\) all depend continuously on (c, t). All lower \({\mathcal {C}}^2\) functions are prox-regular [40, Proposition 13.3]. A simple equivalent property, useful in theory though harder to check in practice, is that h has the form \(g-\kappa |\cdot |^2\) around the point \(\bar{c}\) for some continuous convex function g and some constant \(\kappa \).
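Two small examples (ours, for illustration) may help here: the absolute value fits the max representation, while its negative shows how prox-regularity can fail at a downward kink.

```latex
% The absolute value is lower C^2 around every point, via
\[
  |c| \;=\; \max _{t \in [-1,1]} \, t\,c ,
\]
% with T = [-1,1] compact and f(c,t) = tc smooth in c; hence |.| is
% prox-regular everywhere.
% By contrast, h(c) = -|c| fails prox-regularity at 0 for the subgradient
% v = 1: taking c = -\delta (so h(c) = c and h is smooth there with
% gradient 1) and c' = \delta, the prox-regularity inequality would require
\[
  -\delta \;=\; h(c') \;\ge\; h(c) + 1\cdot (c'-c) - \tfrac{\rho }{2}|c'-c|^2
  \;=\; \delta - 2\rho \delta ^2 ,
\]
% which fails for all small \delta > 0, whatever the constant \rho.
```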
The normal cone is crucial to the definition of another central variational-analytic tool. Given a set-valued mapping \(F : \mathfrak {R}^p \rightrightarrows \mathfrak {R}^q\) with closed graph
\[ \text{gph}\, F = \big \{ (u,v) \in \mathfrak {R}^p \times \mathfrak {R}^q : v \in F(u) \big \}, \]
at any point \((\bar{u},\bar{v}) \in \text{ gph }\,F\), the coderivative \(D^*F(\bar{u}|\bar{v}):\mathfrak {R}^q \rightrightarrows \mathfrak {R}^p\) is defined by
\[ D^*F(\bar{u}|\bar{v})(y) = \big \{ w \in \mathfrak {R}^p : (w, -y) \in N_{\text{gph}\, F}(\bar{u},\bar{v}) \big \}. \]
The coderivative generalizes the adjoint of the derivative of a smooth vector function: for smooth \(c : \mathfrak {R}^n \rightarrow \mathfrak {R}^m\), the set-valued mapping \(x \mapsto F(x) := \{c(x)\}\) has coderivative given by \(D^*F(x|c(x))(y) = \{\nabla c(x)^* y\}\) for all \(x \in \mathfrak {R}^n\) and \(y\in \mathfrak {R}^m\). As we see next, coderivative calculations drive two of the arguments in Sect. 4.1.
Proof of Corollary 4.3
Corresponding to any linear map \(A :\mathfrak {R}^p \rightarrow \mathfrak {R}^q\), define a set-valued mapping \(F_A :\mathfrak {R}^p \rightrightarrows \mathfrak {R}^q\) by \(F_A(u) = Au-S\). A coderivative calculation shows, for vectors \(v \in \mathfrak {R}^q\),
\[ D^* F_A(0|0)(v) = {\left\{ \begin{array}{ll} \{ A^* v \} &{} \text{if } v \in N_S(0), \\ \emptyset &{} \text{otherwise.} \end{array}\right. } \]
Hence, by assumption, the only vector \(v \in \mathfrak {R}^q\) satisfying \(0 \in D^* F_{\bar{A}}(0|0)(v)\) is zero, so by [40, Thm 9.43], the mapping \(F_{\bar{A}}\) is metrically regular at zero for zero. Applying Theorem 4.2 shows that there exist constants \(\delta ,\gamma > 0\) such that, if \(\Vert A-\bar{A}\Vert < \delta \) and \(|v| < \delta \), then we have
\[ d\big ( 0, F_A^{-1}(v) \big ) \le \gamma \, d\big ( v, F_A(0) \big ), \]
or equivalently,
\[ d\big ( 0, \{ u : Au - v \in S \} \big ) \le \gamma \, d(v, -S). \]
Since \(0 \in S\), the right-hand side is bounded above by \(\gamma |v|\), so the result follows. \(\square \)
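To make the corollary's conclusion concrete, here is a toy numerical instance of ours (not from the paper) with \(S = \{0\}\), where \(F_A(u) = Au\) and metric regularity amounts to surjectivity of A. The least-norm solution of \(Au = v\) then realizes the bound \(|u| \le \gamma |v|\) with \(\gamma = 1/\sigma _{\min }(A)\):

```python
import math

# A = [1, 2] maps R^2 onto R, so the only v with A^T v = 0 is v = 0.
A = (1.0, 2.0)

def least_norm_solution(v):
    """Least-norm u solving A u = v, namely u = A^T v / (A A^T)."""
    aat = A[0] ** 2 + A[1] ** 2          # A A^T = 5
    return (A[0] * v / aat, A[1] * v / aat)

# Here sigma_min(A) = sqrt(A A^T) = sqrt(5), so gamma = 1/sqrt(5).
gamma = 1.0 / math.sqrt(A[0] ** 2 + A[1] ** 2)

v = 3.0
u = least_norm_solution(v)
assert abs(A[0] * u[0] + A[1] * u[1] - v) < 1e-12        # A u = v exactly
assert math.hypot(u[0], u[1]) <= gamma * abs(v) + 1e-12  # |u| <= gamma |v|
```

For this rank-one example the least-norm solution attains the bound with equality, showing \(\gamma \) is sharp.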
Proof of Theorem 4.4
We simply need to check that the set-valued mapping \(G :\mathfrak {R}^p \!\rightrightarrows \mathfrak {R}^q\) defined by \(G(z) = F(z) - S\) is metrically regular at \(\bar{z}\) for zero. Much the same coderivative calculation as in the proof of Corollary 4.3 shows, for vectors \(v \in \mathfrak {R}^q\), the formula
\[ D^* G(\bar{z}|0)(v) = {\left\{ \begin{array}{ll} \{ \nabla F(\bar{z})^* v \} &{} \text{if } v \in N_S\big (F(\bar{z})\big ), \\ \emptyset &{} \text{otherwise.} \end{array}\right. } \]
Hence, by assumption, the only vector \(v \in \mathfrak {R}^q\) satisfying \(0 \in D^* G(\bar{z}|0)(v)\) is zero, so metric regularity follows by [40, Thm 9.43]. \(\square \)
Alternative proof of Theorem 4.2
In the text we gave a short ad hoc proof of Theorem 4.2. Here we present a more formal approach. Denote the space of linear maps from \(\mathfrak {R}^p\) to \(\mathfrak {R}^q\) by \(L(\mathfrak {R}^p,\mathfrak {R}^q)\), and define a mapping \(g :L(\mathfrak {R}^p,\mathfrak {R}^q) \times \mathfrak {R}^p \rightarrow \mathfrak {R}^q\) and a parametric mapping \(g_H :\mathfrak {R}^p \rightarrow \mathfrak {R}^q\) by \(g(H,u)= g_H(u) = Hu\) for maps \(H \in L(\mathfrak {R}^p,\mathfrak {R}^q)\) and points \(u \in \mathfrak {R}^p\). Using the notation of [14, Section 3], the Lipschitz constant \(l[g](0;\bar{u},0)\) is by definition the infimum of the constants \(\rho \) for which the inequality
\[ |g_H(u) - g_H(w)| \le \rho \, |u - w| \tag{6.6} \]
holds for all triples (u, w, H) sufficiently near the triple \((\bar{u}, 0, 0)\). Inequality (6.6) says simply
\[ |H(u - w)| \le \rho \, |u - w|, \]
a property that holds provided \(\rho \ge \Vert H\Vert \). We deduce
\[ l[g](0;\bar{u},0) = 0. \tag{6.7} \]
We can also consider \(F+g\) as a set-valued mapping from \(L(\mathfrak {R}^p,\mathfrak {R}^q) \times \mathfrak {R}^p\) to \(\mathfrak {R}^q\), defined by \((F+g)(H,u) = F(u) + Hu\), and then the parametric mapping \((F+g)_H :\mathfrak {R}^p \rightrightarrows \mathfrak {R}^q\) is defined in the obvious way: in other words, \((F+g)_H(u) = F(u) + Hu\). According to [14, Theorem 2], Equation (6.7) implies the following relationship between the “covering rates” for F and \(F+g\):
The reciprocal of the right-hand side is, by definition, the infimum of the constants \(\kappa > 0\) such that inequality (4.1) holds for all pairs (u, v) sufficiently near the pair \((\bar{u}, \bar{v})\). By metric regularity, this number is strictly positive. On the other hand, the reciprocal of the left-hand side is, by definition, the infimum of the constants \(\gamma > 0\) such that inequality (4.2) holds for all triples (u, v, H) sufficiently near the triple \((\bar{u}, \bar{v},0)\).
About this article
Lewis, A.S., Wright, S.J. A proximal method for composite minimization. Math. Program. 158, 501–546 (2016). https://doi.org/10.1007/s10107-015-0943-9
Keywords
- Prox-regular functions
- Polyhedral convex functions
- Sparse optimization
- Global convergence
- Active constraint identification