A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon ^{-3/2})$ for nonconvex optimization

Curtis, Frank E.; Robinson, Daniel P.; Samadi, Mohammadreza

doi:10.1007/s10107-016-1026-2

Full Length Paper
Series A
Published: 17 May 2016

Volume 162, pages 1–32, (2017)

Mathematical Programming

Frank E. Curtis ORCID: orcid.org/0000-0001-7214-9187¹,
Daniel P. Robinson² &
Mohammadreza Samadi¹

Abstract

We propose a trust region algorithm for solving nonconvex smooth optimization problems. For any $\overline{\epsilon }\in (0,\infty )$, the algorithm requires at most $\mathcal{O}(\epsilon ^{-3/2})$ iterations, function evaluations, and derivative evaluations to drive the norm of the gradient of the objective function below any $\epsilon \in (0,\overline{\epsilon }]$. This improves upon the $\mathcal{O}(\epsilon ^{-2})$ bound known to hold for some other trust region algorithms and matches the $\mathcal{O}(\epsilon ^{-3/2})$ bound for the recently proposed Adaptive Regularisation framework using Cubics, also known as the arc algorithm. Our algorithm, entitled trace, follows a trust region framework, but employs modified step acceptance criteria and a novel trust region update mechanism that allow the algorithm to achieve such a worst-case global complexity bound. Importantly, we prove that our algorithm also attains global and fast local convergence guarantees under similar assumptions as for other trust region algorithms. We also prove a worst-case upper bound on the number of iterations, function evaluations, and derivative evaluations that the algorithm requires to obtain an approximate second-order stationary point.

References

1. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, London (2006)
2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
3. Cartis, C., Gould, N.I.M., Toint, Ph.L.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)
4. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 127, 245–295 (2011)
5. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Math. Program. 130, 295–319 (2011)
6. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimal Newton-type methods for nonconvex smooth optimization problems. Technical report ERGO 11-009, School of Mathematics, University of Edinburgh (2011)
7. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000)
8. Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1996)
9. Griewank, A.: The modification of Newton’s method for unconstrained optimization by bounding cubic terms. Technical report NA/12, Department of Applied Mathematics and Theoretical Physics, University of Cambridge (1981)
10. Griva, I., Nash, S.G., Sofer, A.: Linear and Nonlinear Optimization, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2008)
11. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
12. Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 117–205 (2006)
13. Nesterov, Y.: Introductory Lectures on Convex Optimization, vol. 87. Springer, Berlin (2004)
14. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, Berlin (2006)
15. Ruszczynski, A.: Nonlinear Optimization. Princeton University Press, Princeton (2006)
16. Weiser, M., Deuflhard, P., Erdmann, B.: Affine conjugate adaptive Newton methods for nonlinear elastomechanics. Optim. Methods Softw. 22(3), 413–431 (2007)

Acknowledgments

The authors are extremely grateful to Coralia Cartis, Nicholas I. M. Gould, and Philippe L. Toint for enlightening discussions about the arc algorithm and its theoretical properties that were inspirational for the algorithm proposed in this paper. The authors would also like to thank the two anonymous referees whose comments and suggestions helped to improve the paper.

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, USA
Frank E. Curtis & Mohammadreza Samadi
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
Daniel P. Robinson

Corresponding author

Correspondence to Frank E. Curtis.

Additional information

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics, Early Career Research Program under Contract Number DE-SC0010615, as well as by the U.S. National Science Foundation under Grant Nos. DMS-1217153 and DMS-1319356.

Appendix: Subproblem solver

The trace algorithm follows the framework of a trust region algorithm in which, in each iteration, a subproblem with a quadratic objective and a trust region constraint is solved to optimality and all other steps are explicit computations involving the iterate and related sequences. The only exception is Step 13 in which a new trust region radius is computed via the contract subroutine. In this appendix, we outline a practical procedure entailing the main computational components of contract, revealing that it can be implemented in such a way that each iteration of trace need not be more computationally expensive than similar implementations of those in a traditional trust region algorithm or arc.

Suppose that Steps 3 and 19 are implemented using the traditional approach of applying Newton’s method to solve a secular equation of the form $\phi _k(\lambda ) = 0$. Here, for a given $\lambda \ge \max \{0,-\xi _{k,1}\}$, with $\xi _{k,1}$ denoting the leftmost eigenvalue of $H_k$ as in the proof of Lemma 3.12, the vector $s_k(\lambda )$ is defined as a solution of the linear system (2.2) and

$$\begin{aligned} \phi _k(\lambda ) = \Vert s_k(\lambda )\Vert _2^{-1} - \delta _k^{-1}. \end{aligned} \tag{A.1}$$

A practical implementation of such an approach involves the initialization and update of an interval of uncertainty, say $[\underline{\lambda },\overline{\lambda }]$, in which the dual variable $\lambda _k$ corresponding to a solution $s_k = s_k(\lambda _k)$ of $\mathcal{Q}_k$ is known to lie [11]. In particular, for a given estimate $\lambda \in [\underline{\lambda },\overline{\lambda }]$, a factorization of $(H_k + \lambda I)$ is computed (or at least attempted), yielding a trial solution $s_k(\lambda )$ and a corresponding derivative of $\phi _k$ for the application of a (safeguarded) Newton iteration.
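The safeguarded Newton iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `solve_secular`, its arguments, and the default upper bound on the interval are hypothetical, the linear system is taken to be $(H + \lambda I)s = -g$, and the "hard case" is not handled. A failed Cholesky factorization is treated as evidence that the estimate of $\lambda$ is too small.

```python
import numpy as np


def solve_secular(H, g, delta, lam_lo=0.0, lam_hi=None, rtol=1e-10, max_iter=100):
    """Safeguarded Newton on phi(lam) = 1/||s(lam)|| - 1/delta (hypothetical helper).

    s(lam) solves (H + lam*I) s = -g. The interval [lam_lo, lam_hi] plays the
    role of the interval of uncertainty. Assumes g != 0 and no hard case.
    """
    eye = np.eye(g.size)
    if lam_hi is None:
        # Assumed crude bound: for lam >= ||H||_F + ||g||/delta, ||s(lam)|| <= delta.
        lam_hi = np.linalg.norm(H, "fro") + np.linalg.norm(g) / delta
    lam = lam_lo
    for _ in range(max_iter):
        try:
            L = np.linalg.cholesky(H + lam * eye)  # H + lam*I = L L^T
        except np.linalg.LinAlgError:
            # Factorization failed: lam is too small, so raise the lower bound.
            lam_lo = lam
            lam = 0.5 * (lam_lo + lam_hi)
            continue
        s = -np.linalg.solve(L.T, np.linalg.solve(L, g))
        norm_s = np.linalg.norm(s)
        interior = lam == 0.0 and norm_s <= delta  # unconstrained minimizer fits
        if interior or abs(norm_s - delta) <= rtol * delta:
            return lam, s
        # ||s(lam)|| decreases in lam, so the sign of ||s|| - delta updates the interval.
        if norm_s > delta:
            lam_lo = lam
        else:
            lam_hi = lam
        # Newton step: phi'(lam) = ||q||^2 / ||s||^3, where L q = s.
        q = np.linalg.solve(L, s)
        lam_new = lam + (norm_s / np.linalg.norm(q)) ** 2 * (norm_s - delta) / delta
        if not (lam_lo < lam_new < lam_hi):
            lam_new = 0.5 * (lam_lo + lam_hi)  # safeguard: fall back to bisection
        lam = lam_new
    raise RuntimeError("secular equation not solved within max_iter iterations")
```

The optional `lam_lo` and `lam_hi` arguments let a caller seed the interval of uncertainty with bounds already known from earlier computations, rather than starting from the default interval.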

In the context of such a strategy for the implementation of Steps 3 and 19, most of the computations involved in the contract subroutine can be considered as part of the initialization process for such a Newton iteration, if not a replacement for the entire Newton iteration. For example, if Step 33 is reached, then the computations of $(\lambda ,s)$ in Steps 33–34 are exactly those that would be performed in such a Newton iteration with an initial solution estimate of $\lambda \leftarrow \gamma _{\lambda }\lambda _k$. If Step 36 is reached, then the solution $(s_{k+1},\lambda _{k+1})$ of $\mathcal{Q}_{k+1}$ in Step 19 is yielded by this computation and a Newton solve of a secular equation is not required; otherwise, if Step 38 is reached, then one could employ $\overline{\lambda }\leftarrow \lambda $ in the Newton iteration for solving $\mathcal{Q}_{k+1}$ in Step 19. Overall, if Step 33 is reached, then the computations in contract combined with Step 19 are no more expensive than the subproblem solve in a traditional trust region algorithm, and may be significantly cheaper in cases when Step 36 is reached.

The situation is similar when Step 25 is reached. In particular, if the pair $(\lambda ,s)$ computed in Steps 25–26 results in Step 28 being reached, then the pair $(s_{k+1},\lambda _{k+1})$ required in Step 19 is available without having to run an expensive Newton iteration to solve a secular equation, meaning that computational expense is saved in our mechanism for setting $\delta _{k+1}$ implicitly via our choice of the dual variable $\lambda _{k+1}$. On the other hand, if Step 30 is reached, then the algorithm requests a value $\lambda \in (\lambda _k,\hat{\lambda })$ such that $\underline{\sigma }\le \lambda /\Vert s_k(\lambda )\Vert _2 \le \overline{\sigma }$. A variety of techniques could be employed for finding such a $\lambda $, but perhaps the most direct is to consider a technique such as [4, Algorithm 6.1] in which a cubic regularization subproblem is solved using a (safeguarded) Newton iteration applied to solve a secular equation similar to (A.1). It should be noted, however, that while [4, Algorithm 6.1] attempts to solve $\lambda /\Vert s_k(\lambda )\Vert _2 = \sigma $ for some given $\sigma > 0$, the computation in Step 30 merely requires $\underline{\sigma }\le \lambda /\Vert s_k(\lambda )\Vert _2 \le \overline{\sigma }$, meaning that one could, say, choose $\sigma = (\underline{\sigma }+\overline{\sigma })/2$, but terminate the Newton iteration as soon as $\lambda /\Vert s_k(\lambda )\Vert _2$ is computed in the (potentially very large) interval $[\underline{\sigma },\overline{\sigma }]$. Clearly, such a computation is no more expensive than [4, Algorithm 6.1].
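The early-termination idea for Step 30 can be illustrated with a short sketch. For simplicity it uses plain bisection in place of the safeguarded Newton iteration of [4, Algorithm 6.1]; what it shares with the text is the stopping rule, namely returning as soon as the ratio $\lambda /\Vert s(\lambda )\Vert _2$ lands anywhere in $[\underline{\sigma },\overline{\sigma }]$. The function name and arguments are hypothetical. The sketch assumes that $H + \lambda I$ is positive definite on the search interval, that $g \ne 0$ (so the ratio is increasing in $\lambda $), and that the ratio lies below $\underline{\sigma }$ at the left endpoint and above $\overline{\sigma }$ at the right endpoint.

```python
import numpy as np


def find_lambda_in_band(H, g, lam_left, lam_right, sigma_lo, sigma_hi, max_iter=200):
    """Find lam in (lam_left, lam_right) with sigma_lo <= lam/||s(lam)|| <= sigma_hi.

    Hypothetical helper using bisection; s(lam) solves (H + lam*I) s = -g.
    """
    eye = np.eye(g.size)
    lo, hi = lam_left, lam_right
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        s = -np.linalg.solve(H + lam * eye, g)
        ratio = lam / np.linalg.norm(s)
        if sigma_lo <= ratio <= sigma_hi:
            return lam, s  # any point in the band is acceptable: stop early
        # The ratio increases with lam, so shrink the interval toward the band.
        if ratio < sigma_lo:
            lo = lam
        else:
            hi = lam
    raise RuntimeError("no lam found in the requested band within max_iter iterations")
```

When the band $[\underline{\sigma },\overline{\sigma }]$ is wide, a loop of this kind typically stops after very few linear solves.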

Finally, it is worthwhile to note that since the contract subroutine seeks a trust region radius such that the new corresponding dual variable satisfies $\lambda _{k+1} > \lambda _k \ge \max \{0,-\xi _{k,1}\}$, it follows that, after a contraction, the subproblem $\mathcal{Q}_{k+1}$ will not involve the well-known “hard case” in the context of solving a trust region subproblem. (We remark that this avoidance of the “hard case” does not necessarily occur if one were to perform a contraction merely by setting the trust region radius as a fraction of the norm of the trial step, as is typically done in other trust region methods.)
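The reasoning behind this observation can be spelled out in one line. The following is a sketch that assumes, as is usual when a trial step is not accepted, that the iterate is unchanged by a contraction, so that $H_{k+1} = H_k$ and the gradient, written here as $g_k$, is also unchanged. Writing $\xi _{k,j}$ for the eigenvalues of $H_k$ (only $\xi _{k,1}$ is named in the text above), every eigenvalue of the shifted matrix satisfies

$$\begin{aligned} \xi _{k,j} + \lambda _{k+1} \ge \xi _{k,1} + \lambda _{k+1} > \xi _{k,1} + \lambda _k \ge 0, \end{aligned}$$

so that $H_k + \lambda _{k+1} I$ is positive definite and $s_{k+1} = -(H_k + \lambda _{k+1} I)^{-1} g_k$ is uniquely determined. The “hard case” requires the dual variable to equal $-\xi _{k,1}$, which the strict inequality rules out.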

About this article

Cite this article

Curtis, F.E., Robinson, D.P. & Samadi, M. A trust region algorithm with a worst-case iteration complexity of $\mathcal{O}(\epsilon ^{-3/2})$ for nonconvex optimization. Math. Program. 162, 1–32 (2017). https://doi.org/10.1007/s10107-016-1026-2

Received: 21 October 2014
Accepted: 05 May 2016
Published: 17 May 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s10107-016-1026-2
