A trust region algorithm with a worst-case iteration complexity of \(\mathcal{O}(\epsilon ^{-3/2})\) for nonconvex optimization

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

We propose a trust region algorithm for solving nonconvex smooth optimization problems. For any \(\overline{\epsilon }\in (0,\infty )\), the algorithm requires at most \(\mathcal{O}(\epsilon ^{-3/2})\) iterations, function evaluations, and derivative evaluations to drive the norm of the gradient of the objective function below any \(\epsilon \in (0,\overline{\epsilon }]\). This improves upon the \(\mathcal{O}(\epsilon ^{-2})\) bound known to hold for some other trust region algorithms and matches the \(\mathcal{O}(\epsilon ^{-3/2})\) bound for the recently proposed Adaptive Regularisation framework using Cubics, also known as the arc algorithm. Our algorithm, called trace, follows a trust region framework but employs modified step acceptance criteria and a novel trust region update mechanism that allow it to achieve this worst-case global complexity bound. Importantly, we prove that our algorithm also attains global and fast local convergence guarantees under assumptions similar to those used for other trust region algorithms. We also prove a worst-case upper bound on the number of iterations, function evaluations, and derivative evaluations that the algorithm requires to obtain an approximate second-order stationary point.

References

  1. Bazaraa, M.S., Sherali, H.D., Shetty, C.M.: Nonlinear Programming: Theory and Algorithms, 3rd edn. Wiley, London (2006)

  2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  3. Cartis, C., Gould, N.I.M., Toint, Ph.L.: On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim. 20(6), 2833–2852 (2010)

  4. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 127, 245–295 (2011)

  5. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function- and derivative-evaluation complexity. Math. Program. 130, 295–319 (2011)

  6. Cartis, C., Gould, N.I.M., Toint, Ph.L.: Optimal Newton-type methods for nonconvex smooth optimization problems. Technical report ERGO 11-009, School of Mathematics, University of Edinburgh (2011)

  7. Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2000)

  8. Dennis, J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1996)

  9. Griewank, A.: The modification of Newton’s method for unconstrained optimization by bounding cubic terms. Technical report NA/12, Department of Applied Mathematics and Theoretical Physics, University of Cambridge (1981)

  10. Griva, I., Nash, S.G., Sofer, A.: Linear and Nonlinear Optimization, 2nd edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2008)

  11. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)

  12. Nesterov, Yu., Polyak, B.T.: Cubic regularization of Newton’s method and its global performance. Math. Program. 108(1), 177–205 (2006)

  13. Nesterov, Y.: Introductory Lectures on Convex Optimization, vol. 87. Springer, Berlin (2004)

  14. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, Berlin (2006)

  15. Ruszczynski, A.: Nonlinear Optimization. Princeton University Press, Princeton (2006)

  16. Weiser, M., Deuflhard, P., Erdmann, B.: Affine conjugate adaptive Newton methods for nonlinear elastomechanics. Optim. Methods Softw. 22(3), 413–431 (2007)

Acknowledgments

The authors are extremely grateful to Coralia Cartis, Nicholas I. M. Gould, and Philippe L. Toint for enlightening discussions about the arc algorithm and its theoretical properties that were inspirational for the algorithm proposed in this paper. The authors would also like to thank the two anonymous referees whose comments and suggestions helped to improve the paper.

Author information

Correspondence to Frank E. Curtis.

Additional information

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics, Early Career Research Program under Contract Number DE-SC0010615, as well as by the U.S. National Science Foundation under Grant Nos. DMS-1217153 and DMS-1319356.

Appendix: Subproblem solver

The trace algorithm follows the framework of a trust region algorithm in which, in each iteration, a subproblem with a quadratic objective and a trust region constraint is solved to optimality, and all other steps are explicit computations involving the iterate and related sequences. The only exception is Step 13, in which a new trust region radius is computed via the contract subroutine. In this appendix, we outline a practical procedure for the main computational components of contract, showing that it can be implemented so that each iteration of trace need not be more computationally expensive than an iteration of a comparably implemented traditional trust region algorithm or of arc.

Suppose that Steps 3 and 19 are implemented using the traditional approach of applying Newton’s method to solve a secular equation of the form \(\phi _k(\lambda ) = 0\). As in the proof of Lemma 3.12, let \(\xi _{k,1}\) denote the leftmost eigenvalue of \(H_k\). For a given \(\lambda \ge \max \{0,-\xi _{k,1}\}\), the vector \(s_k(\lambda )\) is defined as a solution of the linear system (2.2), and

$$\phi _k(\lambda ) = \Vert s_k(\lambda )\Vert _2^{-1} - \delta _k^{-1}. \tag{A.1}$$
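For reference, the derivative of \(\phi _k\) used by such a Newton iteration can be worked out as follows. This derivation assumes (it is not restated in this appendix) that (2.2) is the shifted system \((H_k+\lambda I)s = -g_k\), with \(g_k\) the gradient at the current iterate, and that \(H_k+\lambda I = LL^T\) is a Cholesky factorization; the auxiliary vector \(w\) is introduced only for this sketch. The final line is the classical Newton update on the secular equation associated with [11].

$$\begin{aligned} s_k'(\lambda ) &= -(H_k+\lambda I)^{-1} s_k(\lambda ), \\ \frac{d}{d\lambda }\Vert s_k(\lambda )\Vert _2 &= \frac{s_k(\lambda )^T s_k'(\lambda )}{\Vert s_k(\lambda )\Vert _2} = -\frac{\Vert w\Vert _2^2}{\Vert s_k(\lambda )\Vert _2}, \quad \text{where } Lw = s_k(\lambda ), \\ \phi _k'(\lambda ) &= \frac{\Vert w\Vert _2^2}{\Vert s_k(\lambda )\Vert _2^3}, \\ \lambda _{+} &= \lambda - \frac{\phi _k(\lambda )}{\phi _k'(\lambda )} = \lambda + \left( \frac{\Vert s_k(\lambda )\Vert _2}{\Vert w\Vert _2}\right) ^2 \frac{\Vert s_k(\lambda )\Vert _2 - \delta _k}{\delta _k}. \end{aligned}$$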

A practical implementation of such an approach involves the initialization and update of an interval of uncertainty, say \([\underline{\lambda },\overline{\lambda }]\), in which the dual variable \(\lambda _k\) corresponding to a solution \(s_k = s_k(\lambda _k)\) of \(\mathcal{Q}_k\) is known to lie [11]. In particular, for a given estimate \(\lambda \in [\underline{\lambda },\overline{\lambda }]\), a factorization of \((H_k + \lambda I)\) is computed (or at least attempted), yielding a trial solution \(s_k(\lambda )\) and a corresponding derivative of \(\phi _k\) for the application of a (safeguarded) Newton iteration.
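A minimal Python/NumPy sketch of such a safeguarded Newton iteration follows, in the spirit of [11] but simplified. It assumes that (2.2) is the shifted system \((H_k+\lambda I)s=-g_k\) with \(g_k\) the gradient (our assumption), uses a Cholesky factorization as the "attempted" factorization, takes \(\Vert g\Vert _2/\delta + \Vert H\Vert _F\) as a crude upper end of the interval of uncertainty, and bisects whenever a Newton step leaves the interval. The function name `solve_tr_subproblem` and its tolerances are hypothetical, and the “hard case” is not treated.

```python
import numpy as np


def solve_tr_subproblem(H, g, delta, rtol=1e-8, max_iter=200):
    """Safeguarded Newton iteration on phi(lam) = 1/||s(lam)|| - 1/delta,
    where (H + lam*I) s(lam) = -g (assumed form of the linear system).
    Returns (s, lam). Sketch only: the trust region "hard case" is not treated."""
    n = g.size
    eye = np.eye(n)
    # Interval of uncertainty [lo, hi] for the multiplier. Since
    # ||H||_F >= max(0, -xi_1), hi is a valid (if loose) upper bound.
    lo, hi = 0.0, np.linalg.norm(g) / delta + np.linalg.norm(H, "fro")
    lam, s = 0.0, None
    for _ in range(max_iter):
        try:
            # Attempted factorization; raises if H + lam*I is not positive definite.
            L = np.linalg.cholesky(H + lam * eye)
        except np.linalg.LinAlgError:
            lo = lam  # lam <= -xi_1, hence a valid lower bound
            lam = 0.5 * (lo + hi)
            continue
        # s(lam) from L L^T s = -g (np.linalg.solve keeps this NumPy-only;
        # scipy.linalg.solve_triangular would exploit the triangular structure).
        s = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        s_norm = np.linalg.norm(s)
        if lam == 0.0 and s_norm <= delta:
            return s, 0.0  # interior solution: the constraint is inactive
        if abs(s_norm - delta) <= rtol * delta:
            return s, lam
        if s_norm > delta:
            lo = lam  # step too long: the multiplier must increase
        else:
            hi = lam  # step too short: the multiplier must decrease
        # Newton step on phi, using phi'(lam) = ||w||^2 / ||s||^3 with L w = s.
        w = np.linalg.solve(L, s)
        lam_new = lam + (s_norm / np.linalg.norm(w)) ** 2 * (s_norm - delta) / delta
        # Safeguard: bisect if the Newton step leaves the interval of uncertainty.
        lam = lam_new if lo < lam_new < hi else 0.5 * (lo + hi)
    return s, lam  # may be inaccurate if max_iter was reached
```

Each pass costs one factorization of \(H+\lambda I\) plus a few triangular solves, which is the unit of work against which the cost of contract is compared.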

In the context of such a strategy for implementing Steps 3 and 19, most of the computations involved in the contract subroutine can be viewed as part of the initialization of such a Newton iteration, if not as a replacement for the entire Newton iteration. For example, if Step 33 is reached, then the computations of \((\lambda ,s)\) in Steps 33–34 are exactly those that would be performed in such a Newton iteration with the initial estimate \(\lambda \leftarrow \gamma _{\lambda }\lambda _k\). If Step 36 is reached, then this computation yields the solution \((s_{k+1},\lambda _{k+1})\) of \(\mathcal{Q}_{k+1}\) needed in Step 19, and no Newton solve of a secular equation is required; otherwise, if Step 38 is reached, then one could set the upper bound \(\overline{\lambda }\leftarrow \lambda \) in the Newton iteration for solving \(\mathcal{Q}_{k+1}\) in Step 19. Overall, if Step 33 is reached, then the computations in contract combined with Step 19 are no more expensive than the subproblem solve in a traditional trust region algorithm, and may be significantly cheaper when Step 36 is reached.
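The reuse described for Steps 33–38 can be sketched as follows. The acceptance test of Step 36 is not restated in this appendix, so the sketch takes it as a caller-supplied predicate `accept(s, lam)` (hypothetical). It also assumes \(\gamma _{\lambda } > 1\), \(\lambda _k > 0\), and that \(\mathcal{Q}_{k+1}\) uses the same \(H_k\) and \(g_k\) with (2.2) of the form \((H_k+\lambda I)s=-g_k\). The function name `contract_trial` is illustrative only; the point is that a single factorization either produces the pair needed in Step 19 or supplies the bound \(\overline{\lambda }\) for the subsequent Newton solve.

```python
import numpy as np


def contract_trial(H, g, lam_k, gamma_lam, accept):
    """Hypothetical sketch of the reuse in Steps 33-38 of contract.

    One factorization at lam = gamma_lam * lam_k either yields the pair
    (s_{k+1}, lam_{k+1}) directly (Step 36 analogue) or an upper bound
    lambda_bar for the Newton solve in Step 19 (Step 38 analogue).
    `accept(s, lam)` stands in for the Step 36 test, which is not given here."""
    assert gamma_lam > 1.0 and lam_k > 0.0  # assumptions of this sketch
    lam = gamma_lam * lam_k  # initial estimate, as in Step 33
    # lam > lam_k >= max(0, -xi_1), so H + lam*I is positive definite.
    L = np.linalg.cholesky(H + lam * np.eye(g.size))
    s = np.linalg.solve(L.T, np.linalg.solve(L, -g))  # s(lam), as in Step 34
    if accept(s, lam):
        return s, lam, None  # no secular-equation solve needed in Step 19
    return None, None, lam  # warm start: use lam as lambda_bar in Step 19
```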

The situation is similar when Step 25 is reached. In particular, if the pair \((\lambda ,s)\) computed in Steps 25–26 results in Step 28 being reached, then the pair \((s_{k+1},\lambda _{k+1})\) required in Step 19 is available without running an expensive Newton iteration on a secular equation, so our mechanism of setting \(\delta _{k+1}\) implicitly, through the choice of the dual variable \(\lambda _{k+1}\), saves computation. On the other hand, if Step 30 is reached, then the algorithm requests a value \(\lambda \in (\lambda _k,\hat{\lambda })\) such that \(\underline{\sigma }\le \lambda /\Vert s_k(\lambda )\Vert _2 \le \overline{\sigma }\). Various techniques could be used to find such a \(\lambda \); perhaps the most direct is one such as [4, Algorithm 6.1], in which a cubic regularization subproblem is solved using a (safeguarded) Newton iteration applied to a secular equation similar to (A.1). Note, however, that while [4, Algorithm 6.1] attempts to solve \(\lambda /\Vert s_k(\lambda )\Vert _2 = \sigma \) for some given \(\sigma > 0\), the computation in Step 30 merely requires \(\underline{\sigma }\le \lambda /\Vert s_k(\lambda )\Vert _2 \le \overline{\sigma }\). One could therefore, say, choose \(\sigma = (\underline{\sigma }+\overline{\sigma })/2\), but terminate the Newton iteration as soon as the computed ratio \(\lambda /\Vert s_k(\lambda )\Vert _2\) lies in the (potentially very large) interval \([\underline{\sigma },\overline{\sigma }]\). Since this only relaxes the termination test, such a computation is no more expensive than [4, Algorithm 6.1].
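A sketch of the relaxed search for Step 30 is given next. Rather than reproducing [4, Algorithm 6.1], whose secular function is not restated here, it applies a bisection-safeguarded Newton iteration directly to the ratio \(\rho (\lambda ) = \lambda /\Vert s_k(\lambda )\Vert _2\), which is increasing for \(\lambda \ge 0\) when \(H_k+\lambda I\) is positive definite and \(g_k \ne 0\). It steers toward \(\sigma = (\underline{\sigma }+\overline{\sigma })/2\) but stops as soon as \(\rho \) lands in \([\underline{\sigma },\overline{\sigma }]\). The function name, the choice of \(\rho \) as the Newton target, the form \((H_k+\lambda I)s=-g_k\) of (2.2), and the assumptions that \(H_k+\lambda I\) is positive definite on the interval and that an admissible \(\lambda \) exists there are all ours.

```python
import numpy as np


def find_lambda_in_band(H, g, lam_lo, lam_hi, sigma_lo, sigma_hi, max_iter=200):
    """Find lam in (lam_lo, lam_hi) with sigma_lo <= lam/||s(lam)|| <= sigma_hi,
    where (H + lam*I) s(lam) = -g. Assumes H + lam*I is positive definite on
    the interval, g != 0, and that such a lam exists there."""
    n = g.size
    sigma = 0.5 * (sigma_lo + sigma_hi)  # target used only to steer Newton steps
    lo, hi = lam_lo, lam_hi
    lam = 0.5 * (lo + hi)
    for _ in range(max_iter):
        L = np.linalg.cholesky(H + lam * np.eye(n))
        s = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        s_norm = np.linalg.norm(s)
        ratio = lam / s_norm
        if sigma_lo <= ratio <= sigma_hi:
            return lam, s  # early termination: any ratio in the band is acceptable
        if ratio < sigma_lo:
            lo = lam  # the ratio increases with lam, so move right
        else:
            hi = lam
        # d/dlam (lam/||s||) = 1/||s|| + lam * ||w||^2 / ||s||^3, with L w = s
        w = np.linalg.solve(L, s)
        dratio = 1.0 / s_norm + lam * np.dot(w, w) / s_norm**3
        lam_new = lam - (ratio - sigma) / dratio
        # Safeguard: bisect if the Newton step leaves the current bracket.
        lam = lam_new if lo < lam_new < hi else 0.5 * (lo + hi)
    raise RuntimeError("no lambda found in the requested band")
```

Because any ratio in the band is accepted, the loop typically stops well before the Newton iteration would have converged to \(\sigma \) itself.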

Finally, note that since the contract subroutine is designed to compute a trust region radius for which the new dual variable satisfies \(\lambda _{k+1} > \lambda _k \ge \max \{0,-\xi _{k,1}\}\), it follows that, after a contraction, the subproblem \(\mathcal{Q}_{k+1}\) will not involve the well-known “hard case” of the trust region subproblem. (We remark that this avoidance of the “hard case” does not necessarily occur if one performs a contraction merely by setting the trust region radius to a fraction of the norm of the trial step, as is typically done in other trust region methods.)
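The remark about the “hard case” can be seen on a small hypothetical instance (not taken from the paper): \(H = \mathrm{diag}(-1, 2)\) and \(g = (0, 1)^T\), so \(g\) is orthogonal to the eigenvector for \(\xi _1 = -1\). Assuming again that (2.2) reads \((H+\lambda I)s = -g\), the sketch checks that \(\Vert s(\lambda )\Vert _2\) stays below \(1/3\) for all \(\lambda > 1\), so (A.1) has no root there when \(\delta > 1/3\). A conventional contraction to half of a unit-norm trial step (\(\delta = 0.5\), a hypothetical factor) therefore leaves the subproblem in the hard case, whereas fixing \(\lambda _{k+1} > \max \{0,-\xi _1\}\) first and then setting \(\delta _{k+1} = \Vert s(\lambda _{k+1})\Vert _2\) makes \(\lambda _{k+1}\) the multiplier by construction.

```python
import numpy as np

# Hypothetical 2x2 instance: g is orthogonal to the eigenvector of the leftmost eigenvalue.
H = np.diag([-1.0, 2.0])
g = np.array([0.0, 1.0])
xi1 = np.linalg.eigvalsh(H)[0]  # leftmost eigenvalue, here -1


def s_of(lam):
    """Solution of (H + lam*I) s = -g, valid for lam > -xi1."""
    return np.linalg.solve(H + lam * np.eye(2), -g)


# ||s(lam)|| = 1/(2 + lam) decreases in lam, with supremum 1/3 as lam -> -xi1 = 1,
# so (A.1) has no root with lam > -xi1 whenever delta > 1/3: the hard case.
limit_norm = np.linalg.norm(s_of(-xi1 + 1e-12))

# Conventional contraction: a fraction (hypothetically 1/2) of a trial step of norm 1.
delta_conventional = 0.5 * 1.0
hard_case_conventional = delta_conventional > limit_norm

# Contraction through the dual variable: pick lam_next > max(0, -xi1) first,
# then set the radius from it, so lam_next is the multiplier by construction.
lam_next = 2.0
s_next = s_of(lam_next)
delta_next = np.linalg.norm(s_next)
```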

About this article

Cite this article

Curtis, F.E., Robinson, D.P. & Samadi, M. A trust region algorithm with a worst-case iteration complexity of \(\mathcal{O}(\epsilon ^{-3/2})\) for nonconvex optimization. Math. Program. 162, 1–32 (2017). https://doi.org/10.1007/s10107-016-1026-2

  • DOI: https://doi.org/10.1007/s10107-016-1026-2
