A framework for generalising the Newton method and other iterative methods from Euclidean space to manifolds

Manton, Jonathan H.

doi:10.1007/s00211-014-0630-4


Published: 14 May 2014

Numerische Mathematik, Volume 129, pages 91–125 (2015)

Abstract

The Newton iteration is a popular method for minimising a cost function on Euclidean space. Various generalisations to cost functions defined on manifolds appear in the literature. In each case, the convergence rate of the generalised Newton iteration needed establishing from first principles. The present paper presents a framework for generalising iterative methods from Euclidean space to manifolds that ensures local convergence rates are preserved. It applies to any (memoryless) iterative method computing a coordinate independent property of a function (such as a zero or a local minimum). All possible Newton methods on manifolds are believed to come under this framework. Changes of coordinates, and not any Riemannian structure, are shown to play a natural role in lifting the Newton method to a manifold. The framework also gives new insight into the design of Newton methods in general.


Notes

The main results of this paper were obtained in 2004–2005 and communicated privately to colleagues.

References

1. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
2. Absil, P.A., Mahony, R., Sepulchre, R., Van Dooren, P.: A Grassmann-Rayleigh quotient iteration for computing invariant subspaces. SIAM Rev. 44(1), 57–73 (2002)
3. Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)
4. Absil, P.A., Sepulchre, R., Van Dooren, P., Mahony, R.: Cubically convergent iterations for invariant subspace computation. SIAM J. Matrix Anal. Appl. 26(1), 70–96 (2004)
5. Adler, R.L., Dedieu, J.-P., Margulies, J.Y., Martens, M., Shub, M.: Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA J. Numer. Anal. 22(3), 359–390 (2002)
6. Alvarez, F., Bolte, J., Munier, J.: A unifying local convergence result for Newton’s method in Riemannian manifolds. Found. Comput. Math. 8(2), 197–226 (2008)
7. Argyros, I.K.: An improved unifying convergence analysis of Newton’s method in Riemannian manifolds. J. Appl. Math. Comput. 25(1–2), 345–351 (2007)
8. Deuflhard, P.: Newton methods for nonlinear problems: affine invariance and adaptive algorithms. In: Springer Series in Computational Mathematics. Springer, Berlin (2004)
9. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
10. Ferreira, O., Svaiter, B.: Kantorovich’s theorem on Newton’s method in Riemannian manifolds. J. Complex. 18(1), 304–329 (2002)
11. Gabay, D.: Minimizing a differentiable function over a differentiable manifold. J. Optim. Theory Appl. 37(2), 177–219 (1982)
12. Helmke, U., Moore, J.B.: Optimization and dynamical systems. In: Communications and Control Engineering Series. Springer-Verlag London Ltd., London (1994)
13. Hirsch, M.W., Smale, S.: Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York (1974)
14. Kantorovich, L., Akhilov, G.: Functional Analysis in Normed Spaces. Fizmatgiz, Moscow (1959)
15. Manton, J.H.: Optimisation algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002)
16. Manton, J.H.: Optimisation geometry. In: Hüper, K., Trumpf, J. (eds.) Mathematical System Theory-Festschrift in Honor of Uwe Helmke on the Occasion of his Sixtieth Birthday, pp. 261–274. Create Space (2013)
17. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)
18. Polak, E.: Optimization: Algorithms and Consistent Approximations. Springer, New York (1997)
19. Shub, M.: Some remarks on dynamical systems and numerical analysis. In: Dynamical Systems and Partial Differential Equations (Caracas, 1984), pp. 69–91. Universidad Simon Bolivar, Caracas (1986)

Acknowledgments

This work was funded in part by the Australian Research Council. Special thanks to Dr Jochen Trumpf for insightful and thought-provoking discussions during the preliminary stages of this paper, and to the two anonymous reviewers for excellent guidance on improving the presentation.

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria, 3010, Australia
Jonathan H. Manton


Corresponding author

Correspondence to Jonathan H. Manton.

Appendices

Appendix A: Rate of Convergence of Iterates on Manifolds

Prior to this work (see Notes), it was natural to define convergence with respect to a Riemannian metric. The belief that Newton methods should not depend on any Riemannian geometry led to the following. Compared with [1, Section 4.5], the lemmata here are careful to ensure the iterates do not fall outside the domain of definition of the iteration function.

Convergence rates are not preserved by arbitrary homeomorphisms. A sufficient condition for rates $K>1$ is the following.

Lemma 26

Let $N$ be an iteration function on $\mathbb {R}^n$ which converges locally to ${x^{*}}$ with rate $K>1$ and constant $\kappa $. Let $U$ be a neighbourhood of ${x^{*}}$ and $\phi :U \rightarrow V \subset \mathbb {R}^n$ a bi-Lipschitz homeomorphism about ${x^{*}}$, meaning there exist positive constants $\alpha ,\beta \in \mathbb {R}$ such that

$$\begin{aligned} \forall x \in U,\quad \frac{1}{\alpha }\,\Vert x - {x^{*}}\Vert \le \Vert \phi (x) - \phi ({x^{*}}) \Vert \le \beta \,\Vert x - {x^{*}}\Vert . \end{aligned} \tag{45}$$

Then $\tilde{N} = \phi \circ N \circ \phi ^{-1}$ converges locally to $\phi ({x^{*}})$ with rate $K$ and constant $\alpha ^K\beta \kappa $.

Proof

As noted in Sect. 2, since $N$ converges locally to ${x^{*}}$, for all sufficiently small balls $B$ centred at ${x^{*}}$, $N$ is defined on $B$, and $x \in B$ implies $N(x) \in B$ and $\Vert N(x) - {x^{*}}\Vert \le \kappa \,\Vert x-{x^{*}}\Vert ^K$. Choose such a $B$ contained in $U$. Since $\phi $ is a homeomorphism, $Y = \phi (B)$ is a non-empty open subset of $V$. If $y \in Y$ then $\tilde{N}(y)$ is well-defined and contained in $Y$, and $\Vert \tilde{N}(y) - \phi ({x^{*}}) \Vert \le \beta \,\Vert N(\phi ^{-1}(y)) - {x^{*}}\Vert \le \beta \kappa \, \Vert \phi ^{-1}(y) - {x^{*}}\Vert ^K \le \alpha ^K \beta \kappa \, \Vert y - \phi ({x^{*}}) \Vert ^K.$ $\square $
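The constant $\alpha ^K\beta \kappa $ of Lemma 26 can be checked numerically on a toy case. The sketch below is only an illustration under assumptions that are not in the paper: it takes $N(x)=x^2$ on $\mathbb {R}$ (so ${x^{*}}=0$, $K=2$, $\kappa =1$) and the hypothetical map $\phi (x) = 2x + \sin x$ on $U=(-1,1)$, for which (45) holds with $\alpha = 1/2$ and $\beta = 3$.

```python
import math

K = 2          # rate of N(x) = x**2 at x* = 0
KAPPA = 1.0    # |N(x)| <= KAPPA * |x|**K
ALPHA = 0.5    # |x| / ALPHA <= |phi(x)| on U = (-1, 1)
BETA = 3.0     # |phi(x)| <= BETA * |x| on U = (-1, 1)


def N(x):
    return x * x


def phi(x):
    # Illustrative bi-Lipschitz map: phi'(x) = 2 + cos(x) lies in (2, 3].
    return 2.0 * x + math.sin(x)


def worst_ratio(samples=2001, radius=0.9):
    """Largest observed |N~(y) - phi(x*)| / |y - phi(x*)|**K with y = phi(x)."""
    worst = 0.0
    for i in range(samples):
        x = -radius + 2.0 * radius * i / (samples - 1)
        if x == 0.0:
            continue
        y = phi(x)
        n_tilde_y = phi(N(x))  # N~(y) = phi(N(phi^{-1}(y))) with phi^{-1}(y) = x
        worst = max(worst, abs(n_tilde_y) / abs(y) ** K)
    return worst


bound = ALPHA ** K * BETA * KAPPA  # constant predicted by Lemma 26
print(worst_ratio(), "<=", bound)
```

Sampling $x$ and setting $y=\phi (x)$ avoids having to invert $\phi $ explicitly; the observed ratio stays below the predicted constant, which is a sufficient bound rather than a sharp one.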

A significantly stronger condition is required if $K=1$. One such example is the following.

Lemma 27

Let $N$ be an iteration function on $\mathbb {R}^n$ converging locally to ${x^{*}}$ at a linear rate. Let $U$ be a neighbourhood of ${x^{*}}$ and $\phi :U \rightarrow V \subset \mathbb {R}^n$ a $\mathcal {C}^1$-diffeomorphism whose differential $D\phi $ at ${x^{*}}$ is proportional to the identity. Then $\tilde{N} = \phi \circ N \circ \phi ^{-1}$ converges locally to $\phi ({x^{*}})$ at a linear rate.

Proof

Let $\gamma \in \mathbb {R}$ be such that $D\phi ({x^{*}})\cdot \xi = \gamma \xi $ for $\xi \in \mathbb {R}^n$. Note $\gamma \ne 0$ because $\phi $ is a diffeomorphism. Since $\phi (x) - \phi ({x^{*}}) = \gamma (x-{x^{*}}) + r(x)$ where $\lim _{x \rightarrow {x^{*}}} \Vert r(x) \Vert / \Vert x - {x^{*}}\Vert = 0$, by shrinking $U$ to become a sufficiently small neighbourhood of ${x^{*}}$, it can be arranged for (45) to hold with $\beta = |\gamma | + \epsilon $ and $\alpha = \frac{1}{|\gamma |-\epsilon }$ for any $\epsilon \in (0,|\gamma |)$. The result follows from Lemma 26 by choosing $\epsilon $ so that $\alpha \beta \kappa < 1$, where $\kappa < 1$ is the constant associated with $N$. $\square $
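To see why the hypothesis on $D\phi $ matters when $K=1$, the following sketch compares two linear changes of coordinates. It assumes, as the proof above suggests, that a linear rate means $\Vert N(x)-{x^{*}}\Vert \le \kappa \Vert x-{x^{*}}\Vert $ with $\kappa <1$; the specific map $N$ (a rotation scaled by 0.9) and the helper names are hypothetical choices made for illustration.

```python
import math

KAPPA = 0.9


def N(x):
    # Rotation by 90 degrees scaled by KAPPA, so ||N(x)|| = KAPPA * ||x||.
    return (-KAPPA * x[1], KAPPA * x[0])


def conjugate(d1, d2):
    """Return N~ = phi o N o phi^{-1} for the linear map phi = diag(d1, d2)."""
    def n_tilde(y):
        x = (y[0] / d1, y[1] / d2)       # phi^{-1}(y)
        nx = N(x)
        return (d1 * nx[0], d2 * nx[1])  # phi(N(x))
    return n_tilde


def norm(v):
    return math.hypot(v[0], v[1])


def one_step_factor(n_tilde, y):
    return norm(n_tilde(y)) / norm(y)


y = (1.0, 0.0)
print(one_step_factor(conjugate(10.0, 10.0), y))  # D phi proportional to I
print(one_step_factor(conjugate(1.0, 10.0), y))   # D phi = diag(1, 10)
```

With $D\phi $ proportional to the identity the one-step contraction factor is unchanged, whereas the anisotropic scaling produces a step that increases the norm, so the single-step linear bound with constant below one is lost even though $\phi $ is a diffeomorphism.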

The above suggests the following definition. An iteration function $E:M \rightarrow M$ on an $n$-dimensional manifold $M$ is said to converge locally with rate $K \ge 1$ to ${p^{*}}$ with respect to the homeomorphism $\varphi :W \subset M \rightarrow V \subset \mathbb {R}^n$, where ${p^{*}}\in W$, if $\varphi \circ E \circ \varphi ^{-1}$, as an iteration function on $\mathbb {R}^n$, converges locally with rate $K$ to $\varphi ({p^{*}})$.

If $K=1$ or $M$ is only a topological manifold, there is no distinguished choice of homeomorphism $\varphi $ with respect to which convergence can be defined. If $M$ is a $\mathcal {C}^1$-manifold, $K>1$ and an iterate converges with respect to one coordinate chart $\varphi $ then Lemma 26 implies it converges with respect to any other chart $\psi $. (Proof: If $N = \varphi \circ E \circ \varphi ^{-1}$ converges then, since $\psi \circ \varphi ^{-1}$ is $\mathcal {C}^1$ and hence bi-Lipschitz on a possibly smaller domain, $\psi \circ E \circ \psi ^{-1} = (\psi \circ \varphi ^{-1}) \circ (\varphi \circ E \circ \varphi ^{-1}) \circ (\psi \circ \varphi ^{-1})^{-1}$ converges too.) Definition 28 affords a coordinate independent definition of rate of convergence.

Definition 28

An iteration function $E:M \rightarrow M$ on a $\mathcal {C}^1$-differentiable manifold converges locally with rate $K > 1$ to ${p^{*}}\in M$ if there exists a coordinate chart $\varphi :W \rightarrow V \subset \mathbb {R}^n$ defined on a neighbourhood of ${p^{*}}$ such that $\varphi \circ E \circ \varphi ^{-1}$ converges locally with rate $K$ to $\varphi ({p^{*}})$ as an iteration function on $\mathbb {R}^n$.
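As a rough illustration of the chart independence behind Definition 28, the sketch below uses a hypothetical iteration on the circle that reads $\theta \mapsto \theta ^2$ in the angle chart, and estimates its order from errors measured in a second chart, $t = \tan (\theta /2)$. Both the iteration and the choice of charts are assumptions made for this example only.

```python
import math


def E_in_angle_chart(theta):
    # Hypothetical iteration written in the angle chart; converges to 0 with K = 2.
    return theta * theta


def stereographic(theta):
    # A second chart about the same point: t = tan(theta / 2).
    return math.tan(theta / 2.0)


def estimated_orders(theta0=0.5, steps=5):
    """Estimate the order from errors e_k measured in the stereographic chart."""
    thetas = [theta0]
    for _ in range(steps):
        thetas.append(E_in_angle_chart(thetas[-1]))
    logs = [math.log(abs(stereographic(t))) for t in thetas]
    # Standard estimate: log(e_{k+2} / e_{k+1}) / log(e_{k+1} / e_k).
    return [(logs[k + 2] - logs[k + 1]) / (logs[k + 1] - logs[k])
            for k in range(len(logs) - 2)]


print(estimated_orders())
```

The estimates approach 2 even though the errors are measured in a chart other than the one in which the iteration was written, in line with the remark that Lemma 26 transfers rates $K>1$ between charts.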

Appendix B: Local parametrisations

The normalisation $\phi _x(x)=x$ used in Sect. 4.1 does not generalise well to the manifold setting. Sect. 5 implicitly introduced $h_x(y) = \phi _x(x+y)$, thereby changing the normalisation to $h_x(0) = x$. Properties H1 to H3 of Sect. 5 are the analogues of properties P1 and P2 in Sect. 4.1.

Choosing $h_x(y) = x + y + y^2$ if $x$ is rational and $h_x(y) = x + y - y^2$ if $x$ is irrational shows that H1–H3 do not imply continuity of $h$. Conversely, $h$ being $\mathcal {C}^1$-smooth and satisfying H1 and H2 need not imply H3.

Example 29

Let $\alpha :\mathbb {R}\rightarrow \mathbb {R}$ be a $\mathcal {C}^2$-smooth (or even $\mathcal {C}^\infty $-smooth) bump function satisfying: $0 \le \alpha (t) \le 1$; $\alpha (t) = \alpha '(t) = 0$ for $t \not \in (1/2,1)$; $\alpha (3/4) = 1$. Let $h(x,y) = x + y + x^{-1/2} \alpha (y/x^2) y^2$ if $x > 0$ and $h(x,y)=x+y$ otherwise. Then differentiation shows that $h(x,y)$ is $\mathcal {C}^1$-smooth in $(x,y)$. Furthermore, $h_x(0)=x$, $Dh_x(0) = 1$ and $D^2h_x(0) = 0$. Therefore, H1 and H2 are satisfied, but H3 is not; if $x_n \rightarrow 0$ with $x_n > 0$ and $y_n = (3/4)x_n^2$ then $(h_{x_n}(y_n) - x_n - y_n)y_n^{-2} \rightarrow \infty $.
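The blow-up in Example 29 can be observed numerically. The sketch below fixes one admissible bump function (the particular formula for `alpha` is an assumption; the example only requires its listed properties) and evaluates $(h_{x}(y) - x - y)\,y^{-2}$ at $y = (3/4)x^2$, which equals $x^{-1/2}$.

```python
import math


def alpha(t):
    # One possible C-infinity bump supported in (1/2, 1) with alpha(3/4) = 1.
    s = 4.0 * t - 3.0
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - s * s))


def remainder(x, y):
    # h(x, y) - x - y from Example 29, computed directly to avoid cancellation.
    if x <= 0.0:
        return 0.0
    return x ** -0.5 * alpha(y / x ** 2) * y ** 2


def h(x, y):
    return x + y + remainder(x, y)


def h3_ratio(x):
    """(h_x(y) - x - y) / y**2 evaluated at y = (3/4) x**2."""
    y = 0.75 * x ** 2
    return remainder(x, y) / y ** 2


for x in (1e-2, 1e-4, 1e-6):
    print(x, h3_ratio(x))
```

The ratio grows without bound as $x \rightarrow 0^{+}$, so no single constant $\beta $ can serve in H3 on a neighbourhood of $x=0$.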

Nevertheless, a corollary of Lemma 30 is that $h$ being $\mathcal {C}^2$-smooth, or even just $D^2h_x(y)$ being continuous in $(x,y)$, suffices for H1 to imply H2 and H3.

Lemma 30

If, for $x,y \in B(0;\rho )$, $D^2h_x(y)$ is bounded in $(x,y)$ and continuous in $y$ (that is, for each $x$, $h_x(y)$ is $\mathcal {C}^2$-smooth in $y$) then $h$ satisfying H1${}_{\rho }$ implies it satisfies H2${}_{\rho }$ and H3${}_{\rho }$.

Proof

Let $\alpha = \sup _{x,y \in B(0;\rho )} \Vert D^2h_x(y)\Vert $; then H2${}_{\rho }$ is satisfied. Since H1${}_{\rho }$ gives $h_x(0)=x$ and $Dh_x(0)=I$, Taylor’s theorem about $y=0$ implies $\Vert h_x(y) - x - y \Vert \le \frac{1}{2} \sup _{t \in [0,1]} \Vert D^2h_x(ty)\Vert \,\Vert y\Vert ^2$. Thus, H3${}_{\rho }$ holds with $\beta = \alpha /2$. $\square $

If $h(y) = y + y^3\sin (1/y)$ for $y \ne 0$ and $h(0)=0$ then $|h(y)-y| \le |y|^2$ whenever $|y|\le 1$; however, $D^2h(0)$ does not exist. This puts Lemma 31 into context.
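A short numerical check of this counterexample, taking $h(y) = y + y^3\sin (1/y)$ with $h(0)=0$: the sketch evaluates the difference quotient $(h'(y)-h'(0))/y$ along two sequences tending to zero. The function names are illustrative, and the derivative formula is worked out by hand rather than taken from the paper.

```python
import math


def h(y):
    return y + y ** 3 * math.sin(1.0 / y) if y != 0.0 else 0.0


def dh(y):
    # h'(y) for h(y) = y + y**3 * sin(1/y); h'(0) = 1 from the definition.
    if y == 0.0:
        return 1.0
    return 1.0 + 3.0 * y * y * math.sin(1.0 / y) - y * math.cos(1.0 / y)


def second_derivative_quotient(y):
    # (h'(y) - h'(0)) / y; D^2 h(0) exists only if this has a limit as y -> 0.
    return (dh(y) - dh(0.0)) / y


def quotients(k):
    y_even = 1.0 / (2.0 * math.pi * k)         # cos(1/y) = +1
    y_odd = 1.0 / ((2.0 * k + 1.0) * math.pi)  # cos(1/y) = -1
    return second_derivative_quotient(y_even), second_derivative_quotient(y_odd)


print(quotients(1000))
```

The quotient stays near $-1$ along one sequence and near $+1$ along the other, so it has no limit, while $|h(y)-y| \le |y|^2$ continues to hold on $|y| \le 1$.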

Lemma 31

If $h$ satisfies H3${}_{\rho }$ then it satisfies H1${}_{\rho }$, and if additionally $D^2h_x(0)$ exists for $x \in B(0;\rho )$ then $h$ satisfies H2${}_{\rho }$ (with $\alpha =2\beta $).

Proof

That H3${}_{\rho }$ implies H1${}_{\rho }$ is clear. If $D^2h_x(0)$ exists, it is known that

$$\begin{aligned} \lim _{\Vert y \Vert \rightarrow 0} \Vert h_x(y) - 2 h_x(0) + h_x(-y) - D^2h_x(0)\cdot (y,y) \Vert \,\Vert y\Vert ^{-2} = 0. \end{aligned} \tag{46}$$

Thus, for any $\epsilon > 0$ there is a $\delta > 0$ such that $\Vert h_x(y)-2x+h_x(-y)-D^2h_x(0)\cdot (y,y)\Vert \le \epsilon \Vert y\Vert ^2$ whenever $\Vert y\Vert \le \delta $. Then $\Vert D^2h_x(0)\cdot (y,y)\Vert \le \epsilon \Vert y\Vert ^2 + \Vert h_x(y)-x-y \Vert + \Vert h_x(-y)-x-(-y)\Vert \le (\epsilon + 2\beta )\Vert y\Vert ^2$, proving the result; both sides scale as $\Vert y\Vert ^2$ and $\epsilon >0$ was arbitrary. $\square $
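The second-difference formula (46) and the constant $\alpha = 2\beta $ can be illustrated in one dimension. The sketch assumes the hypothetical choice $h(y) = y + y^2$ (with $x=0$), which satisfies $|h(y)-y| \le \beta |y|^2$ with $\beta =1$ and has $D^2h(0) = 2$.

```python
BETA = 1.0


def h(y):
    # Satisfies |h(y) - y| <= BETA * y**2 with BETA = 1, and h(0) = 0.
    return y + y * y


def second_difference(y):
    # (h(y) - 2 h(0) + h(-y)) / y**2, which tends to D^2 h(0) by (46).
    return (h(y) - 2.0 * h(0.0) + h(-y)) / (y * y)


for y in (1e-1, 1e-2, 1e-3):
    print(y, second_difference(y), "<=", 2.0 * BETA)
```

Here the second difference equals $2 = 2\beta $, which suggests the constant in Lemma 31 cannot be improved in general.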

Lemma 32 asserts that H3${}_{\rho }$ is preserved under second-order changes to $h$; the straightforward proof is omitted.

Lemma 32

For some $\rho > 0$, assume $h$ satisfies H3${}_{\rho }$. If there exists a $\gamma \in \mathbb {R}$ such that $\tilde{h}$ satisfies $\Vert h_x(y) - \tilde{h}_x(y) \Vert \le \gamma \Vert y\Vert ^2$ whenever $x,y \in B(0;\rho )$ then $\tilde{h}$ satisfies H3${}_{\rho }$.

The following two technical lemmata will be required in subsequent proofs; Lemma 33 is well-known.

Lemma 33

Given $g:\mathbb {R}^n \rightarrow \mathbb {R}^m$ and $\delta > 0$, define $L = \sup _{z \in B(0;\delta )} \Vert Dg(z)\Vert $ and $M = \sup _{z \in B(0;\delta )} \frac{1}{2} \Vert D^2g(z)\Vert $. If $g$ is $\mathcal {C}^1$-smooth on $B(0;\delta )$ and $L$ is finite then $\Vert g(x)-g(y) \Vert \le L \Vert x-y\Vert $, and if $g$ is $\mathcal {C}^2$-smooth on $B(0;\delta )$ and $M$ is finite then $\Vert g(x)-g(y)-Dg(y)\cdot (x-y)\Vert \le M\Vert x-y\Vert ^2$ and $\Vert Dg(x) - Dg(y) \Vert \le 2M\Vert x-y\Vert $ for $x,y \in B(0;\delta )$. If $g$ is $\mathcal {C}^1$-smooth on $\overline{B}(0;\delta )$, meaning it is $\mathcal {C}^1$-smooth on an open set $U \supset \overline{B}(0;\delta )$, then $L$ is finite, and $M$ is finite if $g$ is $\mathcal {C}^2$-smooth on $\overline{B}(0;\delta )$.

Lemma 34

Fix a dimension $n$. Given scalars $\rho _1, \rho _2, \beta _1, G, L, M > 0$, there exist $\rho , \beta > 0$ such that, for any $\bar{h}:B_n(0;\rho _1) \rightarrow \mathbb {R}^n$ satisfying $\Vert \bar{h}(y) - y \Vert \le \beta _1 \Vert y\Vert ^2$ for $y \in B(0;\rho _1)$, and for any $g:B_n(0;\rho _2) \rightarrow \mathbb {R}^n$ that is a $\mathcal {C}^2$-diffeomorphism onto its image and satisfies $g(0)=0$, $\Vert [Dg(0)]^{-1} \Vert \le G$, $\Vert Dg(0) \Vert \le L$ and $\sup _{z \in B(0;\rho _2)} \frac{1}{2} \Vert D^2g(z) \Vert \le M$, it follows that $\tilde{h}(y) = (g \circ \bar{h} \circ [Dg(0)]^{-1})(y)$ is defined for $y \in B(0;\rho )$ and satisfies $\Vert \tilde{h}(y) - y \Vert \le \beta \Vert y \Vert ^2$.

Proof

For brevity, define $A = [Dg(0)]^{-1}$. By successively shrinking $\rho > 0$ as required, the following requirements can be met for all $y \in B(0;\rho )$: $\Vert A y \Vert \le \rho G < \rho _1$; $\Vert \bar{h}(A y) - A y \Vert \le \beta _1 \Vert A y\Vert ^2 \le \beta _1 G^2 \Vert y\Vert ^2$; $\Vert A^{-1} \bar{h}(A y) - y \Vert \le \beta _1 L G^2 \Vert y\Vert ^2$; $\Vert \bar{h}(A y) \Vert \le (1+\beta _1 \rho G) G\Vert y\Vert < \rho _2$; $\Vert g(\bar{h}(A y)) - A^{-1} \bar{h}(A y) \Vert \le M \Vert \bar{h}(A y) \Vert ^2 \le M G^2 (1+\rho \beta _1 G)^2 \Vert y\Vert ^2$ (Lemma 33); and finally $\Vert \tilde{h}(y) - y \Vert \le \Vert \tilde{h}(y) - A^{-1} \bar{h}(A y) \Vert + \Vert A^{-1} \bar{h}(A y) - y \Vert \le M G^2 (1+\rho \beta _1 G)^2 \Vert y\Vert ^2 + \beta _1 L G^2 \Vert y\Vert ^2$. Importantly, an appropriate value of $\rho $ can be determined as a function of the other scalars and does not depend on $g$ or $\bar{h}$. Similarly, $\beta = M G^2 (1+\rho \beta _1 G)^2 + \beta _1 L G^2$ suffices. $\square $
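The explicit constant $\beta = M G^2 (1+\rho \beta _1 G)^2 + \beta _1 L G^2$ from the proof can be tested on a scalar instance. The functions below are hypothetical choices, not from the paper: $\bar{h}(y) = y + \beta _1 y^2$ with $\beta _1 = 1$ and $g(z) = 2z + z^2$, so that $G = 1/2$, $L = 2$, $M = 1$, with $\rho = 1/2$ and $\rho _1 = \rho _2 = 1$.

```python
BETA1, G, L, M = 1.0, 0.5, 2.0, 1.0
RHO = 0.5


def h_bar(y):
    return y + BETA1 * y * y  # |h_bar(y) - y| <= BETA1 * y**2


def g(z):
    return 2.0 * z + z * z    # g(0) = 0, Dg(0) = 2, (1/2)|D^2 g| = 1


def h_tilde(y):
    return g(h_bar(y / 2.0))  # g o h_bar o [Dg(0)]^{-1}


def beta():
    # Constant from the end of the proof of Lemma 34.
    return M * G ** 2 * (1.0 + RHO * BETA1 * G) ** 2 + BETA1 * L * G ** 2


def worst_ratio(samples=1001):
    """Largest observed |h_tilde(y) - y| / y**2 over |y| <= RHO."""
    worst = 0.0
    for i in range(samples):
        y = -RHO + 2.0 * RHO * i / (samples - 1)
        if y != 0.0:
            worst = max(worst, abs(h_tilde(y) - y) / (y * y))
    return worst


print(worst_ratio(), "<=", beta())
```

In this instance the bound is attained at the edge of the ball, $|y| = \rho $, which reflects that the estimate $\Vert \bar{h}(Ay)\Vert \le (1+\beta _1\rho G)G\Vert y\Vert $ is used with its worst-case value of $\rho $.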

In certain situations, such as in Sect. 7.2, $h_x$ is constructed from transformed versions of a prototype $\bar{h}$, as in Lemma 35.

Lemma 35

Let $\bar{h}:\mathbb {R}^n \rightarrow \mathbb {R}^n$ restricted to $B(0;\rho _1)$ satisfy $\Vert \bar{h}(y) - y \Vert \le \beta \Vert y\Vert ^2$ for some $\beta \in \mathbb {R}$. Assume $D^2\bar{h}(0)$ exists. Define $h_x(y) = g_x \circ \bar{h} ([Dg_x(0)]^{-1} \cdot y)$ where, for each $x \in B(0;\rho _3) \subset \mathbb {R}^n$, $g_x:\mathbb {R}^n \rightarrow \mathbb {R}^n$ restricted to $B(0;\rho _2)$ is a $\mathcal {C}^2$-diffeomorphism satisfying $g_x(0)=x$, $\Vert Dg_x(0) \Vert \le L$ and $\Vert [Dg_x(0)]^{-1} \Vert \le G$ for some $G, L \in \mathbb {R}$. Assume $M = \sup _{x \in B(0;\rho _3),\,y \in B(0;\rho _2)} \frac{1}{2} \Vert D^2g_x(y)\Vert < \infty $. Here, $\rho _1,\rho _2,\rho _3 > 0$. Then $h$ satisfies H1, H2 and H3.

Proof

Since $D^2 \bar{h}(0)$ exists and $g_x$ is $\mathcal {C}^2$-smooth, $D^2 h_x(0)$ exists. By Lemma 31, it suffices to prove $h$ satisfies H3. Fix $x \in B(0;\rho _3)$. Choose $\rho $ and $\beta $ as in Lemma 34 with $g(z) = g_x(z) - x$. Then $\Vert h_x(y) - x - y \Vert = \Vert (g \circ \bar{h} \circ [Dg(0)]^{-1})(y) - y \Vert \le \beta \Vert y\Vert ^2$ whenever $\Vert y\Vert < \rho $. Therefore $h$ satisfies H3${}_{\min \{\rho ,\rho _2\}}$. $\square $

Properties H1–H3 are preserved under a change of coordinates.

Lemma 36

Let $g:\mathbb {R}^n \rightarrow \mathbb {R}^n$ restricted to $B(0;\rho _3)$ be a $\mathcal {C}^2$-diffeomorphism onto its image, with $g(0)=0$. Given a function $h:\mathbb {R}^n \times \mathbb {R}^n \rightarrow \mathbb {R}^n$, define $\tilde{h}_x(y) = g \circ h_{g^{-1}(x)}( D(g^{-1})(x) \cdot y)$. If $h$ satisfies H1 and H2 then $\tilde{h}$ satisfies H1 and H2. If $h$ satisfies H3 then $\tilde{h}$ satisfies H3.

Proof

Assume first that $h$ satisfies H3${}_{\rho _1}$. Choose a $\rho _2$ such that $0 < \rho _2 < \frac{\rho _3}{2}$ and $B(0;\rho _2) \subset g(B(0;\min \{\rho _1,\frac{\rho _3}{2}\}))$. Fix an $x \in B(0;\rho _2)$. Define $\bar{h}(y) = h_{g^{-1}(x)}(y) - g^{-1}(x)$ and $\tilde{g}(z) = g(z+g^{-1}(x))-x$. Note that $\bar{h}(y)$ is well-defined for $\Vert y\Vert < \rho _1$ and $\tilde{g}(z)$ is well-defined for $\Vert z \Vert < \rho _2$. Then Lemma 34 is applicable, with $\tilde{g}$ replacing $g$. (By shrinking $\rho _3$ if necessary, it can be assumed the derivatives of $\tilde{g}$ are uniformly bounded.) In particular, since $D(g^{-1})(x) = [D\tilde{g}(0)]^{-1}$, there exist $\rho $ and $\beta $, independent of $x$, such that $\Vert \tilde{h}_x(y)-x-y\Vert = \Vert (\tilde{g} \circ \bar{h} \circ [D\tilde{g}(0)]^{-1})(y) - y \Vert \le \beta \Vert y\Vert ^2$ whenever $y \in B(0;\rho )$. Therefore $\tilde{h}$ satisfies H3${}_{\min \{\rho ,\rho _2\}}$, as required.

Next, assume $h$ satisfies H1 and H2 (but not necessarily H3). It is reasonably clear that $\tilde{h}_x(y)$ has a sufficiently large domain of definition required for $\tilde{h}_x(0)$, $D\tilde{h}_x(0)$ and $D^2\tilde{h}_x(0)$ to exist in a neighbourhood of $x=0$. Explicit calculations, using the chain rule to compute derivatives, verify that $\tilde{h}$ satisfies H1 and H2. $\square $

It is remarked that the tedious nature of the last few proofs comes from the necessity of ensuring the transformed $h$ has a valid domain of definition. This is a consequence of the standing assumption that $h$ itself need not be defined on the whole of $\mathbb {R}^n \times \mathbb {R}^n$. This becomes important when coordinate charts on manifolds enter the picture.

Appendix C: Further results on the generalised Newton method

Appendix C.1: Intrinsic conditions

Condition (37) does not depend on the choice of coordinates.

Proposition 37

In Theorem 11, if (37) holds, it holds with respect to any $\mathcal {C}^2$-chart $(\tilde{U}, \tilde{\varphi })$ with $\tilde{\varphi }({p^{*}})=0$ and $\tilde{U}$ sufficiently small.

Proof

Referring to Theorem 11, let $(\tilde{U},\tilde{\varphi })$ be a chart with $\tilde{\varphi }({p^{*}})=0$ and choose $\rho >0$ so that $h = \varphi \circ \tilde{\varphi }^{-1}$ is well-defined on $\overline{B}(0;\rho )$. Then $H_{f \circ \tilde{\varphi }^{-1}}(x) = H_{\widehat{f} \circ h}(x) = A_x^T H_{\widehat{f}}(h(x)) A_x + G_x$ where $A_x$ and $G_x$ are the matrix representations of $Dh$ and $(D\widehat{f} \circ h) D^2h$ respectively. Since $Dh$ and $D\widehat{f} \circ h$ are $\mathcal {C}^1$-smooth and $D^2h$ is continuous, there exist constants $\alpha ,\beta $ such that $\Vert A_x-A_0\Vert \le \alpha \Vert x\Vert $ and $\Vert G_x\Vert \le \beta \Vert x\Vert $ whenever $x \in \overline{B}(0;\rho )$. Similarly, from (37) and Taylor series arguments, there exists a constant $\gamma $ such that $\Vert [ H_{\widehat{f}}(h(x)) - H_{\widehat{f}}(0) ] A_0 x \Vert \le \Vert [ H_{\widehat{f}}(h(x)) - H_{\widehat{f}}(0) ] h(x) \Vert + \Vert [ H_{\widehat{f}}(h(x)) - H_{\widehat{f}}(0) ] (h(x) - A_0 x) \Vert \le \gamma \Vert x\Vert ^2$ whenever $x \in \overline{B}(0;\rho )$. Shrink $\tilde{U}$ to equal $\tilde{\varphi }^{-1}(B(0;\rho ))$. The result follows by noting

$$\begin{aligned}&\Vert [H_{f \circ \tilde{\varphi }^{-1}}(x) - H_{f \circ \tilde{\varphi }^{-1}}(0)] x \Vert \le \Vert [A_x^T H_{\widehat{f}}(h(x)) A_x - A_x^T H_{\widehat{f}}(h(x)) A_0] x \Vert \\&\quad + \Vert [A_x^T H_{\widehat{f}}(h(x)) A_0 - A_0^T H_{\widehat{f}}(h(x)) A_0] x \Vert \\&\quad + \Vert [A_0^T H_{\widehat{f}}(h(x)) A_0 - A_0^T H_{\widehat{f}}(0) A_0] x \Vert + \Vert G_x x \Vert . \end{aligned} \tag{47}$$

$\square $
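The transformation rule for the Hessian used in this proof, $H_{\widehat{f} \circ h}(x) = A_x^T H_{\widehat{f}}(h(x)) A_x + G_x$, reduces in one dimension to $(\widehat{f} \circ h)'' = (h')^2\, \widehat{f}''(h) + \widehat{f}'(h)\, h''$. The sketch below checks it by central differences for hypothetical choices $\widehat{f}(u) = u^2 + u^3$ (critical point at $0$) and $h(x) = x + x^2$; neither function comes from the paper.

```python
def f_hat(u):
    return u * u + u ** 3       # critical point at u = 0


def df_hat(u):
    return 2.0 * u + 3.0 * u * u


def d2f_hat(u):
    return 2.0 + 6.0 * u


def h(x):
    return x + x * x            # change of coordinates with h(0) = 0


def dh(x):
    return 1.0 + 2.0 * x


def d2h(x):
    return 2.0


def G(x):
    # One-dimensional analogue of G_x = (D f_hat o h) D^2 h.
    return df_hat(h(x)) * d2h(x)


def hessian_formula(x):
    A = dh(x)
    return A * d2f_hat(h(x)) * A + G(x)


def hessian_fd(x, eps=1e-4):
    # Central second difference of the composite f_hat o h.
    c = lambda t: f_hat(h(t))
    return (c(x + eps) - 2.0 * c(x) + c(x - eps)) / (eps * eps)


print(hessian_formula(0.3), hessian_fd(0.3), G(1e-3))
```

Because $\widehat{f}'(0) = 0$, the term $G_x$ vanishes at $x = 0$ and is $O(\Vert x\Vert )$ nearby, which is the property the proof relies on when bounding $\Vert G_x x\Vert $.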

Conditions C1–C2 are also intrinsic; the choice of coordinate charts is immaterial and the conditions are preserved under diffeomorphisms. Let $h_\star :TM \rightarrow TN$ denote the push-forward of tangent vectors induced by a map $h:M \rightarrow N$ between manifolds; $h_\star (v_p) = Dh(p)\cdot v_p$.

Proposition 38

Let $\phi ,\psi :TM \rightarrow M$ satisfy C1–C2. Then about any point $p \in M$, C1 and C2 hold with respect to any $\mathcal {C}^2$-chart $(\tilde{U}, \tilde{\varphi })$ with $\tilde{\varphi }(p)=0$. Furthermore, if $h:M \rightarrow N$ is a $\mathcal {C}^2$-diffeomorphism of manifolds then the induced maps $\tilde{\phi }= h \circ \phi \circ h_\star ^{-1}$ and $\tilde{\psi }= h \circ \psi \circ h_\star ^{-1}$ satisfy C1–C2.

Proof

Let $h:M \rightarrow N$ be a $\mathcal {C}^2$-diffeomorphism. Fix $p \in M$. Let $(\tilde{U}, \tilde{\varphi })$ be a $\mathcal {C}^2$-chart on $N$ with $\tilde{\varphi }\circ h(p) = 0$. It will be shown $\widehat{\tilde{\phi }} = \tilde{\varphi }\circ \tilde{\phi }\circ \tau _{\tilde{\varphi }}^{-1}$ satisfies H1 and H2, and $\widehat{\tilde{\psi }} = \tilde{\varphi }\circ \tilde{\psi }\circ \tau _{\tilde{\varphi }}^{-1}$ satisfies H3. This proves the second part of the proposition. The first part then follows by letting $h:M \rightarrow M$ be the identity map.

Let $(U,\varphi )$ be a $\mathcal {C}^2$-chart on $M$ with $\varphi (p)=0$ and such that $\widehat{\phi }=\varphi \circ \phi \circ \tau _\varphi ^{-1}$ satisfies H1 and H2, and $\widehat{\psi }=\varphi \circ \psi \circ \tau _\varphi ^{-1}$ satisfies H3. Let $g = \tilde{\varphi }\circ h \circ \varphi ^{-1}$; it is a $\mathcal {C}^2$-diffeomorphism from $\varphi (U \cap h^{-1}(\tilde{U}))$ to $\tilde{\varphi }(h(U) \cap \tilde{U})$ and $\widehat{\tilde{\psi }}(x,y) = g \circ \widehat{\psi }_{g^{-1}(x)} \circ D(g^{-1})(x) \cdot y$. Apply Lemma 36 to conclude $\widehat{\tilde{\psi }}$ satisfies H3. Analogously, Lemma 36 implies $\widehat{\tilde{\phi }}$ satisfies H1 and H2. $\square $

Appendix C.2: Sufficient conditions

Conditions C1–C2 are readily satisfied by $\mathcal {C}^2$-smooth parametrisations. In this case $M$ must be $\mathcal {C}^3$-smooth. If $M$ were only $\mathcal {C}^2$-smooth then $\phi :TM \rightarrow M$ at best can be $\mathcal {C}^1$-smooth because $TM$ is only a $\mathcal {C}^1$-manifold.

Lemma 39

Let $M$ be a $\mathcal {C}^3$-manifold. If $\phi $ is $\mathcal {C}^2$-smooth and, for all $p \in M$, $\phi _p(0_p) = p$ and $D\phi _p(0_p) = I$, then C1 holds. If $\psi $ is $\mathcal {C}^2$-smooth and, for all $p \in M$, $\psi _p(0_p) = p$ and $D\psi _p(0_p) = I$, then C2 holds.

Proof

Follows from Lemma 30. $\square $

Remark 40

Since C1–C2 are local in nature (Lemma 42), it suffices in Lemma 39 for $\phi $ and $\psi $ to be smooth on a neighbourhood of the zero section of $TM$.

Conditions C1–C2 are preserved under restriction to submanifolds.

Lemma 41

Let $i:N \rightarrow M$ be a $\mathcal {C}^2$-embedding of $N$ in $M$, with $i_\star :TN \rightarrow TM$ the induced push-forward of tangent vectors. Let $\phi ,\psi :TM \rightarrow M$ be parametrisations of $M$ satisfying C1–C2, and $\tilde{\phi },\tilde{\psi }:TN \rightarrow N$ parametrisations of $N$ satisfying $\phi \circ i_\star = i \circ \tilde{\phi }$ and $\psi \circ i_\star = i \circ \tilde{\psi }$. Then $\tilde{\phi },\tilde{\psi }$ satisfy C1–C2.

Proof

From Proposition 38, it suffices to assume $N \subset M$. Then $\tilde{\phi }$ and $\tilde{\psi }$ are simply the restrictions of $\phi $ and $\psi $ to $TN$. The result follows by observing that if $h:\mathbb {R}^n \times \mathbb {R}^n \rightarrow \mathbb {R}^n$ in the definitions of H1–H3 is restricted to $\mathbb {R}^m \subset \mathbb {R}^n$ then H1–H3 would continue to hold. $\square $

One way to express precisely the local nature of C1–C2 is with the aid of a Riemannian metric on $M$.

Lemma 42

Let $\phi ,\psi :TM \rightarrow M$ satisfy C1–C2 where $M$ is a $\mathcal {C}^2$-Riemannian manifold. Let $r:M \rightarrow (0,\infty )$ be a possibly discontinuous function. Assume $\tilde{\phi },\tilde{\psi }:TM \rightarrow M$ satisfy $\tilde{\phi }(v_p) = \phi (v_p)$ and $\tilde{\psi }(v_p) = \psi (v_p)$ whenever $\Vert v_p \Vert < r(p)$. Then $\tilde{\phi }$ satisfies C1. If $\inf _{p \in K} r(p) > 0$ for any compact $K \subset M$ then $\tilde{\psi }$ satisfies C2.

Proof

Fix $p \in M$ and let $\varphi $ and $\rho $ be such that C1 and C2 hold. Then $B(0;\rho ) \times B(0;\rho )$ is in the image of $\tau _\varphi $; let $V$ be its pre-image. For any $\bar{r} > 0$, the set $V_{\bar{r}} = \{ v_p \in V \mid \Vert v_p \Vert < \bar{r}\}$ is open, hence $\tau _\varphi (V_{\bar{r}})$ is open too.

Choose an $x \in B(0;\rho )$ and let $\bar{r} = r(\varphi ^{-1}(x))$. There exists a $\delta _x > 0$ such that $(x,B(0;\delta _x)) \subset \tau _\varphi (V_{\bar{r}})$. Restricted to $(x,B(0;\delta _x))$, $\varphi \circ \phi \circ \tau _\varphi ^{-1}$ and $\varphi \circ \tilde{\phi }\circ \tau _\varphi ^{-1}$ are equal. It follows that $\tilde{\phi }$ satisfies C1.

Let $K = \varphi ^{-1}(\overline{B}(0;\rho / 2))$ and $\bar{r} = \inf _{p \in K} r(p)$. Let $\bar{\rho }\in (0, \rho / 2)$ be such that $B(0;\bar{\rho }) \times B(0;\bar{\rho }) \subset \tau _\varphi (V_{\bar{r}})$. Restricted to $B(0;\bar{\rho }) \times B(0;\bar{\rho })$, $\varphi \circ \psi \circ \tau _\varphi ^{-1}$ and $\varphi \circ \tilde{\psi }\circ \tau _\varphi ^{-1}$ are equal. It follows that $\tilde{\psi }$ satisfies C2. $\square $

Appendix C.3: Embedded submanifolds of Euclidean space

For manifolds embedded in Euclidean space, C1–C2 can be expressed in extrinsic coordinates.

Treating $\mathbb {R}^m$ as a manifold, a parametrisation $\phi :T\mathbb {R}^m \rightarrow \mathbb {R}^m$ can be specified by its representation $\widehat{\phi }:\mathbb {R}^m \times \mathbb {R}^m \rightarrow \mathbb {R}^m$ with respect to the identity chart, denoted $\phi = \widehat{\phi }\circ \tau _I$. Given a $\mathcal {C}^2$-embedding $i:M \rightarrow \mathbb {R}^m$, let $V_xM$ for $x \in i(M)$ denote the realisation of $T_{i^{-1}(x)}M$ as a subspace of $\mathbb {R}^m$, that is, $(x,V_xM) = \tau _I \circ i_\star (T_{i^{-1}(x)}M)$ where $i_\star :TM \rightarrow T\mathbb {R}^m$ is the push-forward of $i$. (The elements of $V_xM$ are the vectors $\gamma '(0)$ where $\gamma :(-\epsilon , \epsilon ) \rightarrow \mathbb {R}^m$, $\gamma (0)=x$, is a curve whose image is contained in $i(M)$.)

If $\widehat{\phi }(x,y)$ belongs to $i(M)$ whenever $x \in i(M)$ and $y \in V_xM$ then it induces a parametrisation $\tilde{\phi }:TM \rightarrow M$ given by $\tilde{\phi }= i^{-1} \circ \widehat{\phi }\circ \tau _I \circ i_\star $. In essence, $\widehat{\phi }$ maps a point $x+y$ on the affine tangent space of $i(M)$ at $x$, to the point $\widehat{\phi }(x,y)$ on $i(M)$. This is how parametrisations were specified in [15].

Lemma 43

Let $i:M \rightarrow \mathbb {R}^m$ be a $\mathcal {C}^2$-embedding of a manifold $M$. With notation as above, assume $\widehat{\phi },\widehat{\psi }:\mathbb {R}^m \times \mathbb {R}^m \rightarrow \mathbb {R}^m$ satisfy: $\forall z \in i(M)$, $\exists \alpha , \beta , \rho \in \mathbb {R}$ with $\rho > 0$, $\forall x \in B(z;\rho ) \cap i(M)$, $\forall y \in B(0;\rho ) \cap V_xM$, $\widehat{\phi }_x(y), \widehat{\psi }_x(y) \in i(M)$, $\widehat{\phi }_x(0)=x$, $D\widehat{\phi }_x(0) \cdot y = y$, $\Vert D^2 \widehat{\phi }_x(0) \cdot (y,y) \Vert \le \alpha \Vert y\Vert ^2$, $\Vert \widehat{\psi }_x(y) - x - y \Vert \le \beta \Vert y\Vert ^2$. Then the parametrisations $\tilde{\phi },\tilde{\psi }:TM \rightarrow M$ defined by $\tilde{\phi }= i^{-1} \circ \widehat{\phi }\circ \tau _I \circ i_\star $ and $\tilde{\psi }= i^{-1} \circ \widehat{\psi }\circ \tau _I \circ i_\star $ satisfy C1 and C2 of Sect. 5.

Proof

For $x \in i(M)$, let $P_x:\mathbb {R}^m \rightarrow V_xM$ denote Euclidean projection onto $V_xM$. Extend $\widehat{\phi }$ by defining $\widehat{\phi }(x,y)=x+y$ for $x \not \in i(M)$, and $\widehat{\phi }(x,y)= \widehat{\phi }(x,P_x(y)) + y - P_x(y)$ for $x \in i(M)$ and $y \not \in V_xM$. Extend $\widehat{\psi }$ similarly. Then $\phi =\widehat{\phi }\circ \tau _I$ and $\psi =\widehat{\psi }\circ \tau _I$ satisfy C1–C2. (Fix $p \in \mathbb {R}^m$ and define $\varphi (x) = x-p$. Note $\tau _I \circ \tau _\varphi ^{-1}(x,y) = (x+p,y)$. Hence $\varphi \circ \phi \circ \tau _\varphi ^{-1}(x,y) = \widehat{\phi }(x+p,y)-p$. Same for $\psi $. It is readily verified the assumptions in the lemma ensure H1, H2 and H3 are satisfied.) Hence, from Lemma 41, $\tilde{\phi }$ and $\tilde{\psi }$ satisfy C1–C2. $\square $
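As a concrete instance of the extrinsic condition $\Vert \widehat{\psi }_x(y) - x - y \Vert \le \beta \Vert y\Vert ^2$ in Lemma 43, consider the unit sphere in $\mathbb {R}^3$ with the radial projection $\widehat{\psi }(x,y) = (x+y)/\Vert x+y\Vert $. This choice of manifold and parametrisation is an assumption made for illustration. For tangent $y$ one has $\Vert \widehat{\psi }_x(y) - x - y\Vert = \sqrt{1+\Vert y\Vert ^2} - 1 \le \frac{1}{2}\Vert y\Vert ^2$, so $\beta = 1/2$ works.

```python
import math


def normalise(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def psi_hat(x, y):
    # Radial projection of x + y back onto the unit sphere (illustrative choice).
    return normalise([a + b for a, b in zip(x, y)])


def h3_ratio(x, y):
    """||psi_hat(x, y) - x - y|| / ||y||**2 for y tangent to the sphere at x."""
    p = psi_hat(x, y)
    num = math.sqrt(sum((pi - a - b) ** 2 for pi, a, b in zip(p, x, y)))
    return num / sum(b * b for b in y)


x = [0.0, 0.0, 1.0]    # point on the sphere
y = [0.3, -0.4, 0.0]   # tangent vector at x, since <x, y> = 0
print(h3_ratio(x, y), "<=", 0.5)
```

The map sends the point $x+y$ on the affine tangent plane at $x$ to a point of the sphere, matching the description of $\widehat{\phi }$ given before Lemma 43.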


About this article

Cite this article

Manton, J.H. A framework for generalising the Newton method and other iterative methods from Euclidean space to manifolds. Numer. Math. 129, 91–125 (2015). https://doi.org/10.1007/s00211-014-0630-4


Received: 25 August 2012
Revised: 25 July 2013
Published: 14 May 2014
Issue Date: January 2015
DOI: https://doi.org/10.1007/s00211-014-0630-4

Mathematics Subject Classification

49M15
