
A framework of constraint preserving update schemes for optimization on Stiefel manifold

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

This paper considers optimization problems on the Stiefel manifold \(X^{\mathsf{T}}X=I_p\), where \(X\in \mathbb {R}^{n \times p}\) is the variable and \(I_p\) is the \(p\)-by-\(p\) identity matrix. A framework of constraint preserving update schemes is proposed by decomposing each feasible point into the range space of \(X\) and the null space of \(X^{\mathsf{T}}\). While this general framework unifies many existing schemes, it also yields a new update scheme with low computational cost. We then study a feasible Barzilai–Borwein-like method under the new update scheme. The global convergence of the method is established with an adaptive nonmonotone line search. Numerical tests on the nearest low-rank correlation matrix problem, Kohn–Sham total energy minimization and a specific problem from statistics demonstrate the efficiency of the new method. In particular, the new method performs remarkably well on the nearest low-rank correlation matrix problem in terms of speed and solution quality, and is highly competitive with the widely used SCF iteration for Kohn–Sham total energy minimization.


Fig. 1

Notes

  1. Manton [35] used the SVD of \(X - \tau D\) to obtain the projection, and the cost is higher than that of the polar decomposition.

  2. It can be downloaded from http://www.math.nus.edu.sg/~matsundf/#Codes.

  3. It can be downloaded from http://optman.blogs.rice.edu/.

  4. Dr. Qingna Li kindly provided us with this matrix.

References

  1. Abrudan, T., Eriksson, J., Koivunen, V.: Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Process. 56(3), 1134–1147 (2008)

  2. Abrudan, T., Eriksson, J., Koivunen, V.: Conjugate gradient algorithm for optimization under unitary matrix constraint. Sig. Process. 89(9), 1704–1714 (2009)

  3. Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)

  4. Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)

  5. Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2007)

  6. Balogh, J., Csendes, T., Rapcsák, T.: Some global optimization problems on Stiefel manifolds. J. Global Optim. 30(1), 91–101 (2004)

  7. Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)

  8. Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196–1211 (2000)

  9. Bolla, M., Michaletzky, G., Tusnády, G., Ziermann, M.: Extrema of sums of heterogeneous quadratic forms. Linear Algebra Appl. 269(1), 331–365 (1998)

  10. Borsdorf, R.: An Algorithm for Finding the Optimal Embedding of a Symmetric Matrix Into the Set of Diagonal Matrices. Technical report, University of Manchester (2012)

  11. Dai, Y.H., Fletcher, R.: Projected Barzilai–Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 100(1), 21–47 (2005)

  12. Dai, Y.H., Fletcher, R.: New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds. Math. Program. 106(3), 403–421 (2006)

  13. Dai, Y.H., Hager, W.W., Schittkowski, K., Zhang, H.C.: The cyclic Barzilai–Borwein method for unconstrained optimization. IMA J. Numer. Anal. 26(3), 604–627 (2006)

  14. Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1–10 (2002)

  15. Dai, Y.H., Zhang, H.C.: Adaptive two-point stepsize gradient algorithm. Numer. Algorithms 27(4), 377–385 (2001)

  16. d’Aspremont, A., Ei Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)

  17. Davis, T., Hu, Y.F.: University of Florida sparse matrix collection. Technical report, University of Florida (2009)

  18. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)

  19. Eldén, L., Park, H.: A Procrustes problem on the Stiefel manifold. Numer. Math. 82(4), 599–619 (1999)

  20. Fletcher, R.: On the Barzilai–Borwein method. In: Qi, L.Q., Teo, K., Yang, X.Q. (eds.) Optimization and Control with Applications, Applied Optimization, vol. 96, pp. 235–256. Springer, US (2005)

  21. Flury, B.: Common Principal Components and Related Multivariate Models. Wiley, New York (1988)

  22. Gao, Y., Sun, D.F.: A majorized penalty approach for calibrating rank constrained correlation matrix problems. Technical report, National University of Singapore (2010)

  23. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Maryland (1996)

  24. Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23, 707–716 (1986)

  25. Grubišić, I., Pietersz, R.: Efficient rank reduction of correlation matrices. Linear Algebra Appl. 422(2), 629–653 (2007)

  26. Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Technical report, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences (2012)

  27. Jiang, B., Dai, Y.H.: Feasible Barzilai–Borwein-like methods for extreme symmetric eigenvalue problems. Optim. Methods Softw. 28(4), 756–784 (2013)

  28. Jiang, K.F., Sun, D.F., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)

  29. Joho, M., Mathis, H.: Joint diagonalization of correlation matrices by using gradient methods with application to blind signal separation. In: IEEE Proceedings of the Sensor Array and Multichannel Signal Processing Workshop, 2002, pp. 273–277 (2002)

  30. Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)

  31. Lai, R., Wen, Z., Yin, W., Gu, X., Lui, L.M.: Folding-free global conformal mapping for genus-0 surfaces by harmonic energy minimization. J. Sci. Comput. 58(3), 705–725 (2014)

  32. Li, L., Toh, K.C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. Comput. 2, 291–315 (2010)

  33. Li, Q.N., Qi, H.D.: A sequential semismooth Newton method for the nearest low-rank correlation matrix problem. SIAM J. Optim. 21(4), 1–26 (2011)

  34. Liu, Y.F., Dai, Y.H., Luo, Z.Q.: On the complexity of leakage interference minimization for interference alignment. In: 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications, pp. 471–475 (2011)

  35. Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002)

  36. Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67, 106–135 (2005)

  37. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)

  38. Peters, S.W., Heath, R.W.: Interference alignment via alternating minimization. In: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2445–2448. IEEE Computer Society (2009)

  39. Pietersz, R., Groenen, P.J.: Rank reduction of correlation matrices by majorization. Quant. Financ. 4(6), 649–662 (2004)

  40. Qi, H.D., Sun, D.F.: A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl. 28(2), 360–385 (2007)

  41. Rapcsák, T.: On minimization of sums of heterogeneous quadratic functions on Stiefel manifolds. In: Migdalas, A., Pardalos, P.M., Värbrand, P. (eds.) From Local to Global Optimization, vol. 53, pp. 277–290. Kluwer Academic Publishers (2001)

  42. Rapcsák, T.: On minimization on Stiefel manifolds. Eur. J. Oper. Res. 143(2), 365–376 (2002)

  43. Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13, 321–326 (1993)

  44. Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1), 26–33 (1997)

  45. Rebonato, R., Jäckel, P.: The most general methodology to create a valid correlation matrix for risk management and option pricing purposes. J. Risk 2, 17–27 (1999)

  46. Saad, Y.: Numerical Methods for Large Eigenvalue Problems. Manchester University Press, Manchester (1992)

  47. Savas, B., Lim, L.H.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010)

  48. Schönemann, P.H.: A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1), 1–10 (1966)

  49. Stiefel, E.: Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Comment. Math. Helv. 8(1), 305–353 (1935)

  50. Sun, W.Y., Yuan, Y.X.: Optimization Theory and Methods, Springer Optimization and Its Applications, vol. 1. Springer, New York (2006)

  51. Theis, F., Cason, T., Absil, P.A.: Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold. In: Adali, T., Jutten, C., Romano, J.M.T., Barros, A. (eds.) Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, vol. 5441, pp. 354–361. Springer, Berlin (2009)

  52. Toint, P.L.: Non-monotone trust-region algorithms for nonlinear optimization subject to convex constraints. Math. Program. 77(3), 69–94 (1997)

  53. Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013)

  54. Yang, C., Meza, J.C., Lee, B., Wang, L.W.: KSSOLV—a Matlab toolbox for solving the Kohn–Sham equations. ACM Trans. Math. Softw. 36, 1–35 (2009)

  55. Yang, C., Meza, J.C., Wang, L.W.: A constrained optimization algorithm for total energy minimization in electronic structure calculations. J. Comput. Phys. 217(2), 709–721 (2006)

  56. Yang, C., Meza, J.C., Wang, L.W.: A trust region direct constrained minimization algorithm for the Kohn–Sham equation. SIAM J. Sci. Comput. 29(5), 1854–1875 (2007)

  57. Zhang, H.C., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)

  58. Zhang, L.H., Liao, L.Z.: An alternating variable method for the maximal correlation problem. J. Global Optim. 54(1), 199–218 (2012)

  59. Zhou, B., Gao, L., Dai, Y.H.: Gradient methods with adaptive step-sizes. Comput. Optim. Appl. 35(1), 69–86 (2006)

  60. Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)

Acknowledgments

We are very grateful to Zaiwen Wen for useful discussions on the optimization on the Stiefel manifold and Qingna Li for discussions on the nearest low-rank correlation matrix problem. We thank Zaiwen Wen and Wotao Yin for sharing their code FOptM, Houduo Qi and Qingna Li for sharing their code SemiNewton, Defeng Sun and Yan Gao for sharing their code PenCorr. We also thank Kaifeng Jiang, Defeng Sun and Kim-Chuan Toh for sharing the gene correlation matrices for the nearest correlation matrix problem. Many thanks are also due to the associate editor, two anonymous referees and the editor, Prof. Alexander Shapiro, whose suggestions and comments greatly improved the quality of this paper.

Author information

Correspondence to Yu-Hong Dai.

Additional information

This work was partly supported by the National Key Basic Research Program of China (Grant No. 2015CB856000), the China National Funds for Distinguished Young Scientists (no. 11125107), the Chinese NSF grants (nos. 10831106 and 81211130105) and the CAS grant (no. kjcx-yw-s7-03).

Appendices

Appendix 1: Details of approaches I and II in Sect. 3

1.1 Details of approach I

Using the conditions \(Z(\tau ) \equiv I_p\) and \(\widetilde{R}'(0) = - X^{\mathsf{T}} E\), and denoting \({\mathcal {A}}(\tau ) = \left[ \begin{array}{c} \widetilde{R}(\tau ) \\ W \widetilde{N}(\tau ) \end{array}\right] \), it follows from (3.4) and (3.6) that

$$\begin{aligned} {\mathcal {A}}'(0) = \left[ \begin{array}{cc} -X^{\mathsf{T}}E &{}~B \\ W &{} ~F \end{array}\right] \left[ \begin{array}{c} I_p \\ 0\end{array}\right] \ \mathrm {and} \ {\mathcal {A}}(0) = \left[ \begin{array}{c} I_p \\ 0\end{array}\right] , \end{aligned}$$

where \(B \in {\mathbb {R}}^{p\times n}\), \(F \in {\mathbb {R}}^{n \times n}\). Solving the above ordinary differential equation, we get that

$$\begin{aligned} {\mathcal {A}}(\tau ) = \exp \left( \tau \left[ \begin{array}{cc} -X^{\mathsf{T}}E &{}~B \\ W &{} ~F \end{array}\right] \right) \left[ \begin{array}{c} I_p \\ 0\end{array}\right] . \end{aligned}$$
(8.1)

Since \(Z(\tau ) \equiv I_p\), we know from (3.3) and the definition of \({\mathcal {A}}(\tau )\) that

$$\begin{aligned} Y(\tau ) = \big [\begin{array}{l} X, \ I_n\end{array} \big ] {\mathcal {A}}(\tau ), \end{aligned}$$

and \({\mathcal {A}}(\tau )^{\mathsf{T}} {\mathcal {A}}(\tau ) = I_p\). It follows that the matrix inside the brackets of (8.1) must be skew-symmetric. This means that \(B = -W^{\mathsf{T}}\), \(F + F^{\mathsf{T}} = 0\) and \(X^{\mathsf{T}}E+E^{\mathsf{T}}X=0\) (i.e., \(E \in {\mathcal {T}}_X\)). Let \(W = QR\) be the unique QR factorization of \(W\) and assume that \(F\) takes the form \(F = Q{\widetilde{F}}Q^{\mathsf{T}}\), where \({\widetilde{F}}\in {\mathbb {R}}^{p\times p}\) is any skew-symmetric matrix. Then we can write

$$\begin{aligned} Y(\tau ) =&\big [\begin{array}{l} X, \ I_n\end{array} \big ] \exp \left( \tau \left[ \begin{array}{cc} I_p &{}~0 \\ 0 &{} ~Q \end{array}\right] \left[ \begin{array}{cc} -X^{\mathsf{T}}E &{}~-R^{\mathsf{T}} \\ R &{} ~{\widetilde{F}}\end{array}\right] \left[ \begin{array}{cc} I_p &{}~0 \\ 0 &{} ~Q^{\mathsf{T}} \end{array}\right] \right) \left[ \begin{array}{c} I_p \\ 0\end{array}\right] \nonumber \\ =&\big [\begin{array}{l} X, \ Q \end{array} \big ] \exp \left( \tau \left[ \begin{array}{cc} -X^{\mathsf{T}}E &{}~-R^{\mathsf{T}} \\ R &{} ~{\widetilde{F}}\end{array}\right] \right) \left[ \begin{array}{c} I_p \\ 0\end{array}\right] , \end{aligned}$$
(8.2)

where the second equality holds because \(Q\) has orthonormal columns. The update scheme (8.2) can be regarded as a generalized geodesic update scheme, since with \({\widetilde{F}}= 0\) it reduces to the geodesic update scheme (2.1).
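As a sanity check, one can verify numerically that (8.2) stays on the manifold. The following NumPy sketch is our own illustration (not the authors' code): it builds \(E\) so that \(X^{\mathsf{T}}E\) is skew-symmetric, takes \(W\) in the null space of \(X^{\mathsf{T}}\) and a random skew-symmetric \({\widetilde{F}}\), and evaluates (8.2) with a small eigendecomposition-based matrix exponential.

```python
import numpy as np

def expm(S):
    # Matrix exponential via eigendecomposition; adequate here since a
    # skew-symmetric matrix is normal and hence diagonalizable.
    w, V = np.linalg.eig(S.astype(complex))
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

rng = np.random.default_rng(0)
n, p, tau = 9, 3, 0.2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))         # X^T X = I_p
G = rng.standard_normal((n, p))
E = G - X @ (G.T @ X)                                    # X^T E skew-symmetric
W = (np.eye(n) - X @ X.T) @ rng.standard_normal((n, p))  # X^T W = 0
Q, R = np.linalg.qr(W)                                   # QR factorization of W
F = rng.standard_normal((p, p))
F_tilde = F - F.T                                        # skew-symmetric p x p
S = np.block([[-X.T @ E, -R.T], [R, F_tilde]])           # skew-symmetric 2p x 2p
Y = np.hstack([X, Q]) @ expm(tau * S)[:, :p]             # update scheme (8.2)
print(np.allclose(Y.T @ Y, np.eye(p)))                   # feasibility preserved
```

Since the middle matrix is skew-symmetric, its exponential is orthogonal, which is exactly why the scheme is constraint preserving.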

1.2 Details of approach II

Based on the choices that \(\widetilde{R}(\tau ) = I_p + \tau \widetilde{R}'(0)\) and \(\widetilde{N}(\tau ) = \tau I_p\), we can get \(Z(\tau )\) from (3.3) by the polar decomposition or the Cholesky factorization.

If the polar decomposition is used, \(Z(\tau )\) and \(Z'(\tau )\) are always symmetric. In this case, it is easy to show that \(Y(\tau ; X) = \mathcal {P}_{{\mathcal {S}}_{n,p}}{{\big (X \widetilde{R}(\tau ) + W \widetilde{N}(\tau ) \big )}}\), which with (3.6), (3.7) and (3.8) further implies that

$$\begin{aligned} Y(\tau ; X) = \mathcal {P}_{{\mathcal {S}}_{n,p}}\big (X - \tau E - \tau X Z'(0) \big ), \end{aligned}$$
(8.3)

where \(Z'(0)\) is any \(p\)-by-\(p\) symmetric matrix. If \(Z'(0) = 0\), (8.3) reduces to the ordinary polar decomposition or Manton’s projection update scheme. If \(Z'(0) = \text{ sym }(X^{\mathsf{T}}G)\) and \(E = G - X \text{ sym }(X^{\mathsf{T}}G),\) it becomes the ordinary gradient projection update scheme.
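To make (8.3) concrete, here is a minimal NumPy sketch (our own illustration, with \(Z'(0) = 0\)) of the polar-decomposition update: the projection \(\mathcal {P}_{{\mathcal {S}}_{n,p}}(A)\) equals \(UV^{\mathsf{T}}\), where \(A = U\Sigma V^{\mathsf{T}}\) is the thin SVD.

```python
import numpy as np

def polar_update(X, E, tau):
    # Update scheme (8.3) with Z'(0) = 0: project X - tau*E back onto
    # the Stiefel manifold via the polar decomposition (thin SVD).
    U, _, Vt = np.linalg.svd(X - tau * E, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
n, p = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # feasible start
Y = polar_update(X, rng.standard_normal((n, p)), 0.1)
print(np.allclose(Y.T @ Y, np.eye(p)))            # Y^T Y = I_p
```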

If the Cholesky factorization is used, \(Z(\tau )\) and \(Z'(\tau )\) are always upper triangular. Similarly, we can derive

$$\begin{aligned} Y(\tau ; X) = \text{ qr }\big (X - \tau E - \tau X Z'(0)\big ), \end{aligned}$$
(8.4)

where \(Z'(0)\) is any \(p\)-by-\(p\) upper triangular matrix. When \(Z'(0) = 0\), (8.4) reduces to the ordinary QR factorization update scheme.
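Similarly, (8.4) with \(Z'(0) = 0\) can be sketched in NumPy (again our own illustration); the column signs are normalized so that \(R\) has a positive diagonal, which makes the QR factor unique:

```python
import numpy as np

def qr_update(X, E, tau):
    # Update scheme (8.4) with Z'(0) = 0: the Q factor of the QR
    # factorization of X - tau*E, normalized so that diag(R) > 0.
    Q, R = np.linalg.qr(X - tau * E)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s  # flip columns whose diagonal entry of R is negative

rng = np.random.default_rng(2)
n, p = 8, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
Y = qr_update(X, rng.standard_normal((n, p)), 0.1)
print(np.allclose(Y.T @ Y, np.eye(p)))
```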

Appendix 2: Proof of Lemma 3.1

The fact that \(X^{\mathsf{T}}D_{\rho }\) is skew-symmetric implies that \(J\) is invertible, thus \(Y(\tau )\) is well-defined. (i) follows from the construction of the update scheme (3.11).

Meanwhile, we know that \(Y'(0) = -D_{\rho }\), which with the chain rule shows that

$$\begin{aligned} {\mathcal {F}}'_{\tau }(Y(0)) = - \langle G, D_{\rho }\rangle = -\langle \nabla {\mathcal {F}}, P_X D_{\rho }\rangle , \end{aligned}$$

where the second equality is due to \(D_{\rho }\in {\mathcal {T}}_X\) and the definition (2.8) of \(\nabla {\mathcal {F}}\). Recall that \(P_X = I_n - \frac{1}{2} XX^{\mathsf{T}}\). Substituting (3.12) into the above equation yields

$$\begin{aligned} {\mathcal {F}}'_{\tau }(Y(0))= -\langle \nabla \mathcal {F}, (I_n + (\rho -1 )XX^{\mathsf{T}})\nabla \mathcal {F}\rangle \le - \min \{\rho ,1\} \Vert \nabla \mathcal {F}\Vert _{\mathsf{F}}^2. \end{aligned}$$

So (ii) is true.

We prove (iii) by contradiction. Left-multiplying both sides of the expression of \(Y(\tau )\) in (3.11) by \(X^{\mathsf{T}}\) and using \(X^{\mathsf{T}}W = 0\), we get that \(2J^{-1} - I_p = X^{\mathsf{T}}Y\). Assume that there exists a \(p\)-by-\(p\) orthogonal matrix \(Q_p\) such that \(Y = XQ_p\). Then we have \(2J^{-1} - I_p = Q_p\); i.e., \(2I_p - J = Q_pJ\). It follows from \((2I_p - J)^{\mathsf{T}} (2I_p-J) = (Q_pJ)^{\mathsf{T}}Q_pJ\) and \(Q_p^\mathsf{T}Q_p = I_p\) that

$$\begin{aligned} J^{\mathsf{T}} + J = 2I_p, \end{aligned}$$

which contradicts the definition of \(J\). This contradiction proves (iii).

For (iv), it is obvious that \(\Vert J\Vert _2 \le 1 + \frac{\tau ^2}{4}\Vert W\Vert _{\mathsf{F}}^2 + \frac{\tau }{2}\Vert X^{\mathsf{T}}D_{\rho }\Vert _{\mathsf{F}}\), where \(\Vert \cdot \Vert _2\) is the spectral norm. It follows from \(-D_{\rho }= W - XX^{\mathsf{T}} D_{\rho }\) and \(X^{\mathsf{T}} W=0\) that \(\Vert D_{\rho }\Vert _{\mathsf{F}}^2 = \Vert W\Vert _{\mathsf{F}}^2 + \Vert X^{\mathsf{T}}D_{\rho }\Vert _{\mathsf{F}}^2.\) Notice that \(\upsilon = \tau \Vert D_{\rho }\Vert _{\mathsf{F}}\). Then we have

$$\begin{aligned} \Vert J\Vert _2&\le 1 + \frac{\upsilon ^2}{4} \cdot \frac{\Vert W\Vert _{\mathsf{F}}^2}{\Vert D_{\rho }\Vert _{\mathsf{F}}^2 } + \frac{\upsilon }{2} \cdot \frac{\Vert X^{\mathsf{T}} D_{\rho }\Vert _{\mathsf{F}}^{}}{\Vert D_{\rho }\Vert _{\mathsf{F}}^{}}\\&\le 1 + \max _{0 \le t \le 1}\left( \frac{\upsilon ^2}{4} (1 - t^2) + \frac{\upsilon }{2} t\right) \le (5 + \upsilon ^2)/4. \end{aligned}$$

On the other hand, the relation \(2J^{-1} = X^{\mathsf{T}} Y + I_p\) indicates that \(\Vert J^{-1}\Vert _2 \le 1\). Thus (iv) is true.

In the case that \(p=1\) or \(p=n\), it is not difficult to simplify the corresponding update schemes, and we omit the details here. This completes the proof.
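The feasibility claim in (i) is easy to test numerically. In the sketch below (our own illustration, with the direction \(D\) constructed so that \(X^{\mathsf{T}}D\) is skew-symmetric), \(W\), \(J\) and \(Y(\tau )\) are formed exactly as in (3.11):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, tau = 10, 4, 0.3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # X^T X = I_p
G = rng.standard_normal((n, p))
D = G - X @ (G.T @ X)                              # X^T D is skew-symmetric
W = -(np.eye(n) - X @ X.T) @ D                     # W = -(I - X X^T) D
J = np.eye(p) + tau**2 / 4 * (W.T @ W) + tau / 2 * (X.T @ D)
Y = (2 * X + tau * W) @ np.linalg.inv(J) - X       # update scheme (3.11)
print(np.allclose(Y.T @ Y, np.eye(p)))             # the curve stays feasible
```

The check succeeds because \(X^{\mathsf{T}}W = 0\) and \(J + J^{\mathsf{T}} = 2I_p + \frac{\tau ^2}{2}W^{\mathsf{T}}W\), so \(Y^{\mathsf{T}}Y = I_p\) follows by direct expansion.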

Appendix 3: Proof of Lemma 3.2

The relation \(\langle G, D^{(i)} \rangle = e_i^{\mathsf{T}}\left( G^{\mathsf{T}} \nabla {\mathcal {F}}\right) e_i\) can be easily verified from the definition of \(D^{(i)}\). With the definition of \(D^{(q)}\) and \(X^{\mathsf{T}} X = I_p\), we have that

$$\begin{aligned} {{X^{\mathsf{T}} D^{(q)} = X^{\mathsf{T}} G_{(q)}^{} e_q^{\mathsf{T}} - e_q G_{(q)}^{\mathsf{T}}X = \frac{2}{\tau }\left( be_q^{\mathsf{T}} - e_q b^{\mathsf{T}}\right) \in {\mathcal {T}}_X}}. \end{aligned}$$

Thus \(Y(\tau )\) is well-defined. Moreover, we have that

$$\begin{aligned} W = -(I_n - XX^{\mathsf{T}}) D^{(q)} = -(I_n - XX^{\mathsf{T}}) G_{(q)}^{}e_q^{\mathsf{T}}. \end{aligned}$$

Hence, the matrix \(J = I_p + \frac{\tau ^2}{4} W^{\mathsf{T}}W + \frac{\tau }{2}X^{\mathsf{T}} D^{(q)}\) can be expressed as

$$\begin{aligned} J = I_p + \xi e_q^{}e_q^{\mathsf{T}} + b e_q^{\mathsf{T}} - e_q b^{\mathsf{T}} = {{I_p + \big [\begin{array}{cc} e_q,&\ b \end{array} \big ] \left( \left[ \begin{array}{cc} \xi &{}\ -1 \\ 1 &{} \ 0\end{array} \right] \left[ \begin{array}{c}e_q^{\mathsf{T}}\\ b^{\mathsf{T}}\end{array}\right] \right) }}, \end{aligned}$$
(10.1)

where \(\xi = \frac{\tau ^2}{4}G_{(q)}^{\mathsf{T}}(I_n - XX^{\mathsf{T}}) G_{(q)}\). By the formulations of \(b\) and \(\xi \), we easily see that

$$\begin{aligned} {{e_q^{\mathsf{T}} b = 0, \ \xi =\alpha \, - \, b^{\mathsf{T}}b.}} \end{aligned}$$

Applying the SMW formula to (10.1) and using the above relations, we can obtain (3.15).

Moreover, notice that \(\nabla {\mathcal {F}}= \sum _{i=1}^{p} D^{(i)}\). Thus we have

$$\begin{aligned} p\, {\mathcal {F}}'_{\tau }(Y(0)) = -p\, \langle G, D^{(q)} \rangle \le - \langle G, \nabla {\mathcal {F}}\rangle \le -\frac{1}{2}\Vert \nabla {\mathcal {F}}\Vert _{\mathsf{F}}^2, \end{aligned}$$

where the first inequality follows from the choice of \(q\) in (3.14) and the second one is due to (ii) of Lemma 3.1. The proof is completed.

Appendix 4: Proof of Proposition 3.1

Before going into the details of the proof, we recall the following fact on the inverse of a \(2 \times 2\) block matrix.

Fact 11.1

Assume that \(T = \left[ \begin{array}{cc} T_{11} &{} ~T_{12}\\ T_{21} &{}~T_{22} \end{array} \right] \), where \(T_{22}^{}\) and \(T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{}\) are invertible. Then \(T\) is invertible and

$$\begin{aligned} T^{-1} =\left[ \begin{array}{ll} ( T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{})^{-1} &{} - ( T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{})^{-1} T_{12}^{}T_{22}^{-1} \\ -T_{22}^{-1}T_{21}^{} ( T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{})^{-1} &{}~ ~ T_{22}^{-1}T_{21}^{} ( T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{})^{-1} T_{12}^{}T_{22}^{-1} + T_{22}^{-1}\end{array} \right] . \end{aligned}$$

First, we show that the update scheme (2.3) is well-defined, provided that \(I_p+\frac{\tau }{4}X^{\mathsf{T}}D\) is invertible. Consider the update scheme (2.3) with \(U=[P_X D, \,X]\) and \(V=[X,\, -P_XD]\). It follows from \(P_X = I_n - \frac{1}{2}XX^{\mathsf{T}}\) that \(X^{\mathsf{T}}P_X D = \frac{1}{2} X^{\mathsf{T}}D\). Combining this with \(X^{\mathsf{T}}X = I_p\) and \(X^{\mathsf{T}}D\) being skew-symmetric, we can rewrite

$$\begin{aligned} I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U=\left[ \begin{array}{cc}I_p+\frac{\tau }{4}X^{\mathsf{T}}D &{} \frac{\tau }{2}I_p\\ -\frac{\tau }{2}D^{\mathsf{T}}P_X^{\mathsf{T}}P_X^{}D &{}~ I_p +\frac{\tau }{4}X^{\mathsf{T}}D\end{array}\right] . \end{aligned}$$
(11.1)

Moreover, we derive that

$$\begin{aligned} {{W^{\mathsf{T}}W = D^{\mathsf{T}}(I_n - XX^{\mathsf{T}})D = D^{\mathsf{T}}P_X^{\mathsf{T}} P_X^{} D + \frac{1}{4} \left( X^{\mathsf{T}}D\right) ^2,}} \end{aligned}$$

which with (3.11) implies that

$$\begin{aligned} J = \Big (I_p + \frac{\tau }{4}X^{\mathsf{T}} D \Big )^2 + \frac{\tau ^2}{4}D^{\mathsf{T}}P_X^{\mathsf{T}}P_X^{} D. \end{aligned}$$

Plugging this into (11.1) yields

$$\begin{aligned} I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U:= \left[ \begin{array}{cc}T_{11} &{} ~T_{12} \\ T_{21} &{}~ T_{22} \end{array}\right] =\left[ \begin{array}{cc}I_p+\frac{\tau }{4}X^{\mathsf{T}}D &{} \frac{\tau }{2}I_p\\ \frac{2}{\tau } \left( \left( I_p +\frac{\tau }{4}X^{\mathsf{T}}D\right) ^2 - J\right) &{}~ I_p +\frac{\tau }{4}X^{\mathsf{T}}D\end{array}\right] , \end{aligned}$$
(11.2)

where \(T_{ij} \in {\mathbb {R}}^{p\times p}\) \((i,j \in \{1,2\})\). By simple calculations, we know that

$$\begin{aligned} {{T_{11}^{} - T_{12}^{} T_{22}^{-1} T_{21}^{} = \left( I_p + \frac{\tau }{4}X^{\mathsf{T}}D\right) ^{-1} J}}. \end{aligned}$$

Thus it follows from Fact 11.1 that \(I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U\) is invertible and the update scheme (2.3) is well-defined.

We now prove the equivalence. With Fact 11.1 and (11.2), some tedious manipulations yield that \((I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U)^{-1}=\left[ \begin{array}{cc}M_{11} &{} ~M_{12}\\ M_{21} &{} ~M_{22}\end{array}\right] \), where

$$\begin{aligned}\begin{array}{ll} M_{11} = J^{-1}(I_p + \frac{\tau }{4}X^{\mathsf{T}} D), &{} ~ M_{12} = -\frac{\tau }{2}J^{-1}, \\ M_{21} = \frac{2}{\tau }\big (I_p- {{M_{22}}}(I_p+\frac{\tau }{4}X^{\mathsf{T}}D)\big ), &{}~ M_{22}=(I_p+\frac{\tau }{4}X^{\mathsf{T}}D)J^{-1}. \end{array} \end{aligned}$$

Direct calculations show that

$$\begin{aligned} M_{11}+ \frac{1}{2}M_{12}X^{\mathsf{T}}D = J^{-1}, \quad \tau \Big (M_{21} + \frac{1}{2} M_{22}X^{\mathsf{T}}D\Big ) = 2I_p - 2\Big (I_p+ \frac{\tau }{4}X^{\mathsf{T}}D\Big )J^{-1}. \end{aligned}$$
(11.3)

Finally, we can obtain

$$\begin{aligned} Y_{\mathrm {wy}}(\tau ) ={}&X - \tau \left[ \begin{array}{l}P_XD,\ X\end{array}\right] \Bigg (\left[ \begin{array}{cc}M_{11} &{} ~M_{12}\\ M_{21} &{} ~M_{22}\end{array}\right] \left[ \begin{array}{c} I_p\\ \frac{1}{2}X^{\mathsf{T}}D \end{array}\right] \Bigg ) \nonumber \\ ={}&X-\tau \Big (\frac{1}{2} XX^{\mathsf{T}}D -W\Big )\Big (M_{11}+ \frac{1}{2}M_{12}X^{\mathsf{T}}D\Big ) -\tau X\Big (M_{21}+\frac{1}{2}M_{22}X^{\mathsf{T}}D\Big )\nonumber \\ ={}&(2X + \tau W)J^{-1}- X, \end{aligned}$$

where the first equality uses (2.3) and \(X^{\mathsf{T}}X = I_p\), the second and third equalities are due to \(P_XD = \frac{1}{2} XX^{\mathsf{T}}D -W\) and (11.3), respectively. The proof is completed.

Appendix 5: Some details of Table 1

We first review the computational costs of some basic matrix operations. Given \(A \in {\mathbb {R}}^{n_1 \times n_2}\) and \(B \in {\mathbb {R}}^{n_2 \times n_2}\), computing \(AB\) needs \(2n_1^{}n_2^2\) flops while computing \(A^{\mathsf{T}} A\) only needs \(n_1^{}n_2^2\) flops. Computing \(\text{ qr }(A)\) by the modified Gram–Schmidt algorithm needs \(2n_1^{}n_2^2\) flops (here \(n_1\ge n_2\)). If \(B\) is symmetric, computing the eigenvalue decomposition \(B = P\Sigma P^{\mathsf{T}}\) with orthogonal \( P\in {\mathbb {R}}^{n_2 \times n_2}\) and diagonal \(\Sigma \in {\mathbb {R}}^{n_2 \times n_2}\) by the symmetric QR algorithm needs \(9n_2^3\) flops. If \(B\) is skew-symmetric, computing the exponential of \(B\) needs \(10n_2^3\) flops. If \(B\) is symmetric positive definite, computing \(B^{1/2}\) needs \(10n_2^3\) flops. In addition, if \(B\) is nonsingular, solving the matrix equation \(B T = A^{\mathsf{T}}\) by Gaussian elimination with partial pivoting needs \(2n_1^{}n_2^2 +2n_2^3/3\) flops. It follows that computing \(B^{-1}\) needs \(8n_2^3/3\) flops. We refer interested readers to [23] for more details.

To verify the computational costs of the corresponding update schemes in Table 1, we take the new update scheme (3.11) and the Wen–Yin update scheme (2.3) as examples. Notice that, compared with \(O(np^2)\) or \(O(n^3)\), the \(O(p^2)\) terms are omitted since \(1\le p \le n\). For simplicity, we only consider the case that \(1< p <n\). In the case that \(p=1\) or \(p=n\), the costs for the update schemes (3.11) and (2.3) can be obtained by a similar analysis, and we omit the details here.

To analyze the computational cost of (3.11), note that \(E = D_{\rho }\) and hence \(W = -G + XX^{\mathsf{T}}G\), so the feasible curve can be rewritten as

$$\begin{aligned} Y(\tau ; X) = \left( X\Big (\frac{2}{\tau }I_p + X^{\mathsf{T}}G\Big ) - G\right) {\left( \frac{J(\tau )}{\tau }\right) ^{-1}} - X, \end{aligned}$$

and

$$\begin{aligned} J(\tau ) = I_p + \frac{\tau ^2}{4} (G^{\mathsf{T}}G - G^{\mathsf{T}}XX^{\mathsf{T}}G) + {{\rho \tau }}(X^{\mathsf{T}}G - G^{\mathsf{T}}X). \end{aligned}$$

Forming \(J(\tau )\) needs \(3np^2 + p^3\) flops, which involves computing \(X^{\mathsf{T}}G\), \(G^{\mathsf{T}}G\) and \((G^{\mathsf{T}}X )(X^{\mathsf{T}}G )\). The final assembly of \(Y(\tau )\) involves solving one matrix equation and performing one matrix multiplication and two matrix subtractions, and hence needs \(4np^2 + 2np + 2p^3/3\) flops. Therefore, the total cost of forming \(Y(\tau ; X)\) in (3.11) for any \(\tau \) is \(7np^2 + 2np + 5p^3/3\) flops. During backtracking line searches, we need to update \(Y(\tau ; X)\) for a different \(\tau \). Denote the first trial and the new trial stepsizes by \(\tau _{{{\mathrm {first}}}}\) and \(\tau _{{{\mathrm {new}}}}\), respectively. It is easy to see that

$$\begin{aligned} Y(\tau _{{{\mathrm {new}}}}; X) = \left( X\Big (\frac{2}{\tau _{{{\mathrm {first}}}}}I_p + X^{\mathsf{T}}G\Big ) - G + \Big (\frac{2}{\tau _{{{\mathrm {new}}}}} -\frac{2}{\tau _{{{\mathrm {first}}}}}\Big )X\right) {{\left( \frac{J(\tau _{{{\mathrm {new}}}})}{\tau _{\mathrm {new}}}\right) ^{-1}}} - X. \end{aligned}$$

As \(X\big (\frac{2}{\tau _{{{\mathrm {first}}}}}I_p + X^{\mathsf{T}}G\big ) - G\) is already computed in \(Y(\tau _{{{\mathrm {first}}}}; X)\), we store this matrix and hence need only \(2np^2 + 3np + 2p^3/3\) flops to compute \(Y(\tau _{{{\mathrm {new}}}}; X).\)

To analyze the computational cost of the Wen–Yin update scheme (2.3), we see that it takes \(3np^2\) and \(2np^2 + np\) flops to form \(I_{2p} + \frac{\tau }{2} V^{\mathsf{T}} U\) and \(P_X D_{\rho }\), respectively. The final assembly of \(Y(\tau )\) needs \(4np^2 + np + 40p^3/3\) flops. Hence, the total cost of the Wen–Yin update scheme is \(9np^2 + 2np + 40p^3/3\) flops. When \(\rho = 1/2,\) we see from [53] that \(U = [G, X]\) and \(V = [X, -G]\) in (2.3), so \(P_XD_{1/2}\) no longer needs to be computed, and the total cost of the Wen–Yin update scheme can be reduced to \(7np^2 + np + 40p^3/3\) flops. The cost of updating \(Y(\tau )\) with a new \(\tau \) is \(4np^2 + np + 40p^3/3\) flops, since \(V^{\mathsf{T}}U\) can be stored and the main work involves solving a matrix equation and the final assembly.
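For a rough feel of these totals, the two flop counts can be compared directly; the small helper below (illustrative arithmetic only) evaluates exactly the totals derived above for the first function evaluation with general \(\rho \):

```python
def cost_new(n, p):
    # Total cost of forming Y(tau; X) by the new update scheme (3.11).
    return 7 * n * p**2 + 2 * n * p + 5 * p**3 / 3

def cost_wen_yin(n, p):
    # Total cost of the Wen-Yin update scheme (2.3) with general rho.
    return 9 * n * p**2 + 2 * n * p + 40 * p**3 / 3

for n, p in [(1000, 10), (1000, 100), (2000, 500)]:
    print(n, p, cost_new(n, p) / cost_wen_yin(n, p))
```

Since the difference is \(-2np^2 - 35p^3/3 < 0\), the new scheme is cheaper for every \(1< p < n\), and the ratio approaches \(7/9\) when \(p \ll n\).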

About this article

Cite this article

Jiang, B., Dai, YH. A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153, 535–575 (2015). https://doi.org/10.1007/s10107-014-0816-7
