Abstract
This paper considers optimization problems on the Stiefel manifold \(X^{\mathsf{T}}X=I_p\), where \(X\in \mathbb {R}^{n \times p}\) is the variable and \(I_p\) is the \(p\)-by-\(p\) identity matrix. A framework of constraint preserving update schemes is proposed by decomposing each feasible point into the range space of \(X\) and the null space of \(X^{\mathsf{T}}\). While this general framework unifies many existing schemes, a new update scheme with low computational cost is also discovered. We then study a feasible Barzilai–Borwein-like method under the new update scheme. The global convergence of the method is established with an adaptive nonmonotone line search. Numerical tests on the nearest low-rank correlation matrix problem, the Kohn–Sham total energy minimization and a specific problem from statistics demonstrate the efficiency of the new method. In particular, the new method performs remarkably well for the nearest low-rank correlation matrix problem in terms of speed and solution quality, and is highly competitive with the widely used SCF iteration for the Kohn–Sham total energy minimization.
Notes
Manton [35] used the singular value decomposition (SVD) of \(X - \tau D\) to obtain the projection, and its cost is higher than that of the polar decomposition.
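Both routes compute the same projection onto the Stiefel manifold; a minimal NumPy sketch (random illustrative data and sizes, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, tau = 8, 3, 0.5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # a feasible point
D = rng.standard_normal((n, p))                    # a search direction
A = X - tau * D

# Route 1 (Manton): SVD A = U S V^T, projection = U V^T.
U, _, Vt = np.linalg.svd(A, full_matrices=False)
Y_svd = U @ Vt

# Route 2 (polar decomposition): Y = A (A^T A)^{-1/2}, with the inverse
# square root obtained from a small p-by-p eigendecomposition.
w, P = np.linalg.eigh(A.T @ A)
Y_polar = A @ (P @ np.diag(w ** -0.5) @ P.T)

assert np.allclose(Y_svd, Y_polar)                 # same projection
assert np.allclose(Y_svd.T @ Y_svd, np.eye(p))     # result is feasible
```

The polar route only factors the small \(p \times p\) Gram matrix, which is why it is cheaper than a full SVD of the \(n \times p\) matrix when \(p \ll n\).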
It can be downloaded from http://www.math.nus.edu.sg/~matsundf/#Codes.
It can be downloaded from http://optman.blogs.rice.edu/.
Dr. Qingna Li kindly provided us with this matrix.
References
Abrudan, T., Eriksson, J., Koivunen, V.: Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Process. 56(3), 1134–1147 (2008)
Abrudan, T., Eriksson, J., Koivunen, V.: Conjugate gradient algorithm for optimization under unitary matrix constraint. Sig. Process. 89(9), 1704–1714 (2009)
Absil, P.A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008)
Absil, P.A., Malick, J.: Projection-like retractions on matrix manifolds. SIAM J. Optim. 22(1), 135–158 (2012)
Andreani, R., Birgin, E.G., Martínez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4), 1286–1309 (2007)
Balogh, J., Csendes, T., Rapcsák, T.: Some global optimization problems on Stiefel manifolds. J. Global Optim. 30(1), 91–101 (2004)
Barzilai, J., Borwein, J.M.: Two-point step size gradient methods. IMA J. Numer. Anal. 8(1), 141–148 (1988)
Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10(4), 1196–1211 (2000)
Bolla, M., Michaletzky, G., Tusnády, G., Ziermann, M.: Extrema of sums of heterogeneous quadratic forms. Linear Algebra Appl. 269(1), 331–365 (1998)
Borsdorf, R.: An Algorithm for Finding the Optimal Embedding of a Symmetric Matrix Into the Set of Diagonal Matrices. Technical report, University of Manchester (2012)
Dai, Y.H., Fletcher, R.: Projected Barzilai–Borwein methods for large-scale box-constrained quadratic programming. Numer. Math. 100(1), 21–47 (2005)
Dai, Y.H., Fletcher, R.: New algorithms for singly linearly constrained quadratic programs subject to lower and upper bounds. Math. Program. 106(3), 403–421 (2006)
Dai, Y.H., Hager, W.W., Schittkowski, K., Zhang, H.C.: The cyclic Barzilai–Borwein method for unconstrained optimization. IMA J. Numer. Anal. 26(3), 604–627 (2006)
Dai, Y.H., Liao, L.Z.: R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal. 22, 1–10 (2002)
Dai, Y.H., Zhang, H.C.: Adaptive two-point stepsize gradient algorithm. Numer. Algorithms 27(4), 377–385 (2001)
d’Aspremont, A., El Ghaoui, L., Jordan, M.I., Lanckriet, G.R.: A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3), 434–448 (2007)
Davis, T., Hu, Y.F.: University of Florida sparse matrix collection. Technical report, University of Florida (2009)
Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2), 303–353 (1998)
Eldén, L., Park, H.: A Procrustes problem on the Stiefel manifold. Numer. Math. 82(4), 599–619 (1999)
Fletcher, R.: On the Barzilai–Borwein method. In: Qi, L.Q., Teo, K., Yang, X.Q. (eds.) Optimization and Control with Applications, Applied Optimization, vol. 96, pp. 235–256. Springer, US (2005)
Flury, B.: Common Principal Components and Related Multivariate Models. Wiley, New York (1988)
Gao, Y., Sun, D.F.: A majorized penalty approach for calibrating rank constrained correlation matrix problems. Technical report, National University of Singapore (2010)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Maryland (1996)
Grippo, L., Lampariello, F., Lucidi, S.: A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 23, 707–716 (1986)
Grubišić, I., Pietersz, R.: Efficient rank reduction of correlation matrices. Linear Algebra Appl. 422(2), 629–653 (2007)
Jiang, B., Dai, Y.H.: A framework of constraint preserving update schemes for optimization on Stiefel manifold. Technical report, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences (2012)
Jiang, B., Dai, Y.H.: Feasible Barzilai–Borwein-like methods for extreme symmetric eigenvalue problems. Optim. Methods Softw. 28(4), 756–784 (2013)
Jiang, K.F., Sun, D.F., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)
Joho, M., Mathis, H.: Joint diagonalization of correlation matrices by using gradient methods with application to blind signal separation. In: IEEE Proceedings of the Sensor Array and Multichannel Signal Processing Workshop, 2002, pp. 273–277 (2002)
Journée, M., Nesterov, Y., Richtárik, P., Sepulchre, R.: Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11, 517–553 (2010)
Lai, R., Wen, Z., Yin, W., Gu, X., Lui, L.M.: Folding-free global conformal mapping for genus-0 surfaces by harmonic energy minimization. J. Sci. Comput. 58(3), 705–725 (2014)
Li, L., Toh, K.C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. Comput. 2, 291–315 (2010)
Li, Q.N., Qi, H.D.: A sequential semismooth Newton method for the nearest low-rank correlation matrix problem. SIAM J. Optim. 21(4), 1–26 (2011)
Liu, Y.F., Dai, Y.H., Luo, Z.Q.: On the complexity of leakage interference minimization for interference alignment. In: 2011 IEEE 12th International Workshop on Signal Processing Advances in Wireless Communications, pp. 471–475 (2011)
Manton, J.H.: Optimization algorithms exploiting unitary constraints. IEEE Trans. Signal Process. 50(3), 635–650 (2002)
Nishimori, Y., Akaho, S.: Learning algorithms utilizing quasi-geodesic flows on the Stiefel manifold. Neurocomputing 67, 106–135 (2005)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
Peters, S.W., Heath, R.W.: Interference alignment via alternating minimization. In: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2445–2448. IEEE Computer Society (2009)
Pietersz, R., Groenen, P.J.: Rank reduction of correlation matrices by majorization. Quant. Financ. 4(6), 649–662 (2004)
Qi, H.D., Sun, D.F.: A quadratically convergent Newton method for computing the nearest correlation matrix. SIAM J. Matrix Anal. Appl. 28(2), 360–385 (2007)
Rapcsák, T.: On minimization of sums of heterogeneous quadratic functions on Stiefel manifolds. In: Migdalas, A., Pardalos, P.M., Värbrand, P. (eds.) From Local to Global Optimization, vol. 53, pp. 277–290. Kluwer Academic Publishers (2001)
Rapcsák, T.: On minimization on Stiefel manifolds. Eur. J. Oper. Res. 143(2), 365–376 (2002)
Raydan, M.: On the Barzilai and Borwein choice of steplength for the gradient method. IMA J. Numer. Anal. 13, 321–326 (1993)
Raydan, M.: The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM J. Optim. 7(1), 26–33 (1997)
Rebonato, R., Jäckel, P.: The most general methodology to creating a valid correlation matrix for risk management and option pricing purposes. J. Risk 2, 17–27 (1999)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems. Manchester University Press, Manchester (1992)
Savas, B., Lim, L.H.: Quasi-Newton methods on Grassmannians and multilinear approximations of tensors. SIAM J. Sci. Comput. 32(6), 3352–3393 (2010)
Schönemann, P.H.: A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1), 1–10 (1966)
Stiefel, E.: Richtungsfelder und fernparallelismus in n-dimensionalen mannigfaltigkeiten. Comment. Math. Helv. 8(1), 305–353 (1935)
Sun, W.Y., Yuan, Y.X.: Optimization Theory and Methods, Springer Optimization and Its Applications, vol. 1. Springer, New York (2006)
Theis, F., Cason, T., Absil, P.A.: Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold. In: Adali, T., Jutten, C., Romano, J.M.T., Barros, A. (eds.) Independent Component Analysis and Signal Separation. Lecture Notes in Computer Science, vol. 5441, pp. 354–361. Springer, Berlin (2009)
Toint, P.L.: Non-monotone trust-region algorithms for nonlinear optimization subject to convex constraints. Math. Program. 77(3), 69–94 (1997)
Wen, Z., Yin, W.: A feasible method for optimization with orthogonality constraints. Math. Program. 142(1–2), 397–434 (2013)
Yang, C., Meza, J.C., Lee, B., Wang, L.W.: KSSOLV—a Matlab toolbox for solving the Kohn–Sham equations. ACM Trans. Math. Softw. 36, 1–35 (2009)
Yang, C., Meza, J.C., Wang, L.W.: A constrained optimization algorithm for total energy minimization in electronic structure calculations. J. Comput. Phys. 217(2), 709–721 (2006)
Yang, C., Meza, J.C., Wang, L.W.: A trust region direct constrained minimization algorithm for the Kohn–Sham equation. SIAM J. Sci. Comput. 29(5), 1854–1875 (2007)
Zhang, H.C., Hager, W.W.: A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14(4), 1043–1056 (2004)
Zhang, L.H., Liao, L.Z.: An alternating variable method for the maximal correlation problem. J. Global Optim. 54(1), 199–218 (2012)
Zhou, B., Gao, L., Dai, Y.H.: Gradient methods with adaptive step-sizes. Comput. Optim. Appl. 35(1), 69–86 (2006)
Zou, H., Hastie, T., Tibshirani, R.: Sparse principal component analysis. J. Comput. Graph. Stat. 15(2), 265–286 (2006)
Acknowledgments
We are very grateful to Zaiwen Wen for useful discussions on optimization on the Stiefel manifold and to Qingna Li for discussions on the nearest low-rank correlation matrix problem. We thank Zaiwen Wen and Wotao Yin for sharing their code FOptM, Houduo Qi and Qingna Li for sharing their code SemiNewton, and Defeng Sun and Yan Gao for sharing their code PenCorr. We also thank Kaifeng Jiang, Defeng Sun and Kim-Chuan Toh for sharing the gene correlation matrices for the nearest correlation matrix problem. Many thanks are also due to the associate editor, two anonymous referees and the editor, Prof. Alexander Shapiro, whose suggestions and comments greatly improved the quality of this paper.
Additional information
This work was partly supported by the National Key Basic Research Program of China (Grant No. 2015CB856000), the China National Funds for Distinguished Young Scientists (No. 11125107), the Chinese NSF grants (Nos. 10831106 and 81211130105) and the CAS grant (No. kjcx-yw-s7-03).
Appendices
Appendix 1: Details of Approaches I and II in Sect. 3
1.1 Details of Approach I
Using the condition that \(Z(\tau ) \equiv I_p\) and \(\widetilde{R}'(0) = - X^{\mathsf{T}} E\) and denoting \({\mathcal {A}}(\tau ) = \left[ \begin{array}{c} \widetilde{R}(\tau ) \\ W \widetilde{N}(\tau ) \end{array}\right] \), it follows from (3.4) and (3.6) that
where \(B \in {\mathbb {R}}^{n\times p}\), \(F \in {\mathbb {R}}^{p \times p}\). Solving the above ordinary differential equations, we get that
Since \(Z(\tau ) \equiv I_p\), we know from (3.3) and the definition of \({\mathcal {A}}(\tau )\) that
and \({\mathcal {A}}(\tau )^{\mathsf{T}} {\mathcal {A}}(\tau ) = I_p\). It follows that the matrix inside the brackets of (8.1) must be skew-symmetric. This means that \(B = -W^{\mathsf{T}}\), \(F + F^{\mathsf{T}} = 0\) and \(X^{\mathsf{T}}E+E^{\mathsf{T}}X=0\) (i.e., \(E \in {\mathcal {T}}_X\)). Let \(W = QR\) be the unique QR factorization of \(W\) and assume that \(F\) takes the form \(F = Q{\widetilde{F}}Q^{\mathsf{T}}\), where \({\widetilde{F}}\in {\mathbb {R}}^{p\times p}\) is any skew-symmetric matrix. Then we can write
where the second equality is due to the fact that \(Q\) has orthonormal columns. The update scheme (8.2) can be regarded as a generalized geodesic update scheme, since with \({\widetilde{F}}= 0\), (8.2) reduces to the geodesic update scheme (2.1).
1.2 Details of Approach II
Based on the choices that \(\widetilde{R}(\tau ) = I_p + \tau \widetilde{R}'(0)\) and \(\widetilde{N}(\tau ) = \tau I_p\), we can get \(Z(\tau )\) from (3.3) by the polar decomposition or the Cholesky factorization.
If the polar decomposition is used, \(Z(\tau )\) and \(Z'(\tau )\) are always symmetric. In this case, it is easy to show that \(Y(\tau ; X) = \mathcal {P}_{{\mathcal {S}}_{n,p}}{{\big (X \widetilde{R}(\tau ) + W \widetilde{N}(\tau ) \big )}}\), which together with (3.6), (3.7) and (3.8) further implies that
where \(Z'(0)\) is any \(p\)-by-\(p\) symmetric matrix. If \(Z'(0) = 0\), (8.3) reduces to the ordinary polar decomposition or Manton’s projection update scheme. If \(Z'(0) = \text{ sym }(X^{\mathsf{T}}G)\) and \(E = G - X \text{ sym }(X^{\mathsf{T}}G),\) it becomes the ordinary gradient projection update scheme.
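As a numerical sanity check of this gradient projection reading, one can verify by finite differences that the polar-projection curve \(\tau \mapsto \mathcal {P}_{{\mathcal {S}}_{n,p}}(X - \tau G)\) leaves \(X\) in the direction \(-E\) with \(E = G - X\,\text{ sym }(X^{\mathsf{T}}G)\) (a sketch with random illustrative data, assuming the SVD-based projection):

```python
import numpy as np

def sym(M):
    return 0.5 * (M + M.T)

def polar(A):
    # Projection onto the Stiefel manifold via the SVD.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(1)
n, p = 10, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
G = rng.standard_normal((n, p))          # stand-in for the gradient

# Central finite difference of tau -> polar(X - tau G) at tau = 0.
h = 1e-5
dY = (polar(X - h * G) - polar(X + h * G)) / (2 * h)

# Expected first-order direction: -E = -(G - X sym(X^T G)).
expected = -(G - X @ sym(X.T @ G))
assert np.allclose(dY, expected, atol=1e-6)
```

Note that the expected direction is tangent at \(X\): \(X^{\mathsf{T}}E\) is symmetric-free in the sense that \(X^{\mathsf{T}}(-E) + (-E)^{\mathsf{T}}X = 0\).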
If the Cholesky factorization is used, \(Z(\tau )\) and \(Z'(\tau )\) are always upper triangular. Similarly, we can derive
where \(Z'(0)\) is any \(p\)-by-\(p\) upper triangular matrix. When \(Z'(0) = 0\), (8.4) reduces to the ordinary QR factorization update scheme.
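A sketch of the Cholesky route, checked against the QR factorization update (random illustrative data; the sign convention on the diagonal of \(R\) is fixed so the two routes coincide):

```python
import numpy as np

def qr_retraction(A):
    """Orthonormalize A via its QR factorization, with the diagonal
    of R forced positive so the factorization is unique."""
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s

rng = np.random.default_rng(2)
n, p, tau = 9, 3, 0.7
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
E = rng.standard_normal((n, p))
A = X + tau * E

Y = qr_retraction(A)
assert np.allclose(Y.T @ Y, np.eye(p))   # feasibility is preserved

# Same point via the Cholesky route: A^T A = L L^T, then Y = A L^{-T},
# since L^T coincides with the (sign-fixed) upper triangular R.
L = np.linalg.cholesky(A.T @ A)
Y_chol = np.linalg.solve(L, A.T).T       # computes A L^{-T}
assert np.allclose(Y_chol, Y)
```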
Appendix 2: Proof of Lemma 3.1
The fact that \(X^{\mathsf{T}}D_{\rho }\) is skew-symmetric implies that \(J\) is invertible, thus \(Y(\tau )\) is well-defined. (i) follows from the construction of the update scheme (3.11).
Meanwhile, we know that \(Y'(0) = -D_{\rho }\), which with the chain rule shows that
where the second equality is due to \(D_{\rho }\in {\mathcal {T}}_X\) and the definition (2.8) of \(\nabla {\mathcal {F}}\). Recall that \(P_X = I_n - \frac{1}{2} XX^{\mathsf{T}}\). Substituting (3.12) into the above equation yields
So (ii) is true.
We prove (iii) by contradiction. Premultiplying the expression of \(Y(\tau )\) in (3.11) by \(X^{\mathsf{T}}\) and using \(X^{\mathsf{T}}W = 0\), we get that \(2J^{-1} - I_p = X^{\mathsf{T}}Y\). Assume that there exists a \(p\)-by-\(p\) orthogonal matrix \(Q_p\) such that \(Y = XQ_p\). Then we have \(2J^{-1} - I_p = Q_p\); i.e., \(2I_p - J = Q_pJ\). It follows from \((2I_p - J)^{\mathsf{T}} (2I_p-J) = (Q_pJ)^{\mathsf{T}}Q_pJ\) and \(Q_p^\mathsf{T}Q_p = I_p\) that
which is a contradiction to the definition of \(J\). The contradiction shows the truth of (iii).
For (iv), it is obvious that \(\Vert J\Vert _2 \le 1 + \frac{\tau ^2}{4}\Vert W\Vert _{\mathsf{F}}^2 + \frac{\tau }{2}\Vert X^{\mathsf{T}}D_{\rho }\Vert _{\mathsf{F}}\), where \(\Vert \cdot \Vert _2\) is the spectral norm. It follows from \(-D_{\rho }= W - XX^{\mathsf{T}} D_{\rho }\) and \(X^{\mathsf{T}} W=0\) that \(\Vert D_{\rho }\Vert _{\mathsf{F}}^2 = \Vert W\Vert _{\mathsf{F}}^2 + \Vert X^{\mathsf{T}}D_{\rho }\Vert _{\mathsf{F}}^2.\) Notice that \(\upsilon = \tau \Vert D_{\rho }\Vert _{\mathsf{F}}\). Then we have
On the other hand, the relation \(2J^{-1} = X^{\mathsf{T}} Y + I_p\) indicates that \(\Vert J^{-1}\Vert _2 \le 1\). Thus (iv) is true.
In the case that \(p=1\) or \(p=n\), it is not difficult to simplify the corresponding update schemes, and we omit the details here. This completes the proof.
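The splitting \(\Vert D_{\rho }\Vert _{\mathsf{F}}^2 = \Vert W\Vert _{\mathsf{F}}^2 + \Vert X^{\mathsf{T}}D_{\rho }\Vert _{\mathsf{F}}^2\) used in the proof of (iv) can be confirmed numerically (random illustrative data; \(B\) and \(K\) below are arbitrary stand-ins used only to build a direction with the required structure):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 4
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

B = rng.standard_normal((n, p))
W = B - X @ (X.T @ B)        # null-space component: X^T W = 0
K = rng.standard_normal((p, p))
K = K - K.T                  # skew-symmetric; plays the role of X^T D_rho
D = X @ K - W                # then -D = W - X X^T D and X^T D = K

fro2 = lambda M: np.linalg.norm(M, 'fro') ** 2
assert np.allclose(X.T @ W, 0)
assert np.allclose(fro2(D), fro2(W) + fro2(X.T @ D))
```

The identity is just Pythagoras: the range-space part \(XK\) and the null-space part \(W\) are orthogonal in the Frobenius inner product.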
Appendix 3: Proof of Lemma 3.2
The relation \(\langle G, D^{(i)} \rangle = e_i^{\mathsf{T}}\left( G^{\mathsf{T}} \nabla {\mathcal {F}}\right) e_i\) can be easily verified from the definition of \(D^{(i)}\). With the definition of \(D^{(q)}\) and \(X^{\mathsf{T}} X = I_p\), we have that
Thus \(Y(\tau )\) is well-defined. Moreover, we have that
Hence, the matrix \(J = I_p + \frac{\tau ^2}{4} W^{\mathsf{T}}W + \frac{\tau }{2}X^{\mathsf{T}} D^{(q)}\) can be expressed as
where \(\xi = \frac{\tau ^2}{4}G_{(q)}^{\mathsf{T}}(I_n - XX^{\mathsf{T}}) G_{(q)}\). By the formulations of \(b\) and \(\xi \), we easily see that
Applying the SMW formula to (10.1) and using the above relations, we can obtain (3.15).
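The SMW (Sherman–Morrison–Woodbury) identity used here can be verified on generic data (the test matrices below are arbitrary stand-ins, not the specific \(J\) of the proof):

```python
import numpy as np

def smw_inverse(A_inv, U, C, V):
    """Sherman-Morrison-Woodbury formula:
    (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}."""
    inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
    return A_inv - A_inv @ U @ inner @ V @ A_inv

rng = np.random.default_rng(4)
p, k = 6, 2
A = np.eye(p)                           # as in J = I_p + low-rank terms
U = rng.standard_normal((p, k))
C = np.diag(rng.uniform(0.5, 1.5, k))
V = rng.standard_normal((k, p))

lhs = smw_inverse(np.linalg.inv(A), U, C, V)
assert np.allclose(lhs, np.linalg.inv(A + U @ C @ V))
```

The payoff in the proof is the usual one: when \(A = I_p\) and the correction has low rank \(k \ll p\), only a \(k\)-by-\(k\) system needs to be inverted.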
Moreover, notice that \(\nabla {\mathcal {F}}= \sum _{i=1}^{p} D^{(i)}\). Thus we have
where the first inequality follows from the choice of \(q\) in (3.14) and the second one is due to (ii) of Lemma 3.1. The proof is completed.
Appendix 4: Proof of Proposition 3.1
Before going into the details of the proof, we recall the following fact on the inverse of a \(2 \times 2\) block matrix.
Fact 11.1
Assume that \(T = \left[ \begin{array}{cc} T_{11} &{} ~T_{12}\\ T_{21} &{}~T_{22} \end{array} \right] \), where \(T_{22}\) and the Schur complement \(S = T_{11}^{} - T_{12}^{} T_{22}^{-1}T_{21}^{}\) are invertible. Then \(T\) is invertible and \(T^{-1} = \left[ \begin{array}{cc} S^{-1} &{} ~-S^{-1}T_{12}^{}T_{22}^{-1}\\ -T_{22}^{-1}T_{21}^{}S^{-1} &{}~T_{22}^{-1} + T_{22}^{-1}T_{21}^{}S^{-1}T_{12}^{}T_{22}^{-1} \end{array} \right] \).
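This block inverse, built from the Schur complement \(S = T_{11} - T_{12}T_{22}^{-1}T_{21}\), can be checked numerically (random illustrative blocks):

```python
import numpy as np

rng = np.random.default_rng(5)
p = 3
T11 = rng.standard_normal((p, p))
T12 = rng.standard_normal((p, p))
T21 = rng.standard_normal((p, p))
T22 = np.eye(p) + 0.1 * rng.standard_normal((p, p))   # keep T22 invertible

T22_inv = np.linalg.inv(T22)
S = T11 - T12 @ T22_inv @ T21          # Schur complement of T22
S_inv = np.linalg.inv(S)

# Block-matrix inverse via the Schur complement formula.
T_inv = np.block([
    [S_inv,                  -S_inv @ T12 @ T22_inv],
    [-T22_inv @ T21 @ S_inv, T22_inv + T22_inv @ T21 @ S_inv @ T12 @ T22_inv],
])
T = np.block([[T11, T12], [T21, T22]])
assert np.allclose(T_inv, np.linalg.inv(T))
```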
First, we show that the update scheme (2.3) is well-defined, provided that \(I_p+\frac{\tau }{4}X^{\mathsf{T}}D\) is invertible. Consider the update scheme (2.3) with \(U=[P_X D, \,X]\) and \(V=[X,\, -P_XD]\). It follows from \(P_X = I_n - \frac{1}{2}XX^{\mathsf{T}}\) that \(X^{\mathsf{T}}P_X D = \frac{1}{2} X^{\mathsf{T}}D\). Combining this with \(X^{\mathsf{T}}X = I_p\) and \(X^{\mathsf{T}}D\) being skew-symmetric, we can rewrite
Moreover, we derive that
which with (3.11) implies that
Plugging this into (11.1) yields
where \(T_{ij} \in {\mathbb {R}}^{p\times p}\) \((i,j \in \{1,2\})\). By simple calculations, we know that
Thus it follows from Fact 11.1 that \(I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U\) is invertible and the update scheme (2.3) is well-defined.
We now prove the equivalence. With Lemma 4.1 and (11.2), some tedious manipulations yield that \((I_{2p}+\frac{\tau }{2}V^{\mathsf{T}}U)^{-1}=\left[ \begin{array}{cc}M_{11} &{} ~M_{12}\\ M_{21} &{} ~M_{22}\end{array}\right] \), where
Direct calculations show that
Finally, we can obtain
where the first equality uses (2.3) and \(X^{\mathsf{T}}X = I_p\), the second and third equalities are due to \(P_XD = \frac{1}{2} XX^{\mathsf{T}}D -W\) and (11.3), respectively. The proof is completed.
Appendix 5: Some details of Table 1
We first review the computational costs of some basic matrix operations. Given \(A \in {\mathbb {R}}^{n_1 \times n_2}\) and \(B \in {\mathbb {R}}^{n_2 \times n_2}\), computing \(AB\) needs \(2n_1^{}n_2^2\) flops, while computing \(A^{\mathsf{T}} A\) only needs \(n_1^{}n_2^2\) flops. Computing \(\text{ qr }(A)\) by the modified Gram–Schmidt algorithm needs \(2n_1^{}n_2^2\) flops (here \(n_1\ge n_2\)). If \(B\) is symmetric, computing the eigenvalue decomposition \(B = P\Sigma P^{\mathsf{T}}\) with orthogonal \( P\in {\mathbb {R}}^{n_2 \times n_2}\) and diagonal \(\Sigma \in {\mathbb {R}}^{n_2 \times n_2}\) by the symmetric QR algorithm needs \(9n_2^3\) flops. If \(B\) is skew-symmetric, computing the exponential of \(B\) needs \(10n_2^3\) flops. If \(B\) is symmetric positive definite, computing \(B^{1/2}\) needs \(10n_2^3\) flops. In addition, if \(B\) is nonsingular, solving the matrix equation \(B T = A^{\mathsf{T}}\) by Gaussian elimination with partial pivoting needs \(2n_1^{}n_2^2 +2n_2^3/3\) flops; it follows that computing \(B^{-1}\) needs \(8n_2^3/3\) flops. We refer interested readers to [23] for more details.
To verify the computational costs for the corresponding update schemes in Table 1, we take the new update scheme (3.11) and the Wen–Yin update scheme (2.3) as examples. Notice that, compared with the \(O(np^2)\) or \(O(n^3)\) terms, the \(O(p^2)\) term is omitted for \(1\le p \le n\). For simplicity, we only consider the case that \(1< p <n\). In the case that \(p=1\) or \(p=n\), the costs for the update schemes (3.11) and (2.3) can be obtained by a similar analysis; we omit the details here.
To analyze the computational cost for (3.11), since \(E = D_{\rho }\) and hence \(W = -G + XX^{\mathsf{T}}G\), we can rewrite the feasible curve as
and
Forming \(J(\tau )\) needs \(3np^2 + p^3\) flops, involving the computation of \(X^{\mathsf{T}}G\), \(G^{\mathsf{T}}G\) and \((G^{\mathsf{T}}X )(X^{\mathsf{T}}G )\). The final assembly of \(Y(\tau )\) consists of solving one matrix equation, performing one matrix multiplication and two matrix subtractions, and hence needs \(4np^2 + 2np + 2p^3/3\) flops. Therefore, the total cost of forming \(Y(\tau ; X)\) in (3.11) for any \(\tau \) is \(7np^2 + 2np + 5p^3/3\) flops. While doing backtracking line searches, we need to update \(Y(\tau ; X)\) for a different \(\tau \). Denote the first trial and the new trial stepsizes by \(\tau _{{{\mathrm {first}}}}\) and \(\tau _{{{\mathrm {new}}}}\), respectively. It is easy to see that
As \(X\big (\frac{2}{\tau _{{{\mathrm {first}}}}}I_p + X^{\mathsf{T}}G\big ) - G\) is already computed in \(Y(\tau _{{{\mathrm {first}}}}; X)\), we store this matrix and hence only need \(2np^2 + 3np + 2p^3/3\) flops to compute \(Y(\tau _{{{\mathrm {new}}}}; X)\).
To analyze the computational cost for the Wen–Yin update scheme (2.3), we see that it takes \(3np^2\) and \(2np^2 + np\) flops to form \(I_{2p} + \frac{\tau }{2} V^{\mathsf{T}} U\) and \(P_X D_{\rho }\), respectively. The final assembly of \(Y(\tau )\) needs \(4np^2 + np + 40p^3/3\) flops. Hence, the total cost for the Wen–Yin update scheme is \(9np^2 + 2np + 40p^3/3\) flops. When \(\rho = 1/2\), we see from [53] that \(U = [G, X]\) and \(V = [X, -G]\) in (2.3), which implies that there is no need to compute \(P_XD_{1/2}\) any more, and the total cost of the Wen–Yin update scheme can be reduced to \(7np^2 + np + 40p^3/3\) flops. The cost for updating \(Y(\tau )\) with a new \(\tau \) is \(4np^2 + np + 40p^3/3\) flops, since \(V^{\mathsf{T}}U\) can be stored and the main work involves solving a matrix equation and the final assembly.
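As a quick sanity check, the leading-order flop formulas derived in this appendix can be compared directly (a sketch encoding only the counts quoted above):

```python
# First-evaluation flop counts for forming Y(tau; X), as derived above:
# the new scheme (3.11) versus the Wen-Yin scheme (2.3).
new_scheme = lambda n, p: 7 * n * p**2 + 2 * n * p + 5 * p**3 / 3
wen_yin    = lambda n, p: 9 * n * p**2 + 2 * n * p + 40 * p**3 / 3

# The new scheme is cheaper term by term, for any 1 < p < n.
for n, p in [(1000, 10), (1000, 100), (5000, 500)]:
    assert new_scheme(n, p) < wen_yin(n, p)
```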
Jiang, B., Dai, YH. A framework of constraint preserving update schemes for optimization on Stiefel manifold. Math. Program. 153, 535–575 (2015). https://doi.org/10.1007/s10107-014-0816-7
Keywords
- Stiefel manifold
- Orthogonality constraint
- Sphere constraint
- Range space
- Null space
- Barzilai–Borwein-like method
- Feasible
- Adaptive nonmonotone line search
- Low-rank correlation matrix
- Kohn–Sham total energy minimization
- Heterogeneous quadratic functions