Abstract
Recently, several authors have shown local and global convergence rate results for Douglas–Rachford splitting under strong monotonicity, Lipschitz continuity, and cocoercivity assumptions. Most of these focus on the convex optimization setting. In the more general monotone inclusion setting, Lions and Mercier showed a linear convergence rate bound under the assumption that one of the two operators is strongly monotone and Lipschitz continuous. We show that this bound is not tight, meaning that no problem from the considered class converges exactly with that rate. In this paper, we present tight global linear convergence rate bounds for that class of problems. We also provide tight linear convergence rate bounds under the assumptions that one of the operators is strongly monotone and cocoercive, and that one of the operators is strongly monotone and the other is cocoercive. All our linear convergence results are obtained by proving the stronger property that the Douglas–Rachford operator is contractive.
1 Introduction
Douglas–Rachford splitting [12, 24] is an algorithm that solves monotone inclusion problems of the form
\[0\in Ax+Bx,\]
where A and B are maximally monotone operators. A class of problems that falls under this category is composite convex optimization problems of the form
\[\mathrm{minimize}~f(x)+g(x), \tag{1.1}\]
where f and g are proper, closed, and convex functions. This holds since the subdifferentials of proper, closed, and convex functions are maximally monotone operators, and since Fermat’s rule says that the optimality condition for solving (1.1) is \(0\in \partial f(x)+\partial g(x)\), under a suitable qualification condition. The algorithm has shown great potential in many applications such as signal processing [6], image denoising [32], and statistical estimation [5] (where the dual algorithm ADMM is discussed).
It has long been known that Douglas–Rachford splitting converges under quite mild assumptions, see [13, 14, 24]. However, the rate of convergence in the general case has only recently been shown to be \(O(1/\sqrt{k})\) for the fixed-point residual [9, 10, 18]. For general maximally monotone operator problems, where one of the operators is strongly monotone and Lipschitz continuous, Lions and Mercier showed in [24] that the Douglas–Rachford algorithm enjoys a linear convergence rate. To the author’s knowledge, this was the sole linear convergence rate result for these methods for a long period of time. Recently, however, many works have shown linear convergence rates for Douglas–Rachford splitting and its dual version, ADMM, see [1, 2, 4, 8, 10, 11, 15, 16, 17, 19, 20, 21, 22, 28, 30, 33]. The works in [4, 10, 19, 20, 30] concern local linear convergence under different assumptions. The works in [21, 22, 33] consider distributed formulations, while the works in [1, 2, 8, 11, 15, 16, 17, 24, 27, 28, 31] show global convergence rate bounds under various assumptions. Of these, the works in [1, 2, 16] show tight linear convergence rate bounds. The works in [1, 2] show tight convergence rate results for the problem of finding a point in the intersection of two subspaces. In [16] it is shown that the linear convergence rate bounds in [17] (which are generalizations of the bounds in [15]) are tight for composite convex optimization problems where one function is strongly convex and smooth. All these results, except the one by Lions and Mercier, are stated in the convex optimization setting. In this paper, we will provide tight linear convergence rate bounds for monotone inclusion problems.
We consider three different sets of assumptions under which we provide linear convergence rate bounds. In all cases, the properties of Lipschitz continuity or cocoercivity, and strong monotonicity, are attributed to the operators. In the first case, we assume that one operator is strongly monotone and the other is cocoercive. In the second case, we assume that one operator is both strongly monotone and Lipschitz continuous. This is the setting considered by Lions and Mercier in [24], where a non-tight linear convergence rate bound is presented. In the third case, we assume that one operator is both strongly monotone and cocoercive. We show in all these settings that our bounds are tight, meaning that there exist problems from the respective classes that converge exactly with the provided rate bound. In the second and third cases, the rates are tight for all feasible algorithm parameters, while in the first case, the rate is tight for many algorithm parameters.
2 Background
In this section, we introduce some notation and define some operator and function properties.
2.1 Notation
We denote by \(\mathbb {R}\) the set of real numbers and by \(\overline{\mathbb {R}}:=\mathbb {R}\cup \{\infty \}\) the extended real line. Throughout this paper, \(\mathcal {H}\) denotes a separable real Hilbert space. Its inner product is denoted by \(\langle \cdot ,\cdot \rangle \) and its induced norm by \(\Vert \cdot \Vert \). We denote by \(\{\phi _i\}_{i=1}^K\) any orthonormal basis in \(\mathcal {H}\), where K is the dimension of \(\mathcal {H}\) (possibly \(\infty \)). The gradient of \(f:~\mathcal {X}\rightarrow \mathbb {R}\) is denoted by \(\nabla f\), and the subdifferential operator of \(f:~\mathcal {X}\rightarrow \overline{\mathbb {R}}\) is denoted by \(\partial f\) and is defined as \(\partial f(x_1):=\{u~|~f(x_2)\ge f(x_1)+\langle u,x_2-x_1\rangle {\hbox { for all }} x_2\in \mathcal {X}\}\). The conjugate function of f is denoted and defined by \(f^{*}(y)\triangleq \sup _{x}\left\{ \langle y,x\rangle -f(x)\right\} \). The power set of a set \(\mathcal {X}\), i.e., the set of all subsets of \(\mathcal {X}\), is denoted by \(2^{\mathcal {X}}\). The graph of a (set-valued) operator \(A:~\mathcal {X}\rightarrow 2^{\mathcal {Y}}\) is defined and denoted by \(\mathrm{{gph}}A = \{(x,y)\in \mathcal {X}\times \mathcal {Y}~|~y\in Ax\}\). The inverse operator \(A^{-1}\) is defined through its graph by \(\mathrm{{gph}}A^{-1} = \{(y,x)\in \mathcal {Y}\times \mathcal {X}~|~y\in Ax\}\). The identity operator is denoted by \(\mathrm{{Id}}\) and the resolvent of a monotone operator A is defined and denoted by \(J_A=(\mathrm{{Id}}+A)^{-1}\). Finally, the class of closed, proper, and convex functions \(f:~\mathcal {H}\rightarrow \overline{\mathbb {R}}\) is denoted by \(\Gamma _0(\mathcal {H})\).
2.2 Operator properties
Definition 2.1
(Strong monotonicity) Let \(\sigma > 0\). An operator \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is \(\sigma \)-strongly monotone if
\[\langle u-v,x-y\rangle \ge \sigma \Vert x-y\Vert ^2\]
holds for all \((x,u)\in \mathrm{{gph}}(A)\) and \((y,v)\in \mathrm{{gph}}(A)\).
The operator is merely monotone if \(\sigma =0\) in the above definition. In the following three definitions, we state some properties for single-valued operators \(T:~\mathcal {H}\rightarrow \mathcal {H}\). We state the properties for operators with full domain, but they can also be stated for operators with any nonempty domain \(\mathcal {D}\subseteq \mathcal {H}\).
Definition 2.2
(Lipschitz continuity) Let \(\beta \ge 0\). A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\beta \)-Lipschitz continuous if
\[\Vert Tx-Ty\Vert \le \beta \Vert x-y\Vert \]
holds for all \(x,y\in \mathcal {H}\).
Definition 2.3
(Nonexpansiveness) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if it is 1-Lipschitz continuous.
Definition 2.4
(Contractiveness) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\delta \)-contractive if it is \(\delta \)-Lipschitz continuous with \(\delta \in [0,1)\).
Definition 2.5
(Averaged mappings) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged if there exists a nonexpansive mapping \(R:~\mathcal {H}\rightarrow \mathcal {H}\) and \(\alpha \in (0,1)\) such that \(T=(1-\alpha )\mathrm{{Id}}+\alpha R\).
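From [3, Proposition 4.25], we know that an operator \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged if and only if it satisfies
\[\Vert Tx-Ty\Vert ^2+\tfrac{1-\alpha }{\alpha }\Vert (\mathrm{{Id}}-T)x-(\mathrm{{Id}}-T)y\Vert ^2\le \Vert x-y\Vert ^2 \tag{2.1}\]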
for all \(x,y\in \mathcal {H}\).
Definition 2.6
(Cocoercivity) Let \(\beta > 0\). A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{1}{\beta }\)-cocoercive if
\[\langle Tx-Ty,x-y\rangle \ge \tfrac{1}{\beta }\Vert Tx-Ty\Vert ^2\]
holds for all \(x,y\in \mathcal {H}\).
3 Preliminaries
In this section, we state and show preliminary results that are needed to prove the linear convergence rate bounds. We state some lemmas that describe how cocoercivity, Lipschitz continuity, and averagedness relate to each other. We also introduce negatively averaged operators T, which are defined by the property that \(-T\) is averaged. We show different properties of such operators, including that averaged maps of negatively averaged operators are contractive. This result will be used to show linear convergence in the case where the strong monotonicity and Lipschitz continuity properties are split between the operators.
3.1 Useful lemmas
Proofs of the following three lemmas are found in Appendix 8.
Lemma 3.1
Assume that \(\beta >0\) and let \(T:~\mathcal {H}\rightarrow \mathcal {H}\). Then \(\tfrac{1}{2\beta }\)-cocoercivity of \(\beta \mathrm{{Id}}+T\) is equivalent to \(\beta \)-Lipschitz continuity of T.
Lemma 3.2
Assume that \(\beta \in (0,1)\). Then \(\tfrac{1}{\beta }\)-cocoercivity of \(R:~\mathcal {H}\rightarrow \mathcal {H}\) is equivalent to \(\tfrac{\beta }{2}\)-averagedness of \(T=R+(1-\beta )\mathrm{{Id}}\).
Lemma 3.3
Let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\delta \)-contractive with \(\delta \in [0,1)\). Then \(R=(1-\alpha )\mathrm{{Id}}+\alpha T\) is contractive for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). The contraction factor is \(|1-\alpha |+\alpha \delta \).
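The factor in Lemma 3.3 can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: the scalar maps \(T(x)=\pm \delta x\) and the helper names are hypothetical choices, not taken from the paper. For these maps the relaxed operator attains the factor \(|1-\alpha |+\alpha \delta \) exactly.

```python
def relaxed_factor(alpha: float, delta: float) -> float:
    """Contraction factor |1 - alpha| + alpha * delta stated in Lemma 3.3."""
    return abs(1.0 - alpha) + alpha * delta


def relax(T, alpha: float):
    """Return the relaxed map R = (1 - alpha) * Id + alpha * T on the reals."""
    return lambda x: (1.0 - alpha) * x + alpha * T(x)


def observed_factor(alpha: float, delta: float) -> float:
    """Lipschitz constant of R for the assumed scalar map T(x) = +/- delta * x."""
    # Assumption: T(x) = delta * x for alpha <= 1 and T(x) = -delta * x otherwise.
    sign = 1.0 if alpha <= 1.0 else -1.0
    R = relax(lambda x: sign * delta * x, alpha)
    return abs(R(1.0) - R(0.0))  # R is linear, so this is its Lipschitz constant


delta = 0.6
alpha_max = 2.0 / (1.0 + delta)  # upper end of the admissible interval, here 1.25
for alpha in (0.3, 1.0, 1.2):
    assert 0.0 < alpha < alpha_max
    assert abs(observed_factor(alpha, delta) - relaxed_factor(alpha, delta)) < 1e-12
    assert relaxed_factor(alpha, delta) < 1.0
```

The three sampled values of \(\alpha \) cover under-relaxation, no relaxation, and over-relaxation inside the interval \((0,\tfrac{2}{1+\delta })\).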
For easier reference, we also record special cases of some results in [3] that will be used later. Specifically, we record, in order, special cases of [3, Proposition 4.33], [3, Proposition 4.28], and [3, Proposition 23.11].
Lemma 3.4
Let \(\beta \in (0,1)\) and let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\tfrac{1}{\beta }\)-cocoercive. Then \((\mathrm{{Id}}-T)\) is \(\tfrac{\beta }{2}\)-averaged.
Lemma 3.5
Let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\alpha \)-averaged with \(\alpha \in (0,\tfrac{1}{2})\). Then \((2T-\mathrm{{Id}})\) is \(2\alpha \)-averaged.
Lemma 3.6
Let \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) be maximally monotone and \(\sigma \)-strongly monotone with \(\sigma >0\). Then \(J_A=(\mathrm{{Id}}+A)^{-1}\) is \((1+\sigma )\)-cocoercive.
3.2 Negatively averaged operators
In this section we define negatively averaged operators and show various properties for these.
Definition 3.7
An operator \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged with \(\theta \in (0,1)\) if \(-T\) is \(\theta \)-averaged.
This definition implies that an operator T is \(\theta \)-negatively averaged if and only if it satisfies
\[T=-((1-\theta )\mathrm{{Id}}+\theta \bar{R})=(\theta -1)\mathrm{{Id}}+\theta R,\]
where \(\bar{R}\) is nonexpansive and \(R:=-\bar{R}\) is, therefore, also nonexpansive. Since \(-T\) is averaged, it is also nonexpansive, and so is T.
Since negatively averaged operators are nonexpansive, they can be averaged.
Definition 3.8
An \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S:~\mathcal {H}\rightarrow \mathcal {H}\) is defined as \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\) where \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged.
Next, we show that averaged negatively averaged operators are contractive.
Proposition 3.9
An \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S:~\mathcal {H}\rightarrow \mathcal {H}\) is \(|1-2\alpha +\alpha \theta |+\alpha \theta \)-contractive.
Proof
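Let \(T=(\theta -1)\mathrm{{Id}}+\theta R\) (for some nonexpansive R) be the \(\theta \)-negatively averaged operator, which implies that \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\). Then
\[\begin{aligned} \Vert Sx-Sy\Vert &=\Vert (1-\alpha )(x-y)+\alpha (Tx-Ty)\Vert \\ &=\Vert (1-2\alpha +\alpha \theta )(x-y)+\alpha \theta (Rx-Ry)\Vert \\ &\le |1-2\alpha +\alpha \theta |\Vert x-y\Vert +\alpha \theta \Vert Rx-Ry\Vert \\ &\le (|1-2\alpha +\alpha \theta |+\alpha \theta )\Vert x-y\Vert \end{aligned}\]
since R is nonexpansive. It is straightforward to verify that \(|1-2\alpha +\alpha \theta |+\alpha \theta <1\) for all combinations of \(\alpha \in (0,1)\) and \(\theta \in (0,1)\). Hence, S is contractive and the proof is complete. \(\square \)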
Next, we optimize the contraction factor w.r.t. \(\alpha \).
Proposition 3.10
Assume that \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged. Then the \(\alpha \) that optimizes the contraction factor for the \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\) is \(\alpha = \tfrac{1}{2-\theta }\). The corresponding optimal contraction constant is \(\tfrac{\theta }{2-\theta }\).
Proof
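Due to the absolute value, Proposition 3.9 states that the contraction factor \(\delta \) of S can be written as
\[\delta =\begin{cases} 1-2\alpha +2\alpha \theta &{}\text {if } \alpha \le \tfrac{1}{2-\theta },\\ 2\alpha -1&{}\text {if } \alpha \ge \tfrac{1}{2-\theta }, \end{cases}\]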
where the kink in the absolute value term is at \(\alpha =\tfrac{1}{2-\theta }\). Since \(\theta \in (0,1)\), we get negative slope for \(\alpha \le \tfrac{1}{2-\theta }\) and positive slope for \(\alpha \ge \tfrac{1}{2-\theta }\). Therefore, the optimal \(\alpha \) is in the kink at \(\alpha =\tfrac{1}{2-\theta }\), which satisfies \(\alpha \in (\tfrac{1}{2},1)\) since \(\theta \in (0,1)\). Inserting this into the contraction factor expression gives \(\tfrac{\theta }{2-\theta }\). This concludes the proof. \(\square \)
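As a quick sanity check of Propositions 3.9 and 3.10, the following sketch evaluates the contraction factor \(|1-2\alpha +\alpha \theta |+\alpha \theta \) on a grid of \(\alpha \) values and compares the grid minimizer with \(\tfrac{1}{2-\theta }\). The function names and the value \(\theta =0.4\) are illustrative assumptions.

```python
def contraction_factor(alpha: float, theta: float) -> float:
    """Factor |1 - 2*alpha + alpha*theta| + alpha*theta from Proposition 3.9."""
    return abs(1.0 - 2.0 * alpha + alpha * theta) + alpha * theta


def optimal_alpha(theta: float) -> float:
    """Location of the kink, alpha = 1 / (2 - theta), from Proposition 3.10."""
    return 1.0 / (2.0 - theta)


def optimal_factor(theta: float) -> float:
    """Optimal contraction factor theta / (2 - theta) from Proposition 3.10."""
    return theta / (2.0 - theta)


theta = 0.4  # illustrative value in (0, 1)
alpha_star = optimal_alpha(theta)

# Brute-force search over a grid in (0, 1) to confirm where the minimum sits.
grid = [k / 1000.0 for k in range(1, 1000)]
alpha_grid = min(grid, key=lambda a: contraction_factor(a, theta))

assert abs(alpha_grid - alpha_star) <= 1e-3
assert abs(contraction_factor(alpha_star, theta) - optimal_factor(theta)) < 1e-12
assert all(contraction_factor(a, theta) < 1.0 for a in grid)
```

The last assertion also illustrates the claim in the proof of Proposition 3.9 that the factor stays below one for every \(\alpha \in (0,1)\).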
Remark 3.11
The optimal contraction factor \(\tfrac{\theta }{2-\theta }\) is strictly increasing in \(\theta \) on the interval \(\theta \in (0,1)\). Therefore, the contraction factor becomes smaller the smaller \(\theta \) is.
We conclude this section by showing that the composition of an averaged and a negatively averaged operator is negatively averaged. Before we state the result, we need a characterization of \(\theta \)-negatively averaged operators T. This follows directly from the definition of averaged operators in (2.1) since \(-T\) is \(\theta \)-averaged:
\[\Vert Tx-Ty\Vert ^2+\tfrac{1-\theta }{\theta }\Vert (\mathrm{{Id}}+T)x-(\mathrm{{Id}}+T)y\Vert ^2\le \Vert x-y\Vert ^2. \tag{3.1}\]
Proposition 3.12
Assume that \(T_{\theta }:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged and \(T_{\alpha }:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged. Then \(T_{\theta }T_{\alpha }\) is \(\tfrac{\kappa }{\kappa +1}\)-negatively averaged where \(\kappa =\tfrac{\theta }{1-\theta }+\tfrac{\alpha }{1-\alpha }\).
Proof
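Let \(\kappa _{\theta }=\tfrac{\theta }{1-\theta }\) and \(\kappa _{\alpha }=\tfrac{\alpha }{1-\alpha }\), then \(\kappa = \kappa _{\theta }+\kappa _{\alpha }\). We have
\[\begin{aligned} \Vert (\mathrm{{Id}}+T_{\theta }T_{\alpha })x-(\mathrm{{Id}}+T_{\theta }T_{\alpha })y\Vert ^2&=\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y+(\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\\ &\le \tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\alpha }}\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y\Vert ^2+\tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\theta }}\Vert (\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\\ &\le (\kappa _{\theta }+\kappa _{\alpha })\left( \Vert x-y\Vert ^2-\Vert T_{\alpha }x-T_{\alpha }y\Vert ^2\right) +(\kappa _{\theta }+\kappa _{\alpha })\left( \Vert T_{\alpha }x-T_{\alpha }y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2\right) \\ &=\kappa \left( \Vert x-y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2\right) , \end{aligned} \tag{3.2}\]
where the first inequality follows from convexity of \(\Vert \cdot \Vert ^2\). More precisely, let \(t\in (0,1)\), then, by convexity of \(\Vert \cdot \Vert ^2\), we conclude that
\[\Vert x+y\Vert ^2=\Vert t(\tfrac{1}{t}x)+(1-t)(\tfrac{1}{1-t}y)\Vert ^2\le t\Vert \tfrac{1}{t}x\Vert ^2+(1-t)\Vert \tfrac{1}{1-t}y\Vert ^2=\tfrac{1}{t}\Vert x\Vert ^2+\tfrac{1}{1-t}\Vert y\Vert ^2.\]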
Letting \(t=\tfrac{\kappa _{\alpha }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]\), which implies that \(1-t=\tfrac{\kappa _{\theta }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]\), gives the first inequality in (3.2). The second inequality in (3.2) follows from (2.1) and (3.1). The relation in (3.2) coincides with the definition of negative averagedness in (3.1). Thus, \(T_{\theta }T_{\alpha }\) is \(\phi \)-negatively averaged with \(\phi \) satisfying \(\tfrac{1-\phi }{\phi }=\tfrac{1}{\kappa }\). This gives \(\phi = \tfrac{\kappa }{\kappa +1}\) and the proof is complete. \(\square \)
Remark 3.13
This result can readily be extended to show averagedness of \(T=T_1T_2\cdots T_N\) where \(T_i\) are \(\alpha _i\)-(negatively) averaged for \(i=1,\ldots ,N\). We get that T is \(\tfrac{\kappa }{1+\kappa }\)-negatively averaged with \(\kappa =\sum _{i=1}^N\tfrac{\alpha _i}{1-\alpha _i}\) if the number of negatively averaged operators \(T_i\) is odd, and that T is \(\tfrac{\kappa }{1+\kappa }\)-averaged if the number of negatively averaged operators \(T_i\) is even. Similar results have been presented, e.g., in [3, Proposition 4.32], which is improved in [7]. Our result extends these results in that it also allows for negatively averaged operators, and it reduces to the result in [7] for averaged operators.
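The composition rule of Proposition 3.12 and Remark 3.13 can be sketched as a small helper. Everything below is an illustrative assumption: the function names are hypothetical, and the check uses scalar linear maps \(x\mapsto cx\), for which \(\theta \)-negative averagedness amounts to \(-1\le c\le 2\theta -1\) and \(\alpha \)-averagedness to \(1-2\alpha \le c\le 1\).

```python
def composition_parameter(params):
    """Combine (parameter, is_negative) pairs as in Remark 3.13.

    Returns (kappa / (1 + kappa), True) when the composition is negatively
    averaged (odd number of negatively averaged factors), else (..., False).
    """
    kappa = sum(p / (1.0 - p) for p, _ in params)
    n_negative = sum(1 for _, negative in params if negative)
    return kappa / (1.0 + kappa), n_negative % 2 == 1


def is_negatively_averaged_scalar(c: float, phi: float, tol: float = 1e-12) -> bool:
    """Negative-averagedness inequality for the scalar linear map x -> c * x."""
    return c * c + (1.0 - phi) / phi * (1.0 + c) ** 2 <= 1.0 + tol


# Parameters chosen to mirror a strongly monotone / cocoercive pair.
sigma, beta = 2.0, 3.0
theta, alpha = 1.0 / (1.0 + sigma), beta / (1.0 + beta)
phi, negative = composition_parameter([(theta, True), (alpha, False)])

kappa = 1.0 / sigma + beta
assert negative
assert abs(phi - kappa / (1.0 + kappa)) < 1e-12

# Sweep all admissible scalar factors c1 (negatively averaged) and c2 (averaged).
steps = [k / 20.0 for k in range(21)]
for s in steps:
    c1 = -1.0 + s * (2.0 * theta)
    for t in steps:
        c2 = (1.0 - 2.0 * alpha) + t * (2.0 * alpha)
        assert is_negatively_averaged_scalar(c1 * c2, phi)
```

The scalar sweep is only a spot check of the stated parameter, not a proof. It does confirm that every product of admissible factors satisfies the negative-averagedness inequality with \(\tfrac{\kappa }{\kappa +1}\).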
4 Douglas–Rachford splitting
Douglas–Rachford splitting can be applied to solve monotone inclusion problems of the form
\[0\in Ax+Bx, \tag{4.1}\]
where \(A,B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone operators. The algorithm separates A and B by only touching the corresponding resolvents, where the resolvent \(J_A:~\mathcal {H}\rightarrow \mathcal {H}\) is defined as
\[J_A:=(\mathrm{{Id}}+A)^{-1}.\]
The resolvent has full domain since A is assumed maximally monotone, see [26] and [3, Proposition 23.7]. If \(A=\partial f\) where f is a proper, closed, and convex function, then \(J_A = {\mathrm{prox}}_{f}\) where the prox operator \({\mathrm{prox}}_{f}\) is defined as
\[{\mathrm{prox}}_{f}(z):=\mathop {\mathrm{argmin}}_{x}\left\{ f(x)+\tfrac{1}{2}\Vert x-z\Vert ^2\right\} .\]
That this holds follows directly from Fermat’s rule [3, Theorem 16.2] applied to the proximal operator definition.
The Douglas–Rachford algorithm is defined by the iteration
\[z^{k+1}=((1-\alpha )\mathrm{{Id}}+\alpha R_AR_B)z^k, \tag{4.3}\]
where \(\alpha \in (0,1)\) (we will see that also \(\alpha \ge 1\) can sometimes be used) and \(R_A:~\mathcal {H}\rightarrow \mathcal {H}\) is the reflected resolvent, which is defined as
\[R_A:=2J_A-\mathrm{{Id}}.\]
(Note that what is traditionally called Douglas–Rachford splitting is when \(\alpha =1/2\) in (4.3). The case with \(\alpha =1\) in (4.3) is often referred to as the Peaceman–Rachford algorithm, see [29]. We will use the term Douglas–Rachford splitting for all feasible choices of \(\alpha \).)
Since the reflected resolvent is nonexpansive in the general case [3, Corollary 23.10], and since compositions of nonexpansive operators are nonexpansive, the Douglas–Rachford algorithm is an averaged iteration of a nonexpansive mapping when \(\alpha \in (0,1)\). Therefore, Douglas–Rachford splitting is a special case of the Krasnosel’skiĭ–Mann iteration [23, 25], which is known to converge to a fixed-point of the nonexpansive operator, in this case \(R_AR_B\), see [3, Theorem 5.14]. Since an \(x\in \mathcal {H}\) solves (4.1) if and only if \(x=J_Bz\) where \(z=R_AR_Bz\), see [3, Proposition 25.1], this algorithm can be used to solve monotone inclusion problems of the form (4.1). Note that solving (4.1) is equivalent to solving
\[0\in \gamma Ax+\gamma Bx\]
for any \(\gamma \in (0,\infty )\). Then we can define \(A_{\gamma } = \gamma A\) and (4.1) can also be solved by the iteration
\[z^{k+1}=((1-\alpha )\mathrm{{Id}}+\alpha R_{\gamma A}R_{\gamma B})z^k.\]
Therefore, \(\gamma \) is an algorithm parameter that affects the progress of the iterations.
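The iteration with the scaled operators can be sketched on a one-dimensional toy problem. The operators \(Ax=x-1\) and \(Bx=2x\), the parameter values, and all function names below are assumptions made for illustration only. The solution of \(0\in Ax+Bx\) is then \(x=\tfrac{1}{3}\).

```python
def resolvent_A(z: float, gamma: float) -> float:
    """J_{gamma A}(z) for the illustrative operator A x = x - 1."""
    # Solve x + gamma * (x - 1) = z for x.
    return (z + gamma) / (1.0 + gamma)


def resolvent_B(z: float, gamma: float) -> float:
    """J_{gamma B}(z) for the illustrative operator B x = 2 x."""
    # Solve x + 2 * gamma * x = z for x.
    return z / (1.0 + 2.0 * gamma)


def reflected(resolvent):
    """Build the reflected resolvent R = 2 J - Id from a resolvent J."""
    return lambda z, gamma: 2.0 * resolvent(z, gamma) - z


R_A = reflected(resolvent_A)
R_B = reflected(resolvent_B)


def douglas_rachford(z0: float, alpha: float, gamma: float, iterations: int) -> float:
    """Run z <- (1 - alpha) z + alpha R_{gamma A} R_{gamma B} z."""
    z = z0
    for _ in range(iterations):
        z = (1.0 - alpha) * z + alpha * R_A(R_B(z, gamma), gamma)
    return z


gamma, alpha = 0.7, 0.5
z = douglas_rachford(z0=10.0, alpha=alpha, gamma=gamma, iterations=200)

# With this operator order, the solution is read off through the resolvent of B;
# the residual check below confirms that 0 = A x + B x holds at that point.
x = resolvent_B(z, gamma)
assert abs(x - 1.0 / 3.0) < 1e-9
assert abs((x - 1.0) + 2.0 * x) < 1e-9
```

Changing `gamma` changes the fixed point `z` and the speed of convergence, but not the recovered solution `x`.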
The objective of this paper is to provide tight linear convergence rate bounds for the Douglas–Rachford algorithm under various assumptions. Using these bounds, we will show how to select the algorithm parameters \(\gamma \) and \(\alpha \) that optimize these bounds. The first setting we consider is when A is strongly monotone and B is cocoercive.
5 A strongly monotone and B cocoercive
In this section, we show linear convergence for Douglas–Rachford splitting in the case where A and B are maximally monotone, A is strongly monotone, and B is cocoercive; that is, we make the following assumptions.
Assumption 5.1
Suppose that:
(i) \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is maximally monotone and \(\sigma \)-strongly monotone.
(ii) \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is maximally monotone and \(\tfrac{1}{\beta }\)-cocoercive.
Before we can state the main linear convergence result, we need to characterize the properties of the resolvent, the reflected resolvent, and the composition of reflected resolvents. This is done in the following series of propositions, the first of which is proven in Appendix 8.
Proposition 5.2
The resolvent \(J_B\) of a \(\tfrac{1}{\beta }\)-cocoercive operator \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{\beta }{2(1+\beta )}\)-averaged.
This implies that also the reflected resolvent is averaged.
Proposition 5.3
The reflected resolvent of a \(\tfrac{1}{\beta }\)-cocoercive operator \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{\beta }{1+\beta }\)-averaged.
Proof
This follows directly from Proposition 5.2 and Lemma 3.5. \(\square \)
If the operator instead is strongly monotone, the reflected resolvent is negatively averaged.
Proposition 5.4
The reflected resolvent of a \(\sigma \)-strongly monotone and maximal monotone operator \(A:~\mathcal {H}\rightarrow 2^\mathcal {H}\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged.
Proof
From Lemma 3.6, we have that the resolvent \(J_A\) is \((1+\sigma )\)-cocoercive. Using Lemma 3.4, this implies that \(\mathrm{{Id}}-J_A\) is \(\tfrac{1}{2(1+\sigma )}\)-averaged. Then using Lemma 3.5, this implies that \(2(\mathrm{{Id}}-J_A)-\mathrm{{Id}}=\mathrm{{Id}}-2J_A=-R_A\) is \(\tfrac{1}{1+\sigma }\)-averaged, hence \(R_A\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged. This completes the proof. \(\square \)
The composition of the reflected resolvents of a strongly monotone operator and a cocoercive operator is negatively averaged.
Proposition 5.5
Suppose that Assumption 5.1 holds. Then, the composition \(R_AR_B\) is \(\tfrac{\tfrac{1}{\sigma }+\beta }{1+\tfrac{1}{\sigma }+\beta }\)-negatively averaged.
Proof
Since \(R_A\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged and \(R_B\) is \(\tfrac{\beta }{1+\beta }\)-averaged, see Propositions 5.3 and 5.4, we can apply Proposition 3.12. We get that \(\kappa = \tfrac{\tfrac{1}{1+\sigma }}{1-\tfrac{1}{1+\sigma }}+\tfrac{\tfrac{\beta }{1+\beta }}{1-\tfrac{\beta }{1+\beta }}=\tfrac{1}{\sigma }+\beta \) and that the averagedness parameter of the negatively averaged operator \(R_AR_B\) is given by \(\tfrac{\kappa }{\kappa +1}=\tfrac{\tfrac{1}{\sigma }+\beta }{\tfrac{1}{\sigma }+\beta +1}\). This concludes the proof. \(\square \)
With these results, we can now show the following linear convergence rate bounds for Douglas–Rachford splitting under Assumption 5.1. The theorem is proven in Appendix 8.
Theorem 5.6
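Suppose that Assumption 5.1 holds, that \(\alpha \in (0,1)\), that \(\gamma \in (0,\infty )\), and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Then the algorithm converges at least with rate factor
\[\left| 1-2\alpha +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }\right| +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }. \tag{5.1}\]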
Optimizing this rate bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\gamma = \tfrac{1}{\sqrt{\beta \sigma }}\) and \(\alpha =\tfrac{\sqrt{\beta /\sigma }+1/2}{1+\sqrt{\beta /\sigma }}\). The corresponding optimal rate bound is \(\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}\).
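The parameter choice in Theorem 5.6 can be explored numerically. The sketch below assumes that the rate factor is the one obtained by combining Propositions 3.9 and 5.5 for the scaled operators \(\gamma A\) and \(\gamma B\), i.e., \(|1-2\alpha +\alpha \theta |+\alpha \theta \) with \(\theta =\tfrac{\kappa }{\kappa +1}\) and \(\kappa =\tfrac{1}{\gamma \sigma }+\gamma \beta \). Function names and the values of \(\sigma \) and \(\beta \) are illustrative.

```python
import math


def negative_averagedness(gamma: float, sigma: float, beta: float) -> float:
    """Parameter theta of R_{gamma A} R_{gamma B}, following Proposition 5.5."""
    kappa = 1.0 / (gamma * sigma) + gamma * beta
    return kappa / (1.0 + kappa)


def rate_bound(alpha: float, gamma: float, sigma: float, beta: float) -> float:
    """Assumed rate factor: Proposition 3.9 applied with theta from above."""
    theta = negative_averagedness(gamma, sigma, beta)
    return abs(1.0 - 2.0 * alpha + alpha * theta) + alpha * theta


def optimal_parameters(sigma: float, beta: float):
    """Optimal (gamma, alpha, rate) as stated after Theorem 5.6."""
    r = math.sqrt(beta / sigma)
    return 1.0 / math.sqrt(beta * sigma), (r + 0.5) / (1.0 + r), r / (r + 1.0)


sigma, beta = 1.0, 4.0
gamma_opt, alpha_opt, rate_opt = optimal_parameters(sigma, beta)
assert abs(rate_bound(alpha_opt, gamma_opt, sigma, beta) - rate_opt) < 1e-12

# Coarse grid search: no sampled (alpha, gamma) pair beats the stated optimum.
alphas = [k / 100.0 for k in range(1, 100)]
gammas = [0.05 * k for k in range(1, 101)]
best = min(rate_bound(a, g, sigma, beta) for a in alphas for g in gammas)
assert best >= rate_opt - 1e-12
```

For \(\sigma =1\) and \(\beta =4\) the stated optimum is \(\gamma =\tfrac{1}{2}\), \(\alpha =\tfrac{5}{6}\), with rate bound \(\tfrac{2}{3}\).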
5.1 Tightness
In this section, we present an example that shows tightness of the linear convergence rate bounds in Theorem 5.6 for many algorithm parameters. We consider a two-dimensional Euclidean example, which is given by the following convex optimization problem:
\[\mathrm{minimize}~f(x)+f^{*}(x), \tag{5.2}\]
where
\[f(x)=\tfrac{\beta }{2}x_1^2, \tag{5.3}\]
and \(x=(x_1,x_2)\), and \(\beta >0\). The gradient is \(\nabla f(x)=(\beta x_1,0)\), so it is cocoercive with factor \(\tfrac{1}{\beta }\). According to [3, Theorem 18.15], this is equivalent to \(f^*\) being \(\tfrac{1}{\beta }\)-strongly convex and, therefore, \(\partial f^*\) is \(\sigma :=\tfrac{1}{\beta }\)-strongly monotone.
The following proposition shows that when solving (5.2) with f defined in (5.3) using Douglas–Rachford splitting, the upper linear convergence rate bound is exactly attained. The result is proven in Appendix 8.
Proposition 5.7
Suppose that the Douglas–Rachford algorithm (4.3) is applied to solve (5.2) with f in (5.3). Further suppose that the parameters \(\gamma \) and \(\alpha \) satisfy \(\gamma \in (0,\infty )\) and \(\alpha \in [c,1)\) where \(c=\tfrac{1+\gamma \sigma +\gamma ^2\sigma \beta }{1+2\gamma \sigma +\gamma ^2\sigma \beta }\) and that \(z^0=(0,z_2^0)\) with \(z_2^0\ne 0\). Then the \(z^k\) sequence in (4.3) converges exactly with rate (5.1) in Theorem 5.6.
So, for all \(\gamma \) parameters and some \(\alpha \) parameters, the provided bound is tight. In particular, the optimal parameter choices \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) and \(\alpha =\tfrac{1+2\sqrt{\beta /\sigma }}{2(1+\sqrt{\beta /\sigma })}\) give a tight bound.
It is interesting to note that although we have considered a more general class of problems than convex optimization problems, a convex optimization problem is used to attain the worst case rate.
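The tightness claim of Proposition 5.7 can be illustrated with a short simulation. The sketch rests on assumptions that should be read with care: it takes the example to be \(f(x)=\tfrac{\beta }{2}x_1^2\) with the problem of minimizing \(f+f^*\), sets \(A=\partial f^*\) and \(B=\nabla f\), and reuses the rate factor \(|1-2\alpha +\alpha \theta |+\alpha \theta \) with \(\theta =\tfrac{\kappa }{\kappa +1}\), \(\kappa =\tfrac{1}{\gamma \sigma }+\gamma \beta \). All function names are hypothetical.

```python
def reflected_prox_f(z, gamma, beta):
    """R_{gamma B} for B = grad f with the assumed f(x) = beta / 2 * x_1 ** 2."""
    z1, z2 = z
    p1, p2 = z1 / (1.0 + gamma * beta), z2  # prox leaves the second coordinate alone
    return (2.0 * p1 - z1, 2.0 * p2 - z2)


def reflected_prox_f_conj(z, gamma, beta):
    """R_{gamma A} for A = subdifferential of f*, where
    f*(y) = y_1 ** 2 / (2 * beta) plus the indicator of y_2 = 0."""
    z1, z2 = z
    p1, p2 = z1 / (1.0 + gamma / beta), 0.0  # prox projects the second coordinate to 0
    return (2.0 * p1 - z1, 2.0 * p2 - z2)


def dr_step(z, alpha, gamma, beta):
    """One step z <- (1 - alpha) z + alpha R_{gamma A} R_{gamma B} z."""
    r = reflected_prox_f_conj(reflected_prox_f(z, gamma, beta), gamma, beta)
    return ((1.0 - alpha) * z[0] + alpha * r[0], (1.0 - alpha) * z[1] + alpha * r[1])


def rate_bound(alpha, gamma, sigma, beta):
    """Assumed rate factor |1 - 2 alpha + alpha theta| + alpha theta."""
    kappa = 1.0 / (gamma * sigma) + gamma * beta
    theta = kappa / (1.0 + kappa)
    return abs(1.0 - 2.0 * alpha + alpha * theta) + alpha * theta


beta, gamma = 2.0, 1.0
sigma = 1.0 / beta
c = (1.0 + gamma * sigma + gamma**2 * sigma * beta) / (
    1.0 + 2.0 * gamma * sigma + gamma**2 * sigma * beta
)
alpha = 0.9
assert c <= alpha < 1.0

z = (0.0, 1.0)  # starting point of the form (0, z_2) with z_2 != 0
z_next = dr_step(z, alpha, gamma, beta)
observed = abs(z_next[1]) / abs(z[1])
assert z_next[0] == 0.0
assert abs(observed - rate_bound(alpha, gamma, sigma, beta)) < 1e-12
```

Under these assumptions the second coordinate is multiplied by \(1-2\alpha \) in every step, so the per-step contraction \(2\alpha -1\) matches the bound whenever \(\alpha \ge c\).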
5.2 Comparison to other bounds
In [17], it was shown that Douglas–Rachford splitting converges with rate \(\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}\) when solving composite optimization problems of the form \(0\in \gamma \nabla f+\gamma \partial g\), where \(\nabla f\) is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive and the algorithm parameters are chosen as \(\alpha =1\) and \(\gamma = \tfrac{1}{\sqrt{\beta \sigma }}\). In our setting, with \(\partial f\) being \(\sigma \)-strongly monotone and \(\partial g\) being \(\tfrac{1}{\beta }\)-cocoercive, we can instead pose the equivalent problem \(0\in \gamma \partial \hat{f}(x)+\gamma \partial \hat{g}(x)\) where \(\hat{f} = f-\tfrac{\sigma }{2}\Vert \cdot \Vert ^2\) and \(\hat{g}=g+\tfrac{\sigma }{2}\Vert \cdot \Vert ^2\). Then \(\partial \hat{f}\) is merely monotone and \(\partial \hat{g}\) is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta +\sigma }\)-cocoercive. For that problem, [17] shows a linear convergence rate of at least \(\tfrac{\sqrt{(\beta +\sigma )/\sigma }-1}{\sqrt{(\beta +\sigma )/\sigma }+1}\) (when optimal parameters are used). This rate turns out to be better than the rate provided in Theorem 5.6, i.e., \(\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}\), which assumes that the strong convexity and smoothness properties are split between the two operators. This is shown by the following chain of equivalences, which departs from the fact that the square root is sub-additive, i.e., that \(\sqrt{\beta +\sigma }\le \sqrt{\sigma }+\sqrt{\beta }\) for \(\beta ,\sigma \ge 0\):
This implies that, from a worst case perspective, it is better to shift both properties into one operator. This is also always possible, without increasing the computational cost in the algorithm, since the prox-operator is just shifted slightly:
A similar relation holds for \({\mathrm{prox}}_{\gamma \hat{g}}\) with the sign in front of \(\gamma \sigma \) flipped.
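The comparison between the two rates in this subsection can be spot-checked numerically. The sketch below evaluates both expressions quoted in the text on a few \((\sigma ,\beta )\) pairs; the function names and sample values are illustrative assumptions, and the check is not a substitute for the analytical argument.

```python
import math


def split_rate(sigma: float, beta: float) -> float:
    """Rate from Theorem 5.6: properties split between the two operators."""
    r = math.sqrt(beta / sigma)
    return r / (r + 1.0)


def shifted_rate(sigma: float, beta: float) -> float:
    """Rate quoted from [17] after shifting both properties into one operator."""
    r = math.sqrt((beta + sigma) / sigma)
    return (r - 1.0) / (r + 1.0)


for sigma in (0.1, 1.0, 10.0):
    for beta in (0.1, 1.0, 10.0, 100.0):
        # The shifted formulation should never have the worse bound.
        assert shifted_rate(sigma, beta) <= split_rate(sigma, beta)
```

For example, with \(\sigma =\beta =1\) the split bound is \(\tfrac{1}{2}\), while the shifted bound is \(\tfrac{\sqrt{2}-1}{\sqrt{2}+1}\approx 0.17\).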
6 A strongly monotone and Lipschitz continuous
In this section, we consider the case where one of the operators is \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous. This assumption is stated next.
Assumption 6.1
Suppose that:
(i) The operators \(A:~\mathcal {H}\rightarrow \mathcal {H}\) and \(B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone.
(ii) A is \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous.
First, we state a result that characterizes the resolvent of A. It is proven in Appendix 8.
Proposition 6.2
Assume that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a maximal monotone \(\beta \)-Lipschitz continuous operator. Then the resolvent \(J_{A}=(\mathrm{{Id}}+A)^{-1}\) satisfies
This resolvent property is used when proving the following contraction factor of the reflected resolvent. The result is proven in Appendix 8.
Theorem 6.3
Suppose that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous operator. Then the reflected resolvent \(R_{A} = 2J_A-\mathrm{{Id}}\) is \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}\)-contractive.
The parameter \(\gamma \) that optimizes the contraction factor for \(R_{\gamma A}\) is the minimizer of \(h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}\) (\(\gamma A\) is \(\gamma \sigma \)-strongly monotone and \(\gamma \beta \)-Lipschitz continuous). The derivative is \(h'(\gamma ) =\tfrac{4\sigma (\beta ^2\gamma ^2-1)}{(\beta ^2\gamma ^2+2\sigma \gamma +1)^2}\), which implies that the stationary points are given by \(\gamma =\pm \tfrac{1}{\beta }\). Since \(\gamma >0\) and the derivative is positive for \(\gamma >\tfrac{1}{\beta }\) and negative for \(\gamma \in (0,\tfrac{1}{\beta })\), \(\gamma =\tfrac{1}{\beta }\) optimizes the contraction factor. The corresponding rate is
\[\sqrt{h(\tfrac{1}{\beta })}=\sqrt{1-\tfrac{4\sigma /\beta }{2+2\sigma /\beta }}=\sqrt{\tfrac{\beta -\sigma }{\beta +\sigma }}=\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}.\]
This is summarized in the following proposition.
Proposition 6.4
The parameter \(\gamma \) that optimizes the contraction factor of \(R_{\gamma A}\) is given by \(\gamma = \tfrac{1}{\beta }\). The corresponding contraction factor is \(\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}\).
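Proposition 6.4 is easy to check numerically. The sketch below evaluates the contraction factor of \(R_{\gamma A}\) from Theorem 6.3, with \(\sigma \) and \(\beta \) replaced by \(\gamma \sigma \) and \(\gamma \beta \), on a grid of \(\gamma \) values. The function names and the values \(\sigma =1\), \(\beta =3\) are illustrative assumptions.

```python
import math


def reflected_resolvent_factor(gamma: float, sigma: float, beta: float) -> float:
    """Contraction factor of R_{gamma A} from Theorem 6.3 with scaled moduli."""
    value = 1.0 - 4.0 * gamma * sigma / (1.0 + 2.0 * gamma * sigma + (gamma * beta) ** 2)
    return math.sqrt(max(value, 0.0))  # guard against tiny negative round-off


def optimal_factor(sigma: float, beta: float) -> float:
    """Optimal factor sqrt((beta/sigma - 1) / (beta/sigma + 1)) from Proposition 6.4."""
    ratio = beta / sigma
    return math.sqrt((ratio - 1.0) / (ratio + 1.0))


sigma, beta = 1.0, 3.0
gamma_opt = 1.0 / beta
assert abs(reflected_resolvent_factor(gamma_opt, sigma, beta) - optimal_factor(sigma, beta)) < 1e-12

# No gamma on the grid (0, 5] gives a smaller factor than gamma = 1 / beta.
gammas = [0.001 * k for k in range(1, 5001)]
assert all(
    reflected_resolvent_factor(g, sigma, beta) >= optimal_factor(sigma, beta) - 1e-12
    for g in gammas
)
```

With these values the optimal factor is \(\sqrt{1/2}\approx 0.707\), attained at \(\gamma =\tfrac{1}{3}\).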
Now, we are ready to state the convergence rate results for Douglas–Rachford splitting.
Theorem 6.5
Suppose that Assumption 6.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Let \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\), then the algorithm converges at least with rate factor
\[|1-\alpha |+\alpha \delta \tag{6.2}\]
for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). Optimizing this bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\alpha =1\) and \(\gamma =\tfrac{1}{\beta }\) and corresponding optimal rate bound \(\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}\).
Proof
It follows immediately from Theorem 6.3, Lemma 3.3, and Proposition 6.4 by noting that \(\alpha =1\) minimizes (6.2). \(\square \)
In the following section, we will see that there exists a problem from the considered class that converges exactly with the provided rate.
6.1 Tightness
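We consider a problem where A is a scaled rotation operator, i.e.,
\[A=d\left[ \begin{matrix} \cos {\psi }&{}-\sin {\psi }\\ \sin {\psi }&{}\cos {\psi }\end{matrix}\right] , \tag{6.3}\]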
where \(0\le \psi <\tfrac{\pi }{2}\) and \(d\in (0,\infty )\). First, we show that A is strongly monotone and Lipschitz continuous.
Proposition 6.6
The operator A in (6.3) is \(d\cos {\psi }\)-strongly monotone and d-Lipschitz continuous.
Proof
We first show that A is \(d\cos {\psi }\)-strongly monotone. Since A is linear, we have
\[\langle Ax-Ay,x-y\rangle =\langle A(x-y),x-y\rangle =d\cos {\psi }\Vert x-y\Vert ^2.\]
That is, A is \(d\cos {\psi }\)-strongly monotone. Since A is a scaled (with d) rotation operator, it satisfies \(\Vert Ax-Ay\Vert =d\Vert x-y\Vert \), and hence A is d-Lipschitz. This concludes the proof. \(\square \)
We need an explicit form of the reflected resolvent of A to show that the rate is tight. To state it, we define the following alternative arctan definition that is valid when \(\tan {\xi }=\tfrac{x}{y}\) and \(x\ge 0\):
This arctan is defined for nonnegative numerators x only, and outputs an angle in the interval \([0,\pi ]\).
Next, we provide the expression for the reflected resolvent. To simplify its notation, we let \(\sigma \) denote the strong monotonicity modulus and \(\beta \) the Lipschitz constant of A, i.e.,
\[\sigma =d\cos {\psi },\qquad \beta =d. \tag{6.5}\]
The following result is proven in Appendix 8.
Proposition 6.7
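The reflected resolvent of \(\gamma A\), with A in (6.3) and \(\gamma \in (0,\infty )\), is
\[R_{\gamma A}=\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\left[ \begin{matrix} \cos {\xi }&{}\sin {\xi }\\ -\sin {\xi }&{}\cos {\xi }\end{matrix}\right] ,\]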
where \(\sigma \) and \(\beta \) are defined in (6.5), and \(\xi \) satisfies \(\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2}\right) \) with \(\arctan _2\) defined in (6.4).
That is, the reflected resolvent is first a rotation then a contraction. The contraction factor is exactly the upper bound on the contraction factor in Theorem 6.5. Therefore, the A in (6.3) can be used to show tightness of the results in Theorem 6.5. To do so, we need another operator B that cancels the rotation introduced by A. For \(\alpha \in (0,1]\), we will need \(R_{\gamma A}R_{\gamma B}=\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I\) and for \(\alpha >1\), we will need \(R_{\gamma A}R_{\gamma B}=-\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I\). This is clearly achieved if \(R_{\gamma B}\) is another rotation operator. Using the following straightforward consequence of Minty’s theorem (see [26]) we conclude that any rotation operator (since they are nonexpansive) is the reflected resolvent of a maximally monotone operator.
Proposition 6.8
An operator \(R:~\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if and only if it is the reflected resolvent of a maximally monotone operator.
Proof
It follows immediately from [3, Corollary 23.8] and [3, Proposition 4.2]. \(\square \)
With this in mind, we can state the tightness claim.
Proposition 6.9
Let \(\gamma \in (0,\infty )\), \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\), and \(\xi \) be defined as in Proposition 6.7. Suppose that A is as in (6.3) and B is maximally monotone and satisfies either of the following:
(i) if \(\alpha \in (0,1]\): \(B=B_1\) with \(R_{\gamma B_1}=\left[ \begin{matrix} \cos {\xi }&{}-\sin {\xi }\\ \sin {\xi }&{}\cos {\xi }\end{matrix}\right] \),
(ii) if \(\alpha \in (1,\tfrac{2}{1+\delta })\): \(B=B_2\) with \(R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}\sin {(\pi -\xi )}\\ -\sin {(\pi -\xi )}&{}\cos {(\pi -\xi )}\end{matrix}\right] \).
Then the \(z^k\) sequence for solving \(0\in \gamma Ax+\gamma Bx\) using (4.3) converges exactly with the rate \(|1-\alpha |+\alpha \delta \).
Proof
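Case (i): Using the reflected resolvent \(R_{\gamma A}\) in Proposition 6.7 and that \(\alpha \in (0,1]\), we conclude that \(R_{\gamma A}R_{\gamma B_1}=\delta I\) and therefore
\[ \Vert z^{k+1}\Vert =\Vert (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B_1}z^k\Vert =(1-\alpha +\alpha \delta )\Vert z^k\Vert =(|1-\alpha |+\alpha \delta )\Vert z^k\Vert . \]
Case (ii): Using the reflected resolvent \(R_{\gamma A}\) in Proposition 6.7 and that \(\alpha \in (1,\tfrac{2}{1+\delta })\), we conclude that \(R_{\gamma A}R_{\gamma B_2}=-\delta I\) and therefore
\[ \Vert z^{k+1}\Vert =\Vert (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B_2}z^k\Vert =|1-\alpha -\alpha \delta |\Vert z^k\Vert =(|1-\alpha |+\alpha \delta )\Vert z^k\Vert . \]
In both cases, the convergence rate is exactly \(|1-\alpha |+\alpha \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\). This completes the proof. \(\square \)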
Remark 6.10
It can be shown that the maximally monotone operator \(B_1\) that gives \(R_{\gamma B_1}\) satisfies \(B_1=\tfrac{1}{\gamma (1+\cos {\xi })} \left[ \begin{matrix} 0&{}\sin {\xi }\\ -\sin {\xi }&{}0 \end{matrix}\right] \) if \(\xi \in [0,\pi )\) and \(B_1=\partial \iota _0\) (that is, \(B_1\) is the subdifferential operator of the indicator function \(\iota _0\) of the origin) if \(\xi =\pi \). Similarly, the maximally monotone operator \(B_2\) that gives \(R_{\gamma B_2}\) satisfies \(B_2=\tfrac{1}{\gamma (1-\cos {\xi })} \left[ \begin{matrix} 0&{}-\sin {\xi }\\ \sin {\xi }&{}0 \end{matrix}\right] \) if \(\xi \in (0,\pi ]\) and \(B_2=\partial \iota _0\) if \(\xi =0\), since \(R_{\gamma B_2}=-I\) in that case.
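As a numerical sanity check of how a rotation determines its operator, the following Python sketch inverts the relation \(R_{\gamma B}=2(I+\gamma B)^{-1}-I\) for a planar rotation. The helper names (`rotation`, `operator_from_reflected_resolvent`, `reflected_resolvent`) and the values of `gamma` and `xi` are illustrative assumptions, not taken from the paper, and the inversion assumes that \(I+R\) is invertible, that is, \(\xi \ne \pi \).

```python
import numpy as np

def rotation(theta):
    """Counterclockwise rotation of the plane by the angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def operator_from_reflected_resolvent(R, gamma):
    """Recover the linear operator B from R = 2 (I + gamma B)^{-1} - I.

    Assumes I + R is invertible (for a rotation: angle different from pi).
    """
    I = np.eye(R.shape[0])
    return (2.0 * np.linalg.inv(I + R) - I) / gamma

def reflected_resolvent(B, gamma):
    """Reflected resolvent 2 (I + gamma B)^{-1} - I of a linear operator B."""
    I = np.eye(B.shape[0])
    return 2.0 * np.linalg.inv(I + gamma * B) - I

# Illustrative values (assumptions): any gamma > 0 and xi in [0, pi) work.
gamma, xi = 0.7, 1.1
R_B1 = rotation(xi)
B1 = operator_from_reflected_resolvent(R_B1, gamma)

# A linear operator is monotone iff its symmetric part is positive semidefinite.
sym_part = 0.5 * (B1 + B1.T)
roundtrip_error = np.linalg.norm(reflected_resolvent(B1, gamma) - R_B1)
print(B1, sym_part, roundtrip_error)
```

The recovered matrix is skew-symmetric, hence monotone, and mapping it back through the reflected resolvent reproduces the rotation up to rounding error.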
We have shown that the rate provided in Theorem 6.5 is tight for all feasible \(\alpha \) and \(\gamma \).
6.2 Comparison to other bounds
In Fig. 1, we compare the linear convergence rate result in Theorem 6.5 to the convergence rate result in [24]. The comparison is made with optimal \(\gamma \)-parameters for both bounds. The result in [24] is provided in the standard Douglas–Rachford setting, i.e., with \(\alpha =1/2\). By instead letting \(\alpha =1\), this rate can be improved, see [8] (which shows an improved rate in the composite convex optimization case, but the same rate can be shown to hold also for monotone inclusion problems). This improved rate is also included in the comparison in Fig. 1. We see that both rates that follow from [24] are suboptimal and worse than the rate bound in Theorem 6.5.
7 A strongly monotone and cocoercive
In this section, we consider the case where A is strongly monotone and cocoercive; that is, we assume the following.
Assumption 7.1
Suppose that:
(i) The operators \(A:~\mathcal {H}\rightarrow \mathcal {H}\) and \(B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone.
(ii) A is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive.
The linear convergence result for Douglas–Rachford splitting will follow from the contraction factor of the reflected resolvent of A. The contraction factor is provided in the following theorem, which is proven in Appendix D.
Theorem 7.2
Suppose that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive operator. Then its reflected resolvent \(R_{A} = 2J_A-\mathrm{{Id}}\) is contractive with factor \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\sigma \beta }}\).
When considering the reflected resolvent of \(\gamma A\) where \(\gamma \in (0,\infty )\), the \(\gamma \)-parameter can be chosen to optimize the contraction factor of \(R_{\gamma A}\). The operator \(\gamma A\) is \(\gamma \sigma \)-strongly monotone and \(\tfrac{1}{\gamma \beta }\)-cocoercive, so the optimal \(\gamma >0\) minimizes \(h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }\). The derivative of h satisfies \(h'(\gamma ) = \tfrac{4\sigma (\beta \sigma \gamma ^2-1)}{(\beta \sigma \gamma ^2+2\sigma \gamma +1)^2}\), so the stationary points of h are given by \(\gamma =\pm \tfrac{1}{\sqrt{\beta \sigma }}\). Since \(\gamma >0\) and the derivative is negative for \(\gamma \in (0,\tfrac{1}{\sqrt{\beta \sigma }})\) and positive for \(\gamma >\tfrac{1}{\sqrt{\beta \sigma }}\), the parameter \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) minimizes the contraction factor. The corresponding contraction factor is
\[ \sqrt{1-\tfrac{4\sqrt{\sigma /\beta }}{2+2\sqrt{\sigma /\beta }}}=\sqrt{\tfrac{1-\sqrt{\sigma /\beta }}{1+\sqrt{\sigma /\beta }}}=\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}. \]
This is summarized in the following proposition.
Proposition 7.3
The parameter \(\gamma \in (0,\infty )\) that optimizes the contraction factor for \(R_{\gamma A}\) is \(\gamma \!=\!\tfrac{1}{\sqrt{\beta \sigma }}\). The corresponding contraction factor is \(\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}\).
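The optimal parameter in Proposition 7.3 can be checked numerically with a simple grid search. The sketch below is only illustrative: the function name `contraction_factor`, the values of `sigma` and `beta` (chosen with \(\sigma \le \beta \)), and the grid are assumptions made for this example.

```python
import numpy as np

def contraction_factor(gamma, sigma, beta):
    """Contraction factor of R_{gamma A} from Theorem 7.2 applied to gamma*A."""
    return np.sqrt(1.0 - 4.0 * gamma * sigma
                   / (1.0 + 2.0 * gamma * sigma + gamma**2 * sigma * beta))

# Illustrative moduli (assumptions), with sigma <= beta.
sigma, beta = 0.5, 4.0

# Brute-force search over a fine grid of gamma values.
gammas = np.linspace(0.01, 5.0, 100001)
factors = contraction_factor(gammas, sigma, beta)
gamma_grid = gammas[np.argmin(factors)]

# Closed-form optimum and optimal factor from Proposition 7.3.
gamma_star = 1.0 / np.sqrt(beta * sigma)
kappa = np.sqrt(beta / sigma)
factor_star = np.sqrt((kappa - 1.0) / (kappa + 1.0))

print(gamma_grid, gamma_star)
print(factors.min(), factor_star)
```

The grid minimizer should agree with \(\tfrac{1}{\sqrt{\beta \sigma }}\) up to the grid resolution, and the minimal value with the closed-form factor.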
Now we are ready to state the linear convergence rate result for the Douglas–Rachford algorithm.
Theorem 7.4
Suppose that Assumption 7.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Let \(\delta = \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\). Then the algorithm converges at least with rate factor
\[ |1-\alpha |+\alpha \delta \tag{7.1} \]
for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). Optimizing this bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\alpha =1\) and \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) and corresponding optimal rate bound \(\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}\).
Proof
It follows immediately from Theorem 7.2, Lemma 3.3, and Proposition 7.3 by noting that \(\alpha =1\) minimizes (7.1). \(\square \)
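To illustrate why \(\alpha =1\) is the best relaxation parameter, the following sketch evaluates the rate factor \(|1-\alpha |+\alpha \delta \) (the expression that also appears in Proposition 7.7) over the admissible interval \((0,\tfrac{2}{1+\delta })\). The function name `dr_rate` and the value of `delta` are assumptions chosen for illustration.

```python
import numpy as np

def dr_rate(alpha, delta):
    """Rate factor |1 - alpha| + alpha * delta of the relaxed iteration."""
    return np.abs(1.0 - alpha) + alpha * delta

# Illustrative contraction factor of R_{gamma A} (assumption), delta in [0, 1).
delta = 0.6
alpha_max = 2.0 / (1.0 + delta)

# Sample the open interval (0, alpha_max), staying away from the end points.
alphas = np.linspace(1e-3, alpha_max - 1e-3, 2001)
rates = dr_rate(alphas, delta)
best_alpha = alphas[np.argmin(rates)]

print(best_alpha, rates.min(), rates.max())
```

On this grid every rate stays below one, and the smallest value is attained next to \(\alpha =1\), where the rate equals \(\delta \).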
7.1 Tightness
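In this section, we provide a two-dimensional example that shows that the provided bounds are tight. We let A be the resolvent of a scaled rotation operator to achieve this. Let C be that scaled rotation operator, i.e.,
\[ C=c\left[ \begin{matrix} \cos {\psi }&{}-\sin {\psi }\\ \sin {\psi }&{}\cos {\psi }\end{matrix}\right] \tag{7.2} \]
with \(c\in (1,\infty )\) and \(\psi \in [0,\tfrac{\pi }{2})\). We will let A satisfy \(A=dJ_{C}\) for some \(d\in (0,\infty )\); that is
\[ A=d(I+C)^{-1}=\tfrac{d}{1+2c\cos {\psi }+c^2}\left[ \begin{matrix} 1+c\cos {\psi }&{}c\sin {\psi }\\ -c\sin {\psi }&{}1+c\cos {\psi }\end{matrix}\right] . \tag{7.3} \]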
In the following proposition, we state the strong monotonicity and cocoercivity properties of A.
Proposition 7.5
The operator A in (7.3) is \(\tfrac{1+c\cos {\psi }}{d}\)-cocoercive and strongly monotone with modulus \(\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2}\).
Proof
The matrix C in (7.2) is \(c\cos {\psi }\)-strongly monotone (see Proposition 6.6), so \(J_C\) is \((1+c\cos {\psi })\)-cocoercive (see [3, Definition 4.4]) and the operator \(A=d(I+C)^{-1}\) is \(\tfrac{1+c\cos {\psi }}{d}\)-cocoercive. Further, since C is monotone and c-Lipschitz continuous (see Proposition 6.6), the following holds (see Proposition 6.2):
\[ 2\langle J_Cx-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2+(1-c^2)\Vert J_Cx-J_Cy\Vert ^2. \tag{7.4} \]
Since \(J_C\) is \((1+c\cos {\psi })\)-cocoercive, we have
\[ \langle J_Cx-J_Cy,x-y\rangle \ge (1+c\cos {\psi })\Vert J_Cx-J_Cy\Vert ^2. \tag{7.5} \]
We add (7.5) multiplied by \(-\tfrac{1-c^2}{1+c\cos {\psi }}\) (which is positive since \(c\in (1,\infty )\)) to (7.4) to get
\[ \left( 2+\tfrac{c^2-1}{1+c\cos {\psi }}\right) \langle J_Cx-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2, \]
that is, \(J_C\) is \(\sigma \)-strongly monotone with
\[ \sigma =\tfrac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}, \]
so A is strongly monotone with parameter \(d\tfrac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}\). This concludes the proof. \(\square \)
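The two moduli in Proposition 7.5 can be checked numerically for a concrete instance. The sketch below assumes that the scaled rotation C is c times a counterclockwise rotation by \(\psi \) (the moduli do not depend on the direction of rotation); the parameter values and the helper `min_sym_eig` are illustrative assumptions. For a linear operator, the strong monotonicity modulus is the smallest eigenvalue of its symmetric part, and the cocoercivity modulus is the smallest eigenvalue of the symmetric part of its inverse.

```python
import numpy as np

# Illustrative parameters (assumptions): c > 1, psi in [0, pi/2), d > 0.
c, psi, d = 2.0, 0.6, 1.5

# Assumed form of the scaled rotation C and of A = d * J_C = d (I + C)^{-1}.
C = c * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
A = d * np.linalg.inv(np.eye(2) + C)

def min_sym_eig(M):
    """Smallest eigenvalue of the symmetric part of a square matrix."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).min()

# Numerically computed moduli of the linear operator A.
strong_monotonicity_num = min_sym_eig(A)
cocoercivity_num = min_sym_eig(np.linalg.inv(A))

# Closed-form moduli stated in Proposition 7.5.
strong_monotonicity_formula = d * (1 + c * np.cos(psi)) / (1 + 2 * c * np.cos(psi) + c**2)
cocoercivity_formula = (1 + c * np.cos(psi)) / d

print(strong_monotonicity_num, strong_monotonicity_formula)
print(cocoercivity_num, cocoercivity_formula)
```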
This shows that the assumptions needed for the linear convergence rate result in Theorem 7.4 hold. To prove the tightness claim, we need an expression for the reflected resolvent of A. This is more easily expressed in terms of the strong monotonicity modulus, which we define as \(\sigma \), and the inverse cocoercivity constant, which we define as \(\beta \), i.e.,
\[ \sigma =\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2},\qquad \beta =\tfrac{d}{1+c\cos {\psi }}. \tag{7.6} \]
The following result is proven in Appendix D.
Proposition 7.6
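The reflected resolvent \(R_{\gamma A}\) of \(\gamma A\), where A is defined in (7.3) and \(\gamma \in (0,\infty )\), is given by
\[ R_{\gamma A}=\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\left[ \begin{matrix} \cos {\xi }&{}-\sin {\xi }\\ \sin {\xi }&{}\cos {\xi }\end{matrix}\right] , \]
where \(\sigma \) and \(\beta \) are defined in (7.6), and \(\xi \) satisfies \(\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}\right) \) with \(\arctan _2\) defined in (6.4).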
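The structure claimed in Proposition 7.6, a rotation by \(\xi \) followed by a scaling, can be verified numerically. The sketch assumes that C is c times a counterclockwise rotation by \(\psi \), uses illustrative parameter values, and uses `np.arctan2` as a stand-in for \(\arctan _2\) in (6.4), under the assumption that both return an angle in \([0,\pi ]\) when the numerator is nonnegative. Only the magnitude of the off-diagonal entry is compared, so the check does not depend on the direction of rotation.

```python
import numpy as np

# Illustrative parameters (assumptions): c > 1, psi in [0, pi/2), d > 0, gamma > 0.
c, psi, d, gamma = 2.0, 0.6, 1.5, 0.7
I2 = np.eye(2)

# Assumed form of the scaled rotation C and of A = d (I + C)^{-1}.
C = c * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
A = d * np.linalg.inv(I2 + C)

# Strong monotonicity modulus and inverse cocoercivity constant of A.
sigma = d * (1 + c * np.cos(psi)) / (1 + 2 * c * np.cos(psi) + c**2)
beta = d / (1 + c * np.cos(psi))

# Reflected resolvent of gamma * A, computed directly from its definition.
R_A = 2.0 * np.linalg.inv(I2 + gamma * A) - I2

# Scaling factor and rotation angle predicted by Proposition 7.6.
delta = np.sqrt(1 - 4 * gamma * sigma / (1 + 2 * gamma * sigma + gamma**2 * sigma * beta))
xi = np.arctan2(2 * gamma * np.sqrt(sigma * (beta - sigma)), 1 - sigma * beta * gamma**2)

scaling_error = np.linalg.norm(R_A.T @ R_A - delta**2 * I2)
cos_error = abs(R_A[0, 0] / delta - np.cos(xi))
sin_error = abs(abs(R_A[1, 0]) / delta - np.sin(xi))
print(scaling_error, cos_error, sin_error)
```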
Based on this reflected resolvent, we can show that the rate bound in Theorem 7.4 is indeed tight. The proof of the following result is the same as the proof of Proposition 6.9.
Proposition 7.7
Let \(\gamma \in (0,\infty )\), \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\), and let \(\xi \) be as defined in Proposition 7.6. Suppose that A is as in (7.3) and B is maximally monotone and satisfies either of the following:
(i) if \(\alpha \in (0,1]\): \(B=B_1\) with \(R_{\gamma B_1}= \left[ \begin{matrix} \cos {\xi }&{}\sin {\xi }\\ -\sin {\xi }&{}\cos {\xi } \end{matrix}\right] \),
(ii) if \(\alpha \in (1,\tfrac{2}{1+\delta })\): \(B=B_2\) with \(R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}-\sin {(\pi -\xi )}\\ \sin {(\pi -\xi )}&{}\cos {(\pi -\xi )} \end{matrix}\right] \).
Then the \(z^k\) sequence for solving \(0\in \gamma Ax+\gamma Bx\) using (4.3) converges exactly with the rate \(|1-\alpha |+\alpha \delta \).
So, we have shown that the rate in Theorem 7.4 is tight for all feasible algorithm parameters \(\alpha \) and \(\gamma \).
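The tightness claim can be illustrated by running the relaxed iteration on the two-dimensional example. The sketch below rests on several assumptions: C is taken as c times a counterclockwise rotation by \(\psi \), the iteration (4.3) is taken to be \(z^{k+1}=((1-\alpha )\mathrm{Id}+\alpha R_{\gamma A}R_{\gamma B})z^k\), and the reflected resolvents of \(B_1\) and \(B_2\) are built as the rotations that cancel the rotation in \(R_{\gamma A}\) (up to sign), rather than from the explicit matrices in Proposition 7.7. All numerical values and names are illustrative.

```python
import numpy as np

# Illustrative parameters (assumptions): c > 1, psi in [0, pi/2), d > 0, gamma > 0.
c, psi, d, gamma = 2.0, 0.6, 1.5, 0.7
I2 = np.eye(2)

C = c * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
A = d * np.linalg.inv(I2 + C)

sigma = d * (1 + c * np.cos(psi)) / (1 + 2 * c * np.cos(psi) + c**2)
beta = d / (1 + c * np.cos(psi))
delta = np.sqrt(1 - 4 * gamma * sigma / (1 + 2 * gamma * sigma + gamma**2 * sigma * beta))

R_A = 2.0 * np.linalg.inv(I2 + gamma * A) - I2
rotation_of_A = R_A / delta          # pure rotation part of R_{gamma A}

R_B1 = rotation_of_A.T               # cancels the rotation: R_A R_B1 = delta * I
R_B2 = -rotation_of_A.T              # cancels it up to sign: R_A R_B2 = -delta * I

def observed_ratios(alpha, R_B, steps=20):
    """Per-step ratios ||z^{k+1}|| / ||z^k|| of the relaxed iteration (fixed point 0)."""
    T = (1 - alpha) * I2 + alpha * R_A @ R_B
    z = np.array([1.0, -2.0])
    ratios = []
    for _ in range(steps):
        z_next = T @ z
        ratios.append(np.linalg.norm(z_next) / np.linalg.norm(z))
        z = z_next
    return np.array(ratios)

alpha_1, alpha_2 = 0.8, 1.2          # case (i) and case (ii), alpha_2 < 2 / (1 + delta)
ratios_1 = observed_ratios(alpha_1, R_B1)
ratios_2 = observed_ratios(alpha_2, R_B2)
predicted_1 = abs(1 - alpha_1) + alpha_1 * delta
predicted_2 = abs(1 - alpha_2) + alpha_2 * delta
print(ratios_1[:3], predicted_1)
print(ratios_2[:3], predicted_2)
```

In both cases every per-step ratio should coincide with \(|1-\alpha |+\alpha \delta \), so the bound is attained at each iteration and not only asymptotically.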
7.2 Comparison to other bounds
We have shown tight convergence rate estimates for Douglas–Rachford splitting when the monotone operator A is cocoercive and strongly monotone (Theorem 7.4). In Sect. 6, we showed tight estimates when A is Lipschitz and strongly monotone (Theorem 6.5). In [17], tight convergence rate estimates are proven for the case when A and B are subdifferential operators of proper closed and convex functions and A is strongly monotone and Lipschitz continuous (which in this case is equivalent to cocoercive). The class of problems considered in [17] is a subclass of the problems considered in this section, which in turn is a subclass of the problems considered in Sect. 6. The optimal rates for these classes of problems are shown in Fig. 2. By restricting the problem classes, the rate bounds get tighter. This is in contrast to the case in Sect. 5, where a convex optimization problem achieved the worst case estimate.
8 Conclusions
We have shown linear convergence rate bounds for Douglas–Rachford splitting for monotone inclusion problems with three different sets of assumptions. One setting was the one used by Lions and Mercier [24], for which we provided a tighter bound. We also stated linear convergence rate bounds under two other sets of assumptions, for which no other linear rate bounds were previously available. In addition, we have shown that all our rate bounds are tight: in two cases for all feasible algorithm parameters, and in the remaining case for many algorithm parameters.
References
Bauschke, H.H., Bello Cruz, J.Y., Nghia, T.T.A., Phan, H.M., Wang, X.: The rate of linear convergence of the Douglas–Rachford algorithm for subspaces is the cosine of the Friedrichs angle. J. Approx. Theory 185, 63–79 (2014)
Bauschke, H.H., Bello Cruz, J.Y., Nghia, T.T.A., Phan, H.M., Wang, X.: Optimal rates of linear convergence of relaxed alternating projections and generalized Douglas–Rachford methods for two subspaces. Numer. Algorithms 73(1), 33–76 (2016)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, volume 49 of Springer Optimization and Its Applications, pp. 185–212. Springer, New York (2011)
Combettes, P.L., Yamada, I.: Compositions and convex combinations of averaged nonexpansive operators. J. Math. Anal. Appl. 425(1), 55–70 (2015)
Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. http://arxiv.org/abs/1407.5210 (2014)
Davis, D., Yin, W.: Convergence Rate Analysis of Several Splitting Schemes, pp. 115–163. Springer, Berlin (2016)
Demanet, L., Zhang, X.: Eventual linear convergence of the Douglas–Rachford iteration for basis pursuit. Math. Comput. 85, 209–238 (2016)
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. PhD thesis, MIT (1989)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
Giselsson, P.: Tight linear convergence rate bounds for Douglas–Rachford splitting and ADMM. In: Proceedings of 54th Conference on Decision and Control, Osaka, Japan (2015)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
He, B., Yuan, X.: On the \(o(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013)
Hesse, R., Luke, D.R., Neumann, P.: Alternating projections and Douglas–Rachford for sparse affine feasibility. IEEE Trans. Signal Process. 62(18), 4868–4881 (2014)
Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Math. Program. 162(1), 165–199 (2016)
Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Explicit convergence rate of a distributed alternating direction method of multipliers. IEEE Trans. Autom. Control 61(4), 892–904 (2016)
Krasnoselskii, M.A.: Two remarks on the method of successive approximations. Uspehi Mat. Nauk 10, 123–127 (1955)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4, 506–510 (1953)
Minty, G.J.: Monotone (nonlinear) operators in Hilbert space. Duke Math. J. 29(3), 341–346 (1962)
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 343–352 (2015)
Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: Complexity estimates and accelerated variants. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 4234–4239, Los Angeles, CA, December (2014)
Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3(1), 28–41 (1955)
Phan, H.M.: Linear convergence of the Douglas–Rachford method for two closed sets. Optimization 65(2), 369–385 (2016)
Raghunathan, A., Di Cairano, S.: ADMM for convex quadratic programs: Linear convergence and infeasibility detection. http://arxiv.org/abs/1411.7288
Setzer, S.: Split Bregman algorithm, Douglas–Rachford splitting and frame shrinkage. In: Tai, X.-C., Mørken, K., Lysaker, M., Lie, K.-A. (eds.) Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science, vol. 5567, pp. 464–476. Springer, Berlin (2009)
Shi, W., Ling, Q., Yuan, K., Wu, G., Yin, W.: On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 62(7), 1750–1761 (2014)
Acknowledgements
The author would like to thank Heinz Bauschke for suggesting the term negatively averaged operators.
Additional information
This project is financially supported by the Swedish Foundation for Strategic Research.
Appendices
Appendix A: Proofs to Lemmas in Section 3.1
1.1 Proof to Lemma 3.1
From the definition of cocoercivity, Definition 2.6, it follows directly that \(\beta \mathrm{{Id}}+T\) is \(\tfrac{1}{2\beta }\)-cocoercive if and only if \(\tfrac{1}{2\beta }(\beta \mathrm{{Id}}+T)\) is 1-cocoercive. This, in turn is equivalent to that \(2\tfrac{1}{2\beta }(\beta \mathrm{{Id}}+T)-\mathrm{{Id}}=\tfrac{1}{\beta }T\) is nonexpansive [3, Proposition 4.2 and Definition 4.4]. Finally, from the definition of Lipschitz continuity, Definition 2.2, it follows directly that \(\tfrac{1}{\beta } T\) is nonexpansive if and only if T is \(\beta \)-Lipschitz continuous. This concludes the proof.
1.2 Proof to Lemma 3.2
Let \(T_1 = R-\tfrac{\beta }{2}\mathrm{{Id}}\). Then Lemma 3.1 states that \(\tfrac{1}{\beta }\)-cocoercivity of \(T_1+\tfrac{\beta }{2}\mathrm{{Id}}=R\) is equivalent to \(\tfrac{\beta }{2}\)-Lipschitz continuity of \(T_1=R-\tfrac{\beta }{2}\mathrm{{Id}}\). By definition of Lipschitz continuity, this is equivalent to that \(T_1 = \tfrac{\beta }{2}T_2\) for some nonexpansive operator \(T_2\). Therefore, \(T=R+(1-\beta )\mathrm{{Id}}= T_1+(1-\tfrac{\beta }{2})\mathrm{{Id}}=\tfrac{\beta }{2}T_2+(1-\tfrac{\beta }{2})\mathrm{{Id}}\). Since \(\beta \in (0,1)\), this is equivalent to that T is \(\tfrac{\beta }{2}\)-averaged. This concludes the proof.
1.3 Proof to Lemma 3.3
Let x and y be any points in \(\mathcal {H}\). Then
So R is \((|1-\alpha |+|\alpha |\delta )\)-Lipschitz continuous. The Lipschitz constant is less than 1 if \(\alpha \in (0,\tfrac{2}{1+\delta })\). For such \(\alpha \), R is contractive. Since \(\alpha >0\), the contraction factor is \((|1-\alpha |+\alpha \delta )\). This concludes the proof.
Appendix B: Proofs to results in Section 5
1.1 Proof to Proposition 5.2
Since B is \(\tfrac{1}{\beta }\)-cocoercive, it satisfies
Adding \(\Vert u-v\Vert ^2\) to both sides gives
Letting \(x= (B+\mathrm{{Id}})u\) and \(y= (B+\mathrm{{Id}})v\) implies that \(u= J_{B}x\) and \(v= J_By\). Therefore, we get the equivalent expression
Expansion of the second square gives
or equivalently
This is, by [3, Proposition 4.25], equivalent to that \(J_B\) is \(\tfrac{\beta }{2(\beta +1)}\)-averaged. This concludes the proof.
1.2 Proof to Theorem 5.6
Since \(R_{\gamma A}R_{\gamma B}\) is \(\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}\)-negatively averaged, see Proposition 5.5, the Douglas–Rachford iteration is defined by an \(\alpha \)-averaged \(\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}\)-negatively averaged operator. The rate in (5.1) follows directly from Proposition 3.9. The optimal parameters follow from Proposition 3.10. It shows that the rate factor is increasing in \(\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}\), which in turn is increasing in \(\tfrac{1}{\gamma \sigma }+\gamma \beta \). Therefore, this should be minimized to optimize the rate. The optimal \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) gives negative averagedness factor \(\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}=\tfrac{2\sqrt{\beta /\sigma }}{1+2\sqrt{\beta /\sigma }}\). Proposition 3.10 further gives that the optimal averagedness factor is
and that the optimal bound on the contraction factor is
This concludes the proof.
1.3 Proof to Proposition 5.7
The proximal and reflected proximal operators of f are trivially given by
Linearity of the proximal operator and Moreau’s decomposition [3, Theorem 14.3] imply that the reflected resolvent of \(f^*\) is given by
This gives the following Douglas–Rachford iteration:
Since we start at a point \(z^0=(0,z_2^0)\), we will get \(z_1^k=0\) for all \(k\ge 1\), and the Douglas–Rachford iteration becomes
with contraction factor given by \(|1-2\alpha |\).
When \(\alpha \in [c,1)\), the absolute value term in (5.1) is nonpositive since
Therefore, for such \(\alpha \), the rate in (5.1) is \(|1-2\alpha |\). This coincides with the rate for the provided example for any \(\gamma >0\), and the proof is completed.
Appendix C: Proofs to results in Section 6
1.1 Proof to Proposition 6.2
\(\beta \)-Lipschitz continuity of A implies that \(\beta \mathrm{{Id}}+A\) is \(\tfrac{1}{2\beta }\)-cocoercive, see Lemma 3.1. That is
Using \(\beta \mathrm{{Id}}= \mathrm{{Id}}+(\beta -1)\mathrm{{Id}}\), this is equivalent to that
Using that \(x=(\mathrm{{Id}}+A)u\) if and only if \(u= (\mathrm{{Id}}+A)^{-1}x\) and \(y=(\mathrm{{Id}}+A)v\) if and only if \(v= (\mathrm{{Id}}+A)^{-1}y\) (that hold by definition of the inverse and single-valuedness), this is equivalent to
Identifying the resolvent \(J_A=(\mathrm{{Id}}+A)^{-1}\) and expanding the first square give:
By rearranging the terms, we conclude that
The result follows by multiplying by \(2\beta \), since \(1+\tfrac{1-\beta }{\beta }=\tfrac{1}{\beta }\). This concludes the proof.
1.2 Proof to Theorem 6.3
We divide the proof into two cases, \(\beta \ge 1\) and \(\beta \le 1\).
1.3 Case \(\beta \ge 1\)
From [3, Proposition 23.11], we get that \(J_A\) is \((1+\sigma )\)-cocoercive, i.e., that
Adding \((\beta ^2-1)(\ge 0)\) of (C.1) to \((1+\sigma )\) of (6.1), we get
or equivalently
since the \(\Vert J_Ax-J_Ay\Vert \) terms cancel. We get
where (C.1) and (C.2) are used in the inequalities. Thus, the said result holds for \(\beta \ge 1\).
1.4 Case \(\beta \le 1\)
To prove the result for \(\beta \le 1\), we define the set \(\mathcal {R}\) of pairs of points \((x,y)\in \mathcal {H}\times \mathcal {H}\) as follows:
We also define the closure of the remaining pairs of points \(\mathcal {R}_c=\overline{(\mathcal {H}\times \mathcal {H})\backslash \mathcal {R}}\), i.e.,
Obviously, \(\mathcal {H}\times \mathcal {H}\subseteq \mathcal {R}\cup \mathcal {R}_c\), which implies that the contraction factor of the reflected resolvent is the worst case contraction factor for \(\mathcal {R}\) and \(\mathcal {R}_c\). We first show the contraction factor for \(\mathcal {R}\). Since (C.2) is the definition of the set \(\mathcal {R}\) in (C.4), the contraction factor for \((x,y)\in \mathcal {R}\) is shown exactly as in (C.3). For \((x,y)\in \mathcal {R}_c\), we have
where (6.1) is used in the first inequality and the definition of \(\mathcal {R}_c\) in (C.5) in the second; that is, the worst case contraction factor is \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}\) also for \(\beta \le 1\).
It remains to show that the contraction factor is in the interval [0, 1). We show that the square of the contraction factor is in [0, 1). We have \(1-\tfrac{4\sigma }{1+2\sigma +\beta ^2} = \tfrac{1-2\sigma +\beta ^2}{1+2\sigma +\beta ^2}<1\). Further, since \(\sigma \le \beta \), we have \(1-2\sigma +\beta ^2\ge 1-2\sigma +\sigma ^2=(1-\sigma )^2\ge 0\). So the numerator is nonnegative and the denominator is positive, which gives a nonnegative contraction factor. This concludes the proof.
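The contraction factor \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}\) can be spot-checked on random linear operators, which form a special case of the operator class considered. The sketch below is an illustration under stated assumptions: the way the random matrices are generated, the seed, and the variable names are all choices made for this example. For a matrix M, the strong monotonicity modulus is the smallest eigenvalue of its symmetric part and the Lipschitz constant is its spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (assumption) for reproducibility
worst_gap = -np.inf              # largest observed value of ||R_M|| - bound

for _ in range(200):
    n = int(rng.integers(2, 6))
    G = rng.standard_normal((n, n))
    H = rng.standard_normal((n, n))
    # Positive definite symmetric part plus an arbitrary skew-symmetric part.
    M = G @ G.T / n + 0.1 * np.eye(n) + (H - H.T)

    sigma = np.linalg.eigvalsh(0.5 * (M + M.T)).min()   # strong monotonicity modulus
    beta = np.linalg.norm(M, 2)                          # Lipschitz constant

    R = 2.0 * np.linalg.inv(np.eye(n) + M) - np.eye(n)   # reflected resolvent of M
    bound = np.sqrt(1.0 - 4.0 * sigma / (1.0 + 2.0 * sigma + beta**2))
    worst_gap = max(worst_gap, np.linalg.norm(R, 2) - bound)

print(worst_gap)
```

A nonpositive `worst_gap` (up to rounding) means that no sampled operator violates the bound; it does not by itself demonstrate tightness.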
1.5 Proof to Proposition 6.7
First, we compute the resolvent \(J_{\gamma A}\). It satisfies
The reflected resolvent is
where we have used
Since \(\gamma \beta \sin {\psi }\) is nonnegative, this implies
Therefore, the reflected resolvent is
Now, let us introduce polar coordinates of the elements:
which gives reflected resolvent
The angle \(\xi \) in the polar coordinate satisfies
and since the numerator is nonnegative, \(\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2}\right) \) where \(\arctan _2\) is defined in (6.4). For the radius \(\delta \) in the polar coordinate, we get
and (since \(\delta >0\))
It remains to compute the factor in (C.6). Using (C.7), we get
This completes the proof.
Appendix D: Proofs to results in Section 7
1.1 Proof to Theorem 7.2
We know from Lemma 3.6 and Definition 2.6 that \(J_A\) is \((1+\sigma )\)-cocoercive, i.e., that it satisfies
for all \(x,y\in \mathcal {H}\). From Proposition 5.2, we know that \(J_A\) is \(\tfrac{\beta }{2(1+\beta )}\)-averaged, i.e., that it satisfies (see [3, Proposition 4.25(iv)])
for all \(x,y\in \mathcal {H}\). Let \(\alpha =\tfrac{\beta }{2(1+\beta )}\) and \(\delta =\tfrac{1}{1+\sigma }\) and define the set \(\mathcal {R}\) of pairs of points \((x,y)\in \mathcal {H}\times \mathcal {H}\) as:
We also define the closure of the set of remaining pairs of points \(\mathcal {R}_c=\overline{(\mathcal {H}\times \mathcal {H})\backslash \mathcal {R}}\), i.e.,
Obviously, the contraction factor for \(R_A\) is the worst case contraction factor for pairs of points in \(\mathcal {R}\) and \(\mathcal {R}_c\).
1.2 Contraction factor on \(\mathcal {R}\)
First, we provide a contraction factor for pairs of points in \(\mathcal {R}\). Since \(R_A=2J_A-\mathrm{{Id}}\), we have
where \(\delta =\tfrac{1}{1+\sigma }\in (0,1)\) and \(\alpha =\tfrac{\beta }{2(1+\beta )}\) and the inequalities follow from (D.1) and the definition of \(\mathcal {R}\) in (D.3).
1.3 Contraction factor on \(\mathcal {R}_c\)
Next, we provide a contraction factor for pairs of points in \(\mathcal {R}_c\). Since \(R_A=2J_A-\mathrm{{Id}}\), we conclude that
where we have used that \(\alpha \in (0,\tfrac{1}{2})\), (D.2), and the definition of \(\mathcal {R}_c\) in (D.4).
1.4 Contraction factor of \(R_A\)
Here, we show that the contraction factors on \(\mathcal {R}\) and \(\mathcal {R}_c\) are identical, and we simplify the expression to get a final contraction factor for the reflected resolvent \(R_A\). That the contraction factors are identical is shown by verifying that the difference between them is zero:
Next, we simplify this contraction factor by inserting \(\delta =\tfrac{1}{1+\sigma }\) and \(\alpha =\tfrac{\beta }{2(1+\beta )}\). We get
Taking the square root concludes the proof.
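As with the Lipschitz case, the factor \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\sigma \beta }}\) from Theorem 7.2 can be spot-checked on random linear operators. The sketch is illustrative only: the matrix generation, the seed, and the names are assumptions. For an invertible matrix M, the cocoercivity modulus \(\tfrac{1}{\beta }\) is computed as the smallest eigenvalue of the symmetric part of \(M^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed (assumption) for reproducibility
worst_gap = -np.inf              # largest observed value of ||R_M|| - bound

def min_sym_eig(M):
    """Smallest eigenvalue of the symmetric part of a square matrix."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).min()

for _ in range(200):
    n = int(rng.integers(2, 6))
    G = rng.standard_normal((n, n))
    H = rng.standard_normal((n, n))
    # Positive definite symmetric part plus an arbitrary skew-symmetric part.
    M = G @ G.T / n + 0.1 * np.eye(n) + (H - H.T)

    sigma = min_sym_eig(M)                        # strong monotonicity modulus
    beta = 1.0 / min_sym_eig(np.linalg.inv(M))    # inverse cocoercivity constant

    R = 2.0 * np.linalg.inv(np.eye(n) + M) - np.eye(n)   # reflected resolvent of M
    bound = np.sqrt(1.0 - 4.0 * sigma / (1.0 + 2.0 * sigma + sigma * beta))
    worst_gap = max(worst_gap, np.linalg.norm(R, 2) - bound)

print(worst_gap)
```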
1.5 Proof to Proposition 7.6
We start by computing the resolvent and reflected resolvent of \(\gamma A\). The resolvent of \(\gamma A\) is given by
where \(\sigma \) and \(\beta \) are defined in (7.6). The reflected resolvent \(R_{\gamma A}\) is given by
To simplify this expression, we note that
This implies that
Using this equality, we can simplify the reflected resolvent expression:
since \(\gamma c\sigma \beta \sin {\psi }/d>0\). Now, let us write the matrix elements using polar coordinates:
This gives the reflected resolvent:
The angle \(\xi \) in the polar coordinates satisfies
The numerator is always nonnegative, so \(\xi \) is given by \(\xi \!=\!\arctan _2\!\left( \!\tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}\!\right) \) with \(\arctan _2\) defined in (6.4). The radius \(\delta \) in the polar coordinates satisfies
and (since \(\delta >0\))
It remains to compute the factor in (D.6). Using (D.7), we conclude
This completes the proof.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Giselsson, P. Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19, 2241–2270 (2017). https://doi.org/10.1007/s11784-017-0417-1
DOI: https://doi.org/10.1007/s11784-017-0417-1