Tight global linear convergence rate bounds for Douglas–Rachford splitting

Giselsson, Pontus

doi:10.1007/s11784-017-0417-1

Tight global linear convergence rate bounds for Douglas–Rachford splitting

Open access
Published: 13 March 2017

Volume 19, pages 2241–2270, (2017)
Cite this article

Download PDF

You have full access to this open access article

Journal of Fixed Point Theory and Applications Aims and scope Submit manuscript

Tight global linear convergence rate bounds for Douglas–Rachford splitting

Download PDF

Pontus Giselsson ORCID: orcid.org/0000-0003-1602-1204¹

2589 Accesses
36 Citations
Explore all metrics

Abstract

Recently, several authors have shown local and global convergence rate results for Douglas–Rachford splitting under strong monotonicity, Lipschitz continuity, and cocoercivity assumptions. Most of these focus on the convex optimization setting. In the more general monotone inclusion setting, Lions and Mercier showed a linear convergence rate bound under the assumption that one of the two operators is strongly monotone and Lipschitz continuous. We show that this bound is not tight, meaning that no problem from the considered class converges exactly with that rate. In this paper, we present tight global linear convergence rate bounds for that class of problems. We also provide tight linear convergence rate bounds under the assumptions that one of the operators is strongly monotone and cocoercive, and that one of the operators is strongly monotone and the other is cocoercive. All our linear convergence results are obtained by proving the stronger property that the Douglas–Rachford operator is contractive.

Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator

Article 27 April 2019

Convergence of Two-Stage Method with Bregman Divergence for Solving Variational Inequalities*

Article 15 May 2019

Local Linear Convergence of an ADMM-Type Splitting Framework for Equality Constrained Optimization

Article 19 October 2019

1 Introduction

Douglas–Rachford splitting [12, 24] is an algorithm that solves monotone inclusion problems of the form

$$\begin{aligned} 0\in Ax+Bx \end{aligned}$$

where A and B are maximally monotone operators. A class of problems that falls under this category is composite convex optimization problems of the form

$$\begin{aligned} \mathrm{minimize}~&f(x)+g(x) \end{aligned}$$

(1.1)

where f and g are proper, closed, and convex functions. This holds since the subdifferential of proper, closed, and convex functions are maximally monotone operators, and since Fermat’s rule says that the optimality condition for solving (1.1) is $0\in \partial f(x)+\partial g(x)$, under a suitable qualification condition. The algorithm has shown great potential in many applications such as signal processing [6], image denoising [32], and statistical estimation [5] (where the dual algorithm ADMM is discussed).

It has long been known that Douglas–Rachford splitting converges under quite mild assumptions, see [13, 14, 24]. However, the rate of convergence in the general case has just recently been shown to be $O(1/\sqrt{k})$ for the fixed-point residual, [9, 10, 18]. For general maximal monotone operator problems, where one of the operators is strongly monotone and Lipschitz continuous, Lions and Mercier showed in [24] that the Douglas–Rachford algorithm enjoys a linear convergence rate. To the author’s knowledge, this was the sole linear convergence rate results for a long period of time for these methods. Recently, however, many works have shown linear convergence rates for Douglas–Rachford splitting and its dual version, ADMM, see [1, 2, 4, 8, 10, 11, 15,16,17, 19,20,21,22, 28, 30, 33]. The works in [4, 10, 19, 20, 30] concern local linear convergence under different assumptions. The works in [21, 22, 33] consider distributed formulations, while the works in [1, 2, 8, 11, 15,16,17, 24, 27, 28, 31] show global convergence rate bounds under various assumptions. Of these, the works in [1, 2, 16] show tight linear convergence rate bounds. The works in [1, 2] show tight convergence rate results for problem of finding a point in the intersection of two subspaces. In [16] it is shown that the linear convergence rate bounds in [17] (which are generalizations of the bounds in [15]) are tight for composite convex optimization problems where one function is strongly convex and smooth. All these results, except the one by Lions and Mercier, are stated in the convex optimization setting. In this paper, we will provide tight linear convergence rate bounds for monotone inclusion problems.

We consider three different sets of assumptions under which we provide linear convergence rate bounds. In all cases, the properties of Lipschitz continuity or cocoercivity, and strong monotonicity, are attributed to the operators. In the first case, we assume that one operator is strongly monotone and the other is cocoercive. In the second case, we assume that one operator is both strongly monotone and Lipschitz continuous. This is the setting considered by Lions and Mercier in [24], where a non-tight linear convergence rate bound is presented. In the third case, we assume that one operator is both strongly monotone and cocoercive. We show in all these settings that our bounds are tight, meaning that there exists problems from the respective classes that converge exactly with the provided rate bound. In the second and third cases, the rates are tight for all feasible algorithm parameters, while in the first case, the rate is tight for many algorithm parameters.

2 Background

In this section, we introduce some notations and define some operator and function properties.

2.1 Notation

We denote by $\mathbb {R}$ the set of real numbers and by $\overline{\mathbb {R}}:=\mathbb {R}\cup \{\infty \}$ the extended real line. Throughout this paper, $\mathcal {H}$ denotes a separable real Hilbert space. Its inner product is denoted by $\langle \cdot ,\cdot \rangle $, its induced norm by $\Vert \cdot \Vert $. We denote by $\{\phi _i\}_{i=1}^K$ any orthonormal basis in $\mathcal {H}$, where K is the dimension of $\mathcal {H}$ (possibly $\infty $). The gradient to $f:~\mathcal {X}\rightarrow \mathbb {R}$ is denoted by $\nabla f$ and the subdifferential operator to $f:~\mathcal {X}\rightarrow \overline{\mathbb {R}}$ is denoted by $\partial f$ and is defined as $\partial f(x_1):=\{u~|~f(x_2)\ge f(x_1)+\langle u,x_2-x_1\rangle {\hbox { for all }} x_2\in \mathcal {X}\}$. The conjugate function of f is denoted and defined by $f^{*}(y)\triangleq \sup _{x}\left\{ \langle y,x\rangle -f(x)\right\} $. The power set of a set $\mathcal {X}$, i.e., the set of all subsets of $\mathcal {X}$, is denoted by $2^{\mathcal {X}}$. The graph of an (set-valued) operator $A:~\mathcal {X}\rightarrow 2^{\mathcal {Y}}$ is defined and denoted by $\mathrm{{gph}}A = \{(x,y)\in \mathcal {X}\times \mathcal {Y}~|~y\in Ax\}$. The inverse operator $A^{-1}$ is defined through its graph by $\mathrm{{gph}}A^{-1} = \{(y,x)\in \mathcal {Y}\times \mathcal {X}~|~y\in Ax\}$. The identity operator is denoted by $\mathrm{{Id}}$ and the resolvent of a monotone operator A is defined and denoted by $J_A=(\mathrm{{Id}}+A)^{-1}$. Finally, the class of closed, proper, and convex functions $f:~\mathcal {H}\rightarrow \overline{\mathbb {R}}$ is denoted by $\Gamma _0(\mathcal {H})$.

2.2 Operator properties

Definition 2.1

(Strong monotonicity) Let $\sigma > 0$. An operator $A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ is $\sigma $-strongly monotone if

$$\begin{aligned} \langle u-v,x-y\rangle \ge \sigma \Vert x-y\Vert ^2 \end{aligned}$$

holds for all $(x,u)\in \mathrm{{gph}}(A)$ and $(y,v)\in \mathrm{{gph}}(A)$.

The operator is merely monotone if $\sigma =0$ in the above definition. In the following three definitions, we state some properties for single-valued operators $T:~\mathcal {H}\rightarrow \mathcal {H}$. We state the properties for operators with full domain, but they can also be stated for operators with any nonempty domain $\mathcal {D}\subseteq \mathcal {H}$.

Definition 2.2

(Lipschitz continuity) Let $\beta \ge 0$. A mapping $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\beta $-Lipschitz continuous if

$$\begin{aligned} \Vert Tx-Ty\Vert \le \beta \Vert x-y\Vert \end{aligned}$$

holds for all $x,y\in \mathcal {H}$.

Definition 2.3

(Nonexpansiveness) A mapping $T:~\mathcal {H}\rightarrow \mathcal {H}$ is nonexpansive if it is 1-Lipschitz continuous.

Definition 2.4

(Contractiveness) A mapping $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\delta $-contractive if it is $\delta $-Lipschitz continuous with $\delta \in [0,1)$.

Definition 2.5

(Averaged mappings) A mapping $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\alpha $-averaged if there exists a nonexpansive mapping $R:~\mathcal {H}\rightarrow \mathcal {H}$ and $\alpha \in (0,1)$ such that $T=(1-\alpha )\mathrm{{Id}}+\alpha R$.

From [3, Proposition 4.25], we know that an operator $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\alpha $-averaged if and only if it satisfies

$$\begin{aligned} \tfrac{1-\alpha }{\alpha }\Vert (\mathrm{{Id}}-T)x-(\mathrm{{Id}}-T)y\Vert ^2+\Vert Tx-Ty\Vert ^2\le \Vert x-y\Vert ^2 \end{aligned}$$

(2.1)

for all $x,y\in \mathcal {H}$.

Definition 2.6

(Cocoercivity) Let $\beta > 0$. A mapping $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\tfrac{1}{\beta }$-cocoercive if

$$\begin{aligned} \langle Tx-Ty,x-y\rangle \ge \tfrac{1}{\beta }\Vert Tx-Ty\Vert ^2 \end{aligned}$$

holds for all $x,y\in \mathcal {H}$.

3 Preliminaries

In this section, we state and show preliminary results that are needed to prove the linear convergence rate bounds. We state some lemmas that describe how cocoercivity, Lipschitz continuity, as well as averagedness relate to each other. We also introduce negatively averaged operators, T, that are defined by that $-T$ is averaged. We show different properties of such operators, including that averaged maps of negatively averaged operators are contractive. This result will be used to show linear convergence in the case where the strong monotonicity and Lipschitz continuity properties are split between the operators.

3.1 Useful lemmas

Proofs to the following three lemmas are found in Appendix 8.

Lemma 3.1

Assume that $\beta >0$ and let $T:~\mathcal {H}\rightarrow \mathcal {H}$. Then $\tfrac{1}{2\beta }$-cocoercivity of $\beta \mathrm{{Id}}+T$ is equivalent to $\beta $-Lipschitz continuity of T.

Lemma 3.2

Assume that $\beta \in (0,1)$. Then $\tfrac{1}{\beta }$-cocoercivity of $R:~\mathcal {H}\rightarrow \mathcal {H}$ is equivalent to $\tfrac{\beta }{2}$-averagedness of $T=R+(1-\beta )\mathrm{{Id}}$.

Lemma 3.3

Let $T:~\mathcal {H}\rightarrow \mathcal {H}$ be $\delta $-contractive with $\delta \in [0,1)$. Then $R=(1-\alpha )\mathrm{{Id}}+\alpha T$ is contractive for all $\alpha \in (0,\tfrac{2}{1+\delta })$. The contraction factor is $|1-\alpha |+\alpha \delta $.

For easier reference, we also record special cases of some results in [3] that will be used later. Specifically, we record, in order, special cases of [3, Proposition 4.33], [3, Proposition 4.28], and [3, Proposition 23.11].

Lemma 3.4

Let $\beta \in (0,1)$ and let $T:~\mathcal {H}\rightarrow \mathcal {H}$ be $\tfrac{1}{\beta }$-cocoercive. Then $(\mathrm{{Id}}-T)$ is $\tfrac{\beta }{2}$-averaged.

Lemma 3.5

Let $T:~\mathcal {H}\rightarrow \mathcal {H}$ be $\alpha $-averaged with $\alpha \in (0,\tfrac{1}{2})$. Then $(2T-\mathrm{{Id}})$ is $2\alpha $-averaged.

Lemma 3.6

Let $A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ be maximally monotone and $\sigma $-strongly monotone with $\sigma >0$. Then $J_A=(\mathrm{{Id}}+A)^{-1}$ is $(1+\sigma )$-cocoercive.

3.2 Negatively averaged operators

In this section we define negatively averaged operators and show various properties for these.

Definition 3.7

An operator $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\theta $-negatively averaged with $\theta \in (0,1)$ if $-T$ is $\theta $-averaged.

This definition implies that an operator T is $\theta $-negatively averaged if and only if it satisfies

$$\begin{aligned} T = -((1-\theta )\mathrm{{Id}}+\theta \bar{R})=(\theta -1)\mathrm{{Id}}+\theta R \end{aligned}$$

where $\bar{R}$ is nonexpansive and $R:=-\bar{R}$ is, therefore, also nonexpansive. Since $-T$ is averaged, it is also nonexpansive, and so is T.

Since negatively averaged operators are nonexpansive, they can be averaged.

Definition 3.8

An $\alpha $-averaged $\theta $-negatively averaged operator $S:~\mathcal {H}\rightarrow \mathcal {H}$ is defined as $S=(1-\alpha )\mathrm{{Id}}+\alpha T$ where $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\theta $-negatively averaged.

Next, we show that averaged negatively averaged operators are contractive.

Proposition 3.9

An $\alpha $-averaged $\theta $-negatively averaged operator $S:~\mathcal {H}\rightarrow \mathcal {H}$ is $|1-2\alpha +\alpha \theta |+\alpha \theta $-contractive.

Proof

Let $T=(\theta -1)\mathrm{{Id}}+\theta R$ (for some nonexpansive R) be the $\theta $-negatively averaged operator, which implies that $S=(1-\alpha )\mathrm{{Id}}+\alpha T$. Then

$$\begin{aligned} \Vert Sx-Sy\Vert&=\Vert ((1-\alpha )\mathrm{{Id}}+\alpha T)x-((1-\alpha )\mathrm{{Id}}+\alpha T)y\Vert \\&= \Vert (1-2\alpha +\alpha \theta )(x-y)+\alpha \theta (Rx-Ry))\Vert \\&\le |1-2\alpha +\alpha \theta |\Vert x-y\Vert +\alpha \theta \Vert x-y\Vert \\&= (|1-2\alpha +\alpha \theta |+\alpha \theta )\Vert x-y\Vert \end{aligned}$$

since R is nonexpansive. It is straightforward to verify that $|1-2\theta +\alpha \theta |+\alpha \theta <1$ for all combinations of $\alpha \in (0,1)$ and $\theta \in (0,1)$. Hence, S is contractive and the proof is complete. $\square $

Next, we optimize the contraction factor w.r.t. $\alpha $.

Proposition 3.10

Assume that $T:~\mathcal {H}\rightarrow \mathcal {H}$ is $\theta $-negatively averaged. Then the $\alpha $ that optimizes the contraction factor for the $\alpha $-averaged $\theta $-negatively averaged operator $S=(1-\alpha )\mathrm{{Id}}+\alpha T$ is $\alpha = \tfrac{1}{2-\theta }$. The corresponding optimal contraction constant is $\tfrac{\theta }{2-\theta }$.

Proof

Due to the absolute value, Proposition 3.9 states that the contraction factor $\delta $ of T can be written as

$$\begin{aligned} \delta&={\left\{ \begin{array}{ll} 1-2\alpha +\alpha \theta +\alpha \theta &{} {\hbox {if }} \alpha \le \tfrac{1}{2-\theta }\\ -(1-2\alpha +\alpha \theta )+\alpha \theta &{} {\hbox {if }} \alpha \ge \tfrac{1}{2-\theta }\\ \end{array}\right. } ={\left\{ \begin{array}{ll} 1-2(1-\theta )\alpha &{} {\hbox {if }} \alpha \le \tfrac{1}{2-\theta }\\ 2\alpha -1 &{} {\hbox {if }} \alpha \ge \tfrac{1}{2-\theta } \end{array}\right. } \end{aligned}$$

where the kink in the absolute value term is at $\alpha =\tfrac{1}{2-\theta }$. Since $\theta \in (0,1)$, we get negative slope for $\alpha \le \tfrac{1}{2-\theta }$ and positive slope for $\alpha \ge \tfrac{1}{2-\theta }$. Therefore, the optimal $\alpha $ is in the kink at $\alpha =\tfrac{1}{2-\theta }$, which satisfies $\alpha \in (\tfrac{1}{2},1)$ since $\theta \in (0,1)$. Inserting this into the contraction factor expression gives $\tfrac{\theta }{2-\theta }$. This concludes the proof. $\square $

Remark 3.11

The optimal contraction factor $\tfrac{\theta }{2-\theta }$ is strictly increasing in $\theta $ on the interval $\theta \in (0,1)$. Therefore, the contraction factor becomes smaller the smaller $\theta $ is.

We conclude this section by showing that the composition of an averaged and a negatively averaged operator is negatively averaged. Before we state the result, we need a characterization of $\theta $-negatively averaged operators T. This follows directly from the definition of averaged operators in (2.1) since $-T$ is $\theta $-averaged:

$$\begin{aligned} \tfrac{1-\theta }{\theta }\Vert (\mathrm{{Id}}+T)x-(\mathrm{{Id}}+T)y\Vert ^2+\Vert Tx-Ty\Vert ^2\le \Vert x-y\Vert ^2. \end{aligned}$$

(3.1)

Proposition 3.12

Assume that $T_{\theta }:~\mathcal {H}\rightarrow \mathcal {H}$ is $\theta $-negatively averaged and $T_{\alpha }:~\mathcal {H}\rightarrow \mathcal {H}$ is $\alpha $-averaged. Then $T_{\theta }T_{\alpha }$ is $\tfrac{\kappa }{\kappa +1}$-negatively averaged where $\kappa =\tfrac{\theta }{1-\theta }+\tfrac{\alpha }{1-\alpha }$.

Proof

Let $\kappa _{\theta }=\tfrac{\theta }{1-\theta }$ and $\kappa _{\alpha }=\tfrac{\alpha }{1-\alpha }$, then $\kappa = \kappa _{\theta }+\kappa _{\alpha }$. We have

$$\begin{aligned}&\Vert (\mathrm{{Id}}+T_{\theta }T_{\alpha })x-(\mathrm{{Id}}+T_{\theta } T_{\alpha })y\Vert ^2\nonumber \\&\quad =\Vert (x-y)-(T_{\alpha } x-T_{\alpha } y)+(T_{\alpha }x-T_{\alpha }y)+(T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y)\Vert ^2\nonumber \\&\quad =\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y+(\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\nonumber \\&\quad \le \tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\alpha }}\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y\Vert ^2\nonumber \\&\qquad +\;\tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\theta }}\Vert (\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\nonumber \\&\quad \le (\kappa _{\theta }+\kappa _{\alpha }) (\Vert x-y\Vert ^2-\Vert T_{\alpha }x-T_{\alpha }y\Vert ^2)\nonumber \\&\qquad +\;(\kappa _{\theta }+\kappa _{\alpha }) (\Vert T_{\alpha }x-T_{\alpha }y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2)\nonumber \\&\quad =(\kappa _{\theta }+\kappa _{\alpha })(\Vert x-y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2)\nonumber \\&\quad =\kappa (\Vert x-y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2) \end{aligned}$$

(3.2)

where the first inequality follows from convexity of $\Vert \cdot \Vert ^2$. More precisely, let $t\in [0,1]$, then, by convexity of $\Vert \cdot \Vert ^2$, we conclude that

$$\begin{aligned} \Vert x+y\Vert ^2&=\Vert t\tfrac{1}{t}x+(1-t)\tfrac{1}{1-t}y\Vert ^2\le t\Vert \tfrac{1}{t}x\Vert ^2+(1-t)\tfrac{1}{1-t}\Vert y\Vert ^2\\&=\tfrac{1}{t}\Vert x\Vert ^2+\tfrac{1}{1-t}\Vert y\Vert ^2. \end{aligned}$$

Letting $t=\tfrac{\kappa _{\alpha }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]$, which implies that $1-t=\tfrac{\kappa _{\theta }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]$, gives the first inequality in (3.2). The second inequality in (3.2) follows from (2.1) and (3.1). The relation in (3.2) coincides with the definition of negative averagedness in (3.1). Thus, $T_{\theta }T_{\alpha }$ is $\phi $-negatively averaged with $\phi $ satisfying $\tfrac{1-\phi }{\phi }=\tfrac{1}{\kappa }$. This gives $\phi = \tfrac{\kappa }{\kappa +1}$ and the proof is complete. $\square $

Remark 3.13

This result can readily be extended to show averagedness of $T=T_1T_2\cdots T_N$ where $T_i$ are $\alpha _i$-(negatively) averaged for $i=1,\ldots ,N$. We get that T is $\tfrac{\kappa }{1+\kappa }$-negatively averaged with $\kappa =\sum _{i=1}^N\tfrac{\alpha _i}{1-\alpha _i}$ if the number of negatively averaged $T_i$:s is odd, and that T is $\tfrac{\kappa }{1+\kappa }$-averaged if the number of negatively averaged $T_i$:s is even. Similar results have been presented, e.g., in [3, Proposition 4.32] which is improved in [7]. Our result extends these results in that it allows also for negatively averaged operators and reduces to the result in [7] for averaged operators.

4 Douglas–Rachford splitting

Douglas–Rachford splitting can be applied to solve monotone inclusion problems of the form

$$\begin{aligned} 0\in Ax+Bx \end{aligned}$$

(4.1)

where $A,B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ are maximally monotone operators. The algorithm separates A and B by only touching the corresponding resolvents, where the resolvent $J_A:~\mathcal {H}\rightarrow \mathcal {H}$ is defined as

$$\begin{aligned} J_{A} := (A+\mathrm{{Id}})^{-1}. \end{aligned}$$

The resolvent has full domain since A is assumed maximally monotone, see [26] and [3, Proposition 23.7]. If $A=\partial f$ where f is a proper, closed, and convex function, then $J_A = {\mathrm{prox}}_{f}$ where the prox operator ${\mathrm{prox}}_{f}$ is defined as

$$\begin{aligned} {\mathrm{prox}}_{f}(z) = {{\mathrm{argmin}}}_x\left\{ f(x)+\tfrac{1}{2}\Vert x-z\Vert ^2\right\} . \end{aligned}$$

(4.2)

That this holds follows directly from Fermat’s rule [3, Theorem 16.2] applied to the proximal operator definition.

The Douglas–Rachford algorithm is defined by the iteration

$$\begin{aligned} z^{k+1} = ((1-\alpha )\mathrm{{Id}}+\alpha R_AR_B)z^{k} \end{aligned}$$

(4.3)

where $\alpha \in (0,1)$ (we will see that also $\alpha \ge 1$ can sometimes be used) and $R_A:~\mathcal {H}\rightarrow \mathcal {H}$ is the reflected resolvent, which is defined as

$$\begin{aligned} R_A:=2J_A-\mathrm{{Id}}. \end{aligned}$$

(Note that what is traditionally called Douglas–Rachford splitting is when $\alpha =1/2$ in (4.3). The case with $\alpha =1$ in (4.3) is often referred to as the Peaceman–Rachford algorithm, see [29]. We will use the term Douglas–Rachford splitting for all feasible choices of $\alpha $.)

Since the reflected resolvent is nonexpansive in the general case [3, Corollary 23.10], and since compositions of nonexpansive operators are nonexpansive, the Douglas–Rachford algorithm is an averaged iteration of a nonexpansive mapping when $\alpha \in (0,1)$. Therefore, Douglas–Rachford splitting is a special case of the Krasnosel’skiĭ–Mann iteration [23, 25], which is known to converge to a fixed-point of the nonexpansive operator, in this case $R_AR_B$, see [3, Theorem 5.14]. Since an $x\in \mathcal {H}$ solves (4.1) if and only if $x=J_Az$ where $z=R_AR_Bz$, see [3, Proposition 25.1] this algorithm can be used to solve monotone inclusion problems of the form (4.1). Note that to solve (4.1) is equivalent to solving

$$\begin{aligned} 0\in \gamma Ax+\gamma Bx \end{aligned}$$

for any $\gamma \in (0,\infty )$. Then we can define $A_{\gamma } = \gamma A$ and (4.1) can also be solved by the iteration

$$\begin{aligned} z^{k+1} = ((1-\alpha )\mathrm{{Id}}+\alpha R_{A_\gamma }R_{B_\gamma })z^{k}. \end{aligned}$$

(4.4)

Therefore, $\gamma $ is an algorithm parameter that affects the progress of the iterations.

The objective of this paper is to provide tight linear convergence rate bounds for the Douglas–Rachford algorithm under various assumptions. Using these bounds, we will show how to select the algorithm parameters $\gamma $ and $\alpha $ that optimize these bounds. The first setting we consider is when A is strongly monotone and B is cocoercive.

5 A strongly monotone and B cocoercive

In this section, we show linear convergence for Douglas–Rachford splitting in the case where A and B are maximally monotone, A is strongly monotone, and B is cocoercive; that is, we make the following assumptions.

Assumption 5.1

Suppose that:

(i)
$A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ is maximally monotone and $\sigma $-strongly monotone.
(ii)
$B:~\mathcal {H}\rightarrow \mathcal {H}$ is maximally monotone and $\tfrac{1}{\beta }$-cocoercive.

Before we can state the main linear convergence result, we need to characterize the properties of the resolvent, the reflected resolvent, and the composition between reflected resolvents. This is done in the following series of propositions, this first of which is proven in Appendix 8.

Proposition 5.2

The resolvent $J_B$ of a $\tfrac{1}{\beta }$-cocoercive operator $B:~\mathcal {H}\rightarrow \mathcal {H}$ is $\tfrac{\beta }{2(1+\beta )}$-averaged.

This implies that also the reflected resolvent is averaged.

Proposition 5.3

The reflected resolvent of a $\tfrac{1}{\beta }$-cocoercive operator $B:~\mathcal {H}\rightarrow \mathcal {H}$ is $\tfrac{\beta }{1+\beta }$-averaged.

Proof

This follows directly from the Proposition 5.2 and Lemma 3.5. $\square $

If the operator instead is strongly monotone, the reflected resolvent is negatively averaged.

Proposition 5.4

The reflected resolvent of a $\sigma $-strongly monotone and maximal monotone operator $A:~\mathcal {H}\rightarrow 2^\mathcal {H}$ is $\tfrac{1}{1+\sigma }$-negatively averaged.

Proof

From Lemma 3.6, we have that the resolvent $J_A$ is $(1+\sigma )$-cocoercive. Using Lemma 3.4, this implies that $\mathrm{{Id}}-J_A$ is $\tfrac{1}{2(1+\sigma )}$-averaged. Then using Lemma 3.5, this implies that $2(\mathrm{{Id}}-J_A)-\mathrm{{Id}}=\mathrm{{Id}}-2J_A=-R_A$ is $\tfrac{1}{1+\sigma }$-averaged, hence $R_A$ is $\tfrac{1}{1+\sigma }$-negatively averaged. This completes the proof. $\square $

The composition of the reflected resolvents of a strongly monotone operator and a cocoercive operator is negatively averaged.

Proposition 5.5

Suppose that Assumption 5.1 holds. Then, the composition $R_AR_B$ is $\tfrac{\tfrac{1}{\sigma }+\beta }{1+\tfrac{1}{\sigma }+\beta }$-negatively averaged.

Proof

Since $R_A$ is $\tfrac{1}{1+\sigma }$-negatively averaged and $R_B$ is $\tfrac{\beta }{1+\beta }$-averaged, see Propositions 5.3 and 5.4, we can apply Proposition 3.12. We get that $\kappa = \tfrac{\tfrac{1}{1+\sigma }}{1-\tfrac{1}{1+\sigma }}+\tfrac{\tfrac{\beta }{1+\beta }}{1-\tfrac{\beta }{1+\beta }}=\tfrac{1}{\sigma }+\beta $ and that the averagedness parameter of the negatively averaged operator $R_AR_B$ is given by $\tfrac{\kappa }{\kappa +1}=\tfrac{\tfrac{1}{\sigma }+\beta }{\tfrac{1}{\sigma }+\beta +1}$. This concludes the proof. $\square $

With these results, we can now show the following linear convergence rate bounds for Douglas–Rachford splitting under Assumption 5.1. The theorem is proven in Appendix 8.

Theorem 5.6

Suppose that Assumption 5.1 holds, that $\alpha \in (0,1)$, that $\gamma \in (0,\infty )$, and that the Douglas–Rachford algorithm (4.3) is applied to solve $0\in \gamma Ax+\gamma Bx$. Then the algorithm converges at least with rate factor

$$\begin{aligned} \left| 1-2\alpha +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }\right| +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }. \end{aligned}$$

(5.1)

Optimizing this rate bound w.r.t. $\alpha $ and $\gamma $ gives $\gamma = \tfrac{1}{\sqrt{\beta \sigma }}$ and $\alpha =\tfrac{\sqrt{\beta /\sigma }+1/2}{1+\sqrt{\beta /\sigma }}$. The corresponding optimal rate bound is $\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}$.

5.1 Tightness

In this section, we present an example that shows tightness of the linear convergence rate bounds in Theorem 5.6 for many algorithm parameters. We consider a two-dimensional Euclidean example, which is given by the following convex optimization problem:

$$\begin{aligned} \mathrm{minimize}&f(x)+f^*(x) \end{aligned}$$

(5.2)

where

$$\begin{aligned} f(x)=\tfrac{\beta }{2}x_1^2, \end{aligned}$$

(5.3)

and $x=(x_1,x_2)$, and $\beta >0$. The gradient $\nabla f=\beta x_1$, so it is cocoercive with factor $\tfrac{1}{\beta }$. According to [3, Theorem 18.15] this is equivalent to that $f^*$ is $\tfrac{1}{\beta }$-strongly convex and, therefore, $\partial f^*$ is $\sigma :=\tfrac{1}{\beta }$-strongly monotone.

The following proposition shows that when solving (5.2) with f defined in (5.3) using Douglas–Rachford splitting, the upper linear convergence rate bound is exactly attained. The result is proven in Appendix 8.

Proposition 5.7

Suppose that the Douglas–Rachford algorithm (4.3) is applied to solve (5.2) with f in (5.3). Further suppose that the parameters $\gamma $ and $\alpha $ satisfy $\gamma \in (0,\infty )$ and $\alpha \in [c,1)$ where $c=\tfrac{1+\gamma \sigma +\gamma ^2\sigma \beta }{1+2\gamma \sigma +\gamma ^2\sigma \beta }$ and that $z^0=(0,z_2^0)$ with $z_2^0\ne 0$. Then the $z^k$ sequence in (4.3) converges exactly with rate (5.1) in Theorem 5.6.

So, for all $\gamma $ parameters and some $\alpha $ parameters, the provided bound is tight. Especially, the optimal parameter choices $\gamma =\tfrac{1}{\sqrt{\beta \sigma }}$ and $\alpha =\tfrac{1+2\sqrt{\beta /\sigma }}{2(1+\sqrt{\beta /\sigma })}$ give a tight bound.

It is interesting to note that although we have considered a more general class of problems than convex optimization problems, a convex optimization problem is used to attain the worst case rate.

5.2 Comparison to other bounds

In [17], it was shown that Douglas–Rachford splitting converges as $\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}$ when solving composite optimization problems of the form $0\in \gamma \nabla f+\gamma \partial g$, where $\nabla f$ is $\sigma $-strongly monotone and $\tfrac{1}{\beta }$-cocoercive and the algorithm parameters are chosen as $\alpha =1$ and $\gamma = \tfrac{1}{\sqrt{\beta \sigma }}$. In our setting, with $\partial f$ being $\sigma $-strongly monotone and $\partial g$ being $\tfrac{1}{\beta }$-cocoercive, we can instead pose the equivalent problem $0\in \gamma \partial \hat{f}(x)+\gamma \partial \hat{g}(x)$ where $\hat{f} = f-\tfrac{\sigma }{2}\Vert \cdot \Vert ^2$ and $\hat{g}=g+\tfrac{\sigma }{2}\Vert \cdot \Vert ^2$. Then $\partial \hat{f}$ is merely monotone and $\hat{g}$ is $\sigma $-strongly monotone and $\tfrac{1}{\beta +\sigma }$-cocoercive. For that problem, [17] shows a linear convergence rate of at least rate $\tfrac{\sqrt{(\beta +\sigma )/\sigma }-1}{\sqrt{(\beta +\sigma )/\sigma }+1}$ (when optimal parameters are used). This rate turns out to be better than the rate provided in Theorem 5.6, i.e., $\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}$, which assumes that the strong convexity and smoothness properties are split between the two operators. This is shown by the following chain of equivalences which departs from the fact that the square root is sub-additive, i.e., that $\sqrt{\beta +\sigma }\le \sqrt{\sigma }+\sqrt{\beta }$ for $\beta ,\sigma \ge 0$:

This implies that, from a worst case perspective, it is better to shift both properties into one operator. This is also always possible, without increasing the computational cost in the algorithm, since the prox-operator is just shifted slightly:

$$\begin{aligned} {\mathrm{prox}}_{\gamma \hat{f}}(z)&= {{\mathrm{argmin}}}_x\left\{ \hat{f}(x)+\tfrac{1}{2\gamma }\Vert x-z\Vert ^2\right\} \\&= {{\mathrm{argmin}}}_x\left\{ f(x)-\tfrac{\sigma }{2}\Vert x\Vert ^2+\tfrac{1}{2\gamma }\Vert x-z\Vert ^2\right\} \\&= {{\mathrm{argmin}}}_x\left\{ f(x)+\tfrac{1-\gamma \sigma }{2\gamma }\Vert x-\tfrac{1}{1-\gamma \sigma }z\Vert ^2\right\} \\&= {\mathrm{prox}}_{\tfrac{\gamma }{1-\gamma \sigma } f}(\tfrac{1}{1-\gamma \sigma }z). \end{aligned}$$

A similar relation holds for ${\mathrm{prox}}_{\gamma \hat{g}}$ with the sign in front of $\gamma \sigma $ flipped.

6 A strongly monotone and Lipschitz continuous

In this section, we consider the case where one of the operators is $\sigma $-strongly monotone and $\beta $-Lipschitz continuous. This is assumption is stated next.

Assumption 6.1

Suppose that:

(i)
The operators $A:~\mathcal {H}\rightarrow \mathcal {H}$ and $B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ are maximally monotone.
(ii)
A is $\sigma $-strongly monotone and $\beta $-Lipschitz continuous.

First, we state a result that characterizes the resolvent of A. It is proven in Appendix 8.

Proposition 6.2

Assume that $A:~\mathcal {H}\rightarrow \mathcal {H}$ is a maximal monotone $\beta $-Lipschitz continuous operator. Then the resolvent $J_{A}=(\mathrm{{Id}}+A)^{-1}$ satisfies

$$\begin{aligned} 2\langle J_Ax-J_Ay,x-y\rangle&\ge \Vert x-y\Vert ^2 +(1-\beta ^2)\Vert J_Ax-J_Ay\Vert ^2. \end{aligned}$$

(6.1)

This resolvent property is used when proving the following contraction factor of the reflected resolvent. The result is proven in Appendix 8.

Theorem 6.3

Suppose that $A:~\mathcal {H}\rightarrow \mathcal {H}$ is a $\sigma $-strongly monotone and $\beta $-Lipschitz continuous operator. Then the reflected resolvent $R_{A} = 2J_A-\mathrm{{Id}}$ is $\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}$-contractive.

The parameter $\gamma $ that optimizes the contraction factor for $R_{\gamma A}$ is the minimizer of $h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+\gamma \sigma +(\gamma \beta )^2}$ ($\gamma A$ is $\gamma \sigma $-strongly monotone and $\gamma \beta $-Lipschitz continuous). The gradient $\nabla h(\gamma ) =\tfrac{4\sigma (\beta ^2\gamma ^2-1)}{(\beta ^2\gamma ^2+2\sigma \gamma +1)^2}$, which implies that the extreme points are given by $\gamma =\pm \tfrac{1}{\beta }$. Since $\gamma >0$ and the gradient is positive for $\gamma >\tfrac{1}{\beta }$ and negative for $\gamma \in (0,\tfrac{1}{\beta })$, $\gamma =\tfrac{1}{\beta }$ optimizes the contraction factor. The corresponding rate is

$$\begin{aligned} \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}} = \sqrt{1-\tfrac{2\sigma /\beta }{1+\sigma /\beta }}=\sqrt{\tfrac{1-\sigma /\beta }{1+\sigma /\beta }}=\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}. \end{aligned}$$

This is summarized in the following proposition.

Proposition 6.4

The parameter $\gamma $ that optimizes the contraction factor of $R_{\gamma A}$ is given by $\gamma = \tfrac{1}{\beta }$. The corresponding contraction factor is $\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}$.

Now, we are ready to state the convergence rate results for Douglas–Rachford splitting.

Theorem 6.5

Suppose that Assumption 6.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve $0\in \gamma Ax+\gamma Bx$. Let$\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}$, then the algorithm converges at least with rate factor

$$\begin{aligned} |1-\alpha |+\alpha \delta \end{aligned}$$

(6.2)

for all $\alpha \in (0,\tfrac{2}{1+\delta })$. Optimizing this bound w.r.t. $\alpha $ and $\gamma $ gives $\alpha =1$ and $\gamma =\tfrac{1}{\beta }$ and corresponding optimal rate bound $\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}$.

Proof

It follows immediately from Theorem 6.3, Lemma 3.3, and Proposition 6.4 by noting that $\alpha =1$ minimizes (6.2). $\square $

In the following section, we will see that there exists a problem from the considered class that converges exactly with the provided rate.

6.1 Tightness

We consider a problem where A is a scaled rotation operator, i.e,:

$$\begin{aligned} A=d\begin{bmatrix} \cos {\psi }&\quad -\sin {\psi } \\ \sin {\psi }&\quad \cos {\psi } \end{bmatrix} \end{aligned}$$

(6.3)

where $0\le \psi <\tfrac{\pi }{2}$ and $d\in (0,\infty )$. First, we show that A is strongly monotone and Lipschitz continuous.

Proposition 6.6

The operator A in (6.3) is $d\cos {\psi }$-strongly monotone and d-Lipschitz continuous.

Proof

We first show that A is $d\cos {\psi }$-strongly monotone. Since A is linear, we have

$$\begin{aligned} \langle Av,v\rangle&=d\langle (\cos {\psi }v_1-\sin {\psi }v_2,\sin {\psi }v_1+\cos {\psi }v_2),(v_1,v_2)\rangle \\&=d\cos {\psi }(v_1^2+v_2^2)=d\cos {\psi }\Vert v\Vert ^2. \end{aligned}$$

That is, A is $d\cos {\psi }$-strongly monotone. Since A is a scaled (with d) rotation operator, its largest eigenvalue is d, and hence A is d-Lipschitz. This concludes the proof. $\square $

We need an explicit form of the reflected resolvent of A to show that the rate is tight. To state it, we define the following alternative arctan definition that is valid when $\tan {\xi }=\tfrac{x}{y}$ and $x\ge 0$:

$$\begin{aligned} \arctan _2\left( \tfrac{x}{y}\right) ={\left\{ \begin{array}{ll} \arctan (\tfrac{x}{y}) &{} {\hbox {if }} x\ge 0, y>0 \\ \arctan (\tfrac{x}{y})+\pi &{} {\hbox {if }} x\ge 0, y<0 \\ \tfrac{\pi }{2}&{} x \ge 0, y=0 \end{array}\right. } \end{aligned}$$

(6.4)

This arctan is defined for nonnegative numerators x only, and outputs an angle in the interval $[0,\pi ]$.

Next, we provide the expression for the reflected resolvent. To simplify its notation, we let $\sigma $ denote the strong convexity modulus and $\beta $ the Lipschitz constant of A, i.e.,

$$\begin{aligned} \sigma&= d\cos {\psi },&\beta&=d. \end{aligned}$$

(6.5)

The following result is proven in Appendix 8.

Proposition 6.7

The reflected resolvent of $\gamma A$, with A in (6.3) and $\gamma \in (0,\infty )$, is

$$\begin{aligned} R_{\gamma A} = \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}} \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \end{aligned}$$

where $\sigma $ and $\beta $ are defined in (6.5), and $\xi $ satisfies $\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2}\right) $ with $\arctan _2$ defined in (6.4).

That is, the reflected resolvent is first a rotation then a contraction. The contraction factor is exactly the upper bound on the contraction factor in Theorem 6.5. Therefore, the A in (6.3) can be used to show tightness of the results in Theorem 6.5. To do so, we need another operator B that cancels the rotation introduced by A. For $\alpha \in (0,1]$, we will need $R_{\gamma A}R_{\gamma B}=\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I$ and for $\alpha >1$, we will need $R_{\gamma A}R_{\gamma B}=-\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I$. This is clearly achieved if $R_{\gamma B}$ is another rotation operator. Using the following straightforward consequence of Minty’s theorem (see [26]) we conclude that any rotation operator (since they are nonexpansive) is the reflected resolvent of a maximally monotone operator.

Proposition 6.8

An operator $R:~\mathcal {H}\rightarrow \mathcal {H}$ is nonexpansive if and only if it is the reflected resolvent of a maximally monotone operator.

Proof

It follows immediately from [3, Corollary 23.8] and [3, Proposition 4.2]. $\square $

With this in mind, we can state the tightness claim.

Proposition 6.9

Let $\gamma \in (0,\infty )$, $\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}$, and $\xi $ be defined as in Proposition 6.7. Suppose that A is as in (6.3) and B is maximally monotone and satisfies either of the following:

(i)
if $\alpha \in (0,1]$: $B=B_1$ with $R_{\gamma B_1}=\left[ \begin{matrix} \cos {\xi }&{}-\sin {\xi }\\ \sin {\xi }&{}\cos {\xi }\end{matrix}\right] $,
(ii)
$\alpha \in (1,\tfrac{2}{1+\delta })$: $B=B_2$ with $R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}\sin {(\pi -\xi )}\\ -\sin {(\pi -\xi )}&{}\cos {(\pi -\xi )}\end{matrix}\right] $.

Then the $z^k$ sequence for solving $0\in \gamma Ax+\gamma Bx$ using (4.3) converges exactly with the rate $|1-\alpha |+\alpha \delta $.

Proof

Case (i): Using the reflected resolvent $R_{\gamma A}$ in Proposition 6.7 and that $\alpha \in (0,1]$, we conclude that

$$\begin{aligned} z^{k+1}&= (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B}z^k\\&=(1-\alpha )z^k+\alpha \delta \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \begin{bmatrix} \cos {\xi }&-\sin {\xi }\\ \sin {\xi }&\cos {\xi } \end{bmatrix}z^k\\&=(1-\alpha )z^k+\alpha \delta z^k\\&=|1-\alpha |z^k+\alpha \delta z^k \end{aligned}$$

Case (ii): Using the reflected resolvent $R_{\gamma A}$ in Proposition 6.7 and that $\alpha \ge 1$, we conclude that

$$\begin{aligned} z^{k+1}&= (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B}z^k\\&=(1-\alpha )z^k+\alpha \delta \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \begin{bmatrix} \cos {(\pi -\xi )}&\sin {(\pi -\xi )}\\ -\sin {(\pi -\xi )}&\cos {(\pi -\xi )} \end{bmatrix}z^k\\&=(1-\alpha )z^k-\alpha \delta z^k\\&=-(|1-\alpha |z^k+\alpha \delta )z^k. \end{aligned}$$

In both cases, the convergence rate is exactly $|1-\alpha |+\alpha \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}$. This completes the proof. $\square $

Remark 6.10

It can be shown that the maximally monotone operator $B_1$ that gives $R_{\gamma B_1}$ satisfies $B_1=\tfrac{1}{\gamma (1+\cos {\xi })} \left[ \begin{matrix} 0&{}-\sin {\xi }\\ \sin {\xi }&{}0 \end{matrix}\right] $ if $\xi \in [0,\pi )$ and $B_1=\partial \iota _0$ (that is, $B_1$ is the subdifferential operator of the indicator function $\iota _0$ of the origin) if $\xi =\pi $. Similarly, the maximally monotone operator $B_2$ that gives $R_{\gamma B_2}$ satisfies $B_2=\tfrac{1}{\gamma (1-\cos {\xi })} \left[ \begin{matrix} 0&{}-\sin {\xi }\\ \sin {\xi }&{}0 \end{matrix}\right] $ if $\xi \in (0,\pi ]$ and $B_2=0$ if $\xi =0$.

We have shown that the rate provided in Theorem 6.5 is tight for all feasible $\alpha $ and $\gamma $.

6.2 Comparison to other bounds

In Fig. 1, we have compared the linear convergence rate result in Theorem 6.5 to the convergence rate result in [24]. The comparison is made with optimal $\gamma $-parameters for both bounds. The result in [24] is provided in the standard Douglas–Rachford setting, i.e., with $\alpha =1/2$. By instead letting $\alpha =1$, this rate can be improved, see [8] (which shows an improved rate in the composite convex optimization case, but the same rate can be shown to hold also for monotone inclusion problems). Also this improved rate is added to the comparison in Fig. 1. We see that both rates that follow from [24] are suboptimal and worse than the rate bound in Theorem 6.5.

7 A strongly monotone and cocoercive

In this section, we consider the case where A is strongly monotone and cocoercive; that is, we assume the following.

Assumption 7.1

Suppose that:

(i)
The operators $A:~\mathcal {H}\rightarrow \mathcal {H}$ and $B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}$ are maximally monotone.
(ii)
A is $\sigma $-strongly monotone and $\tfrac{1}{\beta }$-cocoercive.

The linear convergence result for Douglas–Rachford splitting will follow from the contraction factor of the reflected resolvent of A. The contraction factor is provided in the following theorem, which is proven in Appendix 8.

Theorem 7.2

Suppose that $A:~\mathcal {H}\rightarrow \mathcal {H}$ is a $\sigma $-strongly monotone and $\tfrac{1}{\beta }$-cocoercive operator. Then its reflected resolvent $R_{A} = 2J_A-\mathrm{{Id}}$ is contractive with factor $\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\sigma \beta }}$.

When considering the reflected resolvent of $\gamma A$ where $\gamma \in (0,\infty )$, the $\gamma $-parameter can be chosen to optimize the contraction factor of $R_{\gamma A}$. The operator $\gamma A$ is $\gamma \sigma $-strongly monotone and $\tfrac{1}{\gamma \beta }$-cocoercive, so the optimal $\gamma >0$ minimizes $h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }$. The gradient of h satisfies $\nabla h(\gamma ) = \tfrac{4\sigma (\beta \sigma \gamma ^2-1)}{(\beta \sigma \gamma ^2+2\sigma \gamma +1)^2}$, so the extreme points of h are given by $\gamma =\pm \tfrac{1}{\sqrt{\beta \sigma }}$. Since $\gamma >0$ and the gradient is negative for $\gamma \in (0,\tfrac{1}{\sqrt{\beta \sigma }})$ and positive for $\gamma >\tfrac{1}{\sqrt{\beta \sigma }}$, the parameter $\gamma =\tfrac{1}{\sqrt{\beta \sigma }}$ minimizes the contraction factor. The corresponding contraction factor is

$$\begin{aligned} \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }} =\sqrt{1-\tfrac{2\sqrt{\sigma /\beta }}{1+\sqrt{\sigma }{\beta }}}=\sqrt{\tfrac{1-\sqrt{\sigma /\beta }}{1+\sqrt{\sigma /\beta }}}=\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}. \end{aligned}$$

This is summarized in the following proposition.

Proposition 7.3

The parameter $\gamma \in (0,\infty )$ that optimizes the contraction factor for $R_{\gamma A}$ is $\gamma \!=\!\tfrac{1}{\sqrt{\beta \sigma }}$. The corresponding contraction factor is $\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}$.

Now we are ready to state the linear convergence rate result for the Douglas–Rachford algorithm.

Theorem 7.4

Suppose that Assumption 7.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve $0\in \gamma Ax+\gamma Bx$. Let$\delta = \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}$, then the algorithm converges at least with rate factor

$$\begin{aligned} |1-\alpha |+\alpha \delta \end{aligned}$$

(7.1)

for all $\alpha \in (0,\tfrac{2}{1+\delta })$. Optimizing this bound w.r.t. $\alpha $ and $\gamma $ gives $\alpha =1$ and $\gamma =\tfrac{1}{\sqrt{\beta \sigma }}$ and corresponding optimal rate bound $\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}$.

Proof

It follows immediately from Theorem 7.2, Lemma 3.3, and Proposition 7.3 by noting that $\alpha =1$ minimizes (7.1). $\square $

7.1 Tightness

In this section, we provide a two-dimensional example that shows that the provided bounds are tight. We let A be the resolvent of a scaled rotation operator to achieve this. Let C be that scaled rotation operator, i.e.,

$$\begin{aligned} C=c\begin{bmatrix} \cos {\psi }&\quad -\sin {\psi }\\ \sin {\psi }&\quad \cos {\psi } \end{bmatrix} \end{aligned}$$

(7.2)

with $c\in (1,\infty )$ and $\psi \in [0,\tfrac{\pi }{2})$. We will let A satisfy $A=dJ_{C}$ for some $d\in (0,\infty )$; that is

$$\begin{aligned} A=d(C+I)^{-1}=\tfrac{d}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }} \begin{bmatrix} c\cos {\psi }+1&c\sin {\psi }\\ -c\sin {\psi }&c\cos {\psi }+1 \end{bmatrix}. \end{aligned}$$

(7.3)

In the following proposition, we state the strong monotonicity and cocoercivity properties of A.

Proposition 7.5

The operator A in (7.3) is $\tfrac{1+c\cos {\psi }}{d}$-cocoercive and strongly monotone with modulus $\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2}$.

Proof

The matrix C in (7.2) is $c\cos {\psi }$-strongly monotone (see Proposition 6.6), so $J_C$ is $(1+c\cos {\psi })$-cocoercive (see [3, Definition 4.4]) and the operator $A=d(I+C)^{-1}$ is $\tfrac{1+c\cos {\psi }}{d}$-cocoercive. Further, since C is monotone and c-Lipschitz continuous (see Proposition 6.6), the following holds (see Proposition 6.2):

$$\begin{aligned} 2\langle J_{C}x-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2+(1-c^2)\Vert J_Cx-J_Cy\Vert ^2. \end{aligned}$$

(7.4)

Since $J_C$ is $(1+c\cos {\psi })$-cocoercive, we have

$$\begin{aligned} \langle J_Cx-J_Cy,x-y\rangle \ge (1+c\cos {\psi })\Vert J_Cx-J_Cy\Vert ^2. \end{aligned}$$

(7.5)

We add (7.5) multiplied by $-\tfrac{1-c^2}{1+c\cos {\psi }}$ (which is positive since $c\in (1,\infty )$) to (7.4) to get

$$\begin{aligned} (2-\tfrac{1-c^2}{1+c\cos {\psi }})\langle J_{C}x-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2; \end{aligned}$$

that is, $J_C$ is $\sigma $-strongly monotone with

$$\begin{aligned} \sigma \frac{1}{2-\tfrac{1-c^2}{1+c\cos {\psi }}}=\frac{1+c\cos {\psi }}{2+2c\cos {\psi }-1+c^2}=\frac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}, \end{aligned}$$

so A is strongly monotone with parameter $d\tfrac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}$. This concludes the proof. $\square $

This shows that the assumptions needed for the linear convergence rate result in Theorem 7.4 hold. To prove the tightness claim, we need an expression for the reflected resolvent of A. This is easier expressed in the strong convexity modulus, which we define as $\sigma $ and the inverse cocoercivity constant, which we define as $\beta $, i.e.,

$$\begin{aligned} \sigma&=\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2},&\beta&=\tfrac{d}{1+c\cos {\psi }}. \end{aligned}$$

(7.6)

The following results is proven in Appendix 8.

Proposition 7.6

The reflected resolvent $R_{\gamma A}$ of $\gamma A$, where A is defined in (7.3) and $\gamma \in (0,\infty )$, is given by

$$\begin{aligned} R_{\gamma A}=\sqrt{1-\frac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }} \begin{bmatrix} \cos {\xi }&-\sin {\xi }\\ \sin {\xi }&\cos {\xi } \end{bmatrix} \end{aligned}$$

where $\sigma $ and $\beta $ are defined in (7.6), and $\xi $ satisfies $\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}\right) $ with $\arctan _2$ defined in (6.4).

Based on this reflected resolvent, we can show that the rate bound in Theorem 7.4 is indeed tight. The proof of the following result is the same as the proof to Proposition 6.9.

Proposition 7.7

Let $\gamma \in (0,\infty )$, $\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}$, and let $\xi $ be as defined in Proposition 7.6. Suppose that A is as in (7.3) and B is maximally monotone and satisfies either of the following:

(i)
if $\alpha \in (0,1]$: $B=B_1$ with $R_{\gamma B_1}= \left[ \begin{matrix} \cos {\xi }&{}\sin {\xi }\\ -\sin {\xi }&{}\cos {\xi } \end{matrix}\right] $,
(ii)
$\alpha \in (1,\tfrac{2}{1+\delta })$: $B=B_2$ with $R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}-\sin {(\pi -\xi )}\\ \sin {(\pi -\xi )}&{}\cos {(\pi -\xi )} \end{matrix}\right] $.

Then the $z^k$ sequence for solving $0\in \gamma Ax+\gamma Bx$ using (4.3) converges exactly with the rate $|1-\alpha |+\alpha \delta $.

So, we have shown that the rate in Theorem 7.4 is tight for all feasible algorithm parameters $\alpha $ and $\gamma $.

7.2 Comparison to other bounds

We have shown tight convergence rate estimates for Douglas–Rachford splitting when the monotone operator A is cocoercive and strongly monotone (Theorem 7.4). In Sect. 6, we showed tight estimates when A is Lipschitz and strongly monotone (Theorem 6.5). In [17], tight convergence rate estimates are proven for the case when A and B are subdifferential operators of proper closed and convex functions and A is strongly monotone and Lipschitz continuous (which in this case is equivalent to cocoercive). The class of problems considered in [17] is a subclass of the problems considered in this section, which in turn is a subclass of the problems considered in Sect. 6. The optimal rates for these classes of problems are shown in Fig. 2. By restricting the problem classes, the rate bounds get tighter. This is in contrast to the case in Sect. 5, where a convex optimization problem achieved the worst case estimate.

8 Conclusions

We have shown linear convergence rate bounds for Douglas–Rachford splitting for monotone inclusion problems with three different sets of assumptions. One setting was the one used by Lions and Mercier [24], for which we provided a tighter bound. We also stated linear convergence rate bounds under two other assumptions, for which no other linear rate bounds were previously available. In addition, we have shown that all our rate bounds are tight for, in two cases all feasible algorithm parameters, and in the remaining case many algorithm parameters.

References

Bauschke, H.H., Bello Cruz, J.Y., Nghia, T.T.A., Phan, H.M., Wang, X.: The rate of linear convergence of the Douglas–Rachford algorithm for subspaces is the cosine of the Friedrichs angle. J. Approx. Theory 185, 63–79 (2014)
Article MATH MathSciNet Google Scholar
Bauschke, H.H., BelloCruz, J.Y., Nghia, T.T.A., Phan, H.M., Wang, X.: Optimal rates of linear convergence of relaxed alternating projections and generalized Douglas-Rachford methods for two subspaces. Numer. Algorithms 73(1), 33–76 (2016)
Article MATH MathSciNet Google Scholar
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Book MATH Google Scholar
Boley, D.: Local linear convergence of the alternating direction method of multipliers on quadratic or linear programs. SIAM J. Optim. 23(4), 2183–2207 (2013)
Article MATH MathSciNet Google Scholar
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Article MATH Google Scholar
Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Bauschke, H.H., Burachik, R.S., Combettes, P.L., Elser, V., Luke, D.R., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering, volume 49 of Springer Optimization and Its Applications, pp. 185–212. Springer, New York (2011)
Google Scholar
Combettes, P.L., Yamada, I.: Compositions and convex combinations of averaged nonexpansive operators. J. Math. Anal. Appl. 425(1), 55–70 (2015)
Article MATH MathSciNet Google Scholar
Davis, D., Yin, W.: Faster convergence rates of relaxed Peaceman–Rachford and ADMM under regularity assumptions. http://arxiv.org/abs/1407.5210 (July) (2014)
Davis, D., Yin, W.: Convergence Rate Analysis of Several Splitting Schemes, pp. 115–163. Springer, Berlin (2016)
MATH Google Scholar
Demanet, L., Zhang, X.: Eventual linear convergence of the Douglas–Rachford iteration for basis pursuit. Math. Comput. 85, 209–238 (2016)
Article MATH MathSciNet Google Scholar
Deng, W., Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. J. Sci. Comput. 66(3), 889–916 (2016)
Article MATH MathSciNet Google Scholar
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Article MATH MathSciNet Google Scholar
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. PhD thesis, MIT (1989)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Article MATH Google Scholar
Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control 60(3), 644–658 (2015)
Article MATH MathSciNet Google Scholar
Giselsson, P.: Tight linear convergence rate bounds for Douglas–Rachford splitting and ADMM. In: Proceedings of 54th Conference on Decision and Control, Osaka, Japan (2015)
Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control 62(2), 532–544 (2017)
Article MATH MathSciNet Google Scholar
He, B., Yuan, X.: On the $o(1/n)$ convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50(2), 700–709 (2012)
Article MATH MathSciNet Google Scholar
Hesse, R., Luke, D.R.: Nonconvex notions of regularity and convergence of fundamental algorithms for feasibility problems. SIAM J. Optim. 23(4), 2397–2419 (2013)
Article MATH MathSciNet Google Scholar
Hesse, R., Luke, D.R., Neumann, P.: Alternating projections and Douglas–Rachford for sparse affine feasibility. IEEE Trans. Signal Process. 62(18), 4868–4881 (2014)
Article MathSciNet Google Scholar
Hong, M., Luo, Z.-Q.: On the linear convergence of the alternating direction method of multipliers. Math. Programm. 162(1), 165–199 (2016)
MATH MathSciNet Google Scholar
Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Explicit convergence rate of a distributed alternating direction method of multipliers. IEEE Trans. Autom. Control 61(4), 892–904 (2016)
Article MATH MathSciNet Google Scholar
Krasnoselskii, M.A.: Two remarks on the method of successive approximations. Uspehi Mat. Nauk 10, 123–127 (1955)
MathSciNet Google Scholar
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Article MATH MathSciNet Google Scholar
Mann, W.R.: Mean value methods in iteration. Proc. Am. Math. Soc. 4, 506–510 (1953)
Article MATH MathSciNet Google Scholar
Minty, G.J.: Monotone (nonlinear) operators in Hilbert space. Duke Math. J. 29(3), 341–346 (1962). 09
Article MATH MathSciNet Google Scholar
Nishihara, R., Lessard, L., Recht, B., Packard, A., Jordan, M.: A general analysis of the convergence of ADMM. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 343–352 (2015)
Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: Complexity estimates and accelerated variants. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 4234–4239, Los Angeles, CA, December (2014)
Peaceman, D.W., Rachford, H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3(1), 28–41 (1955)
Article MATH MathSciNet Google Scholar
Phan, H.M.: Linear convergence of the Douglas–Rachford method for two closed sets. Optimization 65(2), 369–385 (2016)
Article MATH MathSciNet Google Scholar
Raghunathan, A., Di Cairano, S.: ADMM for convex quadratic programs: Linear convergence and infeasibility detection. http://arxiv.org/abs/1411.7288
Setzer, S.: Split bregman algorithm, Douglas–Rachford splitting and frame shrinkage. In: Tai, X.-C., Mrken, K., Lysaker, M., Lie, K.-A. (eds.) Scale Space and Variational Methods in Computer Vision. Lecture Notes in Computer Science, vol. 5567, pp. 464–476. Springer, Berlin (2009)
Chapter Google Scholar
Shi, W., Ling, Q., Yuan, K., Wu, G., Yin, W.: On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 62(7), 1750–1761 (2014)
Article MathSciNet Google Scholar

Download references

Acknowledgements

The author would like to thank Heinz Bauschke for suggesting the term negatively averaged operators.

Author information

Authors and Affiliations

Department of Automatic Control, Lund University, Box 118, , SE-22100, Lund, Sweden
Pontus Giselsson

Authors

Pontus Giselsson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pontus Giselsson.

Additional information

This project is financially supported by the Swedish Foundation for Strategic Research.

Appendices

Appendix A: Proofs to Lemmas in Section 3.1

1.1 Proof to Lemma 3.1

From the definition of cocoercivity, Definition 2.6, it follows directly that $\beta \mathrm{{Id}}+T$ is $\tfrac{1}{2\beta }$-cocoercive if and only if $\tfrac{1}{2\beta }(\beta \mathrm{{Id}}+T)$ is 1-cocoercive. This, in turn is equivalent to that $2\tfrac{1}{2\beta }(\beta \mathrm{{Id}}+T)-\mathrm{{Id}}=\tfrac{1}{\beta }T$ is nonexpansive [3, Proposition 4.2 and Definition 4.4]. Finally, from the definition of Lipschitz continuity, Definition 2.2, it follows directly that $\tfrac{1}{\beta } T$ is nonexpansive if and only if T is $\beta $-Lipschitz continuous. This concludes the proof.

1.2 Proof to Lemma 3.2

Let $T_1 = R-\tfrac{\beta }{2}\mathrm{{Id}}$. Then Lemma 3.1 states that $\tfrac{1}{\beta }$-cocoercivity of $T_1+\tfrac{\beta }{2}\mathrm{{Id}}=R$ is equivalent to $\tfrac{\beta }{2}$-Lipschitz continuity of $T_1=R-\tfrac{\beta }{2}\mathrm{{Id}}$. By definition of Lipschitz continuity, this is equivalent to that $T_1 = \tfrac{\beta }{2}T_2$ for some nonexpansive operator $T_2$. Therefore, $T=R+(1-\beta )\mathrm{{Id}}= T_1+(1-\tfrac{\beta }{2})\mathrm{{Id}}=\tfrac{\beta }{2}T_2+(1-\tfrac{\beta }{2})\mathrm{{Id}}$. Since $\beta \in (0,1)$, this is equivalent to that T is $\tfrac{\beta }{2}$-averaged. This concludes the proof.

1.3 Proof to Lemma 3.3

Let x and y be any points in $\mathcal {H}$. Then

$$\begin{aligned} \Vert Rx-Ry\Vert&=\Vert (1-\alpha )x+\alpha Tx-(1-\alpha )y-\alpha Ty\Vert \\&\le |1-\alpha |\Vert x-y\Vert +|\alpha |\Vert Tx-Ty\Vert \\&\le |1-\alpha |\Vert x-y\Vert +|\alpha |\delta \Vert x-y\Vert \\&= (|1-\alpha |+|\alpha |\delta )\Vert x-y\Vert . \end{aligned}$$

So R is $(|1-\alpha |\Vert x-y\Vert +|\alpha |\delta )$-Lipschitz continuous. The Lipschitz constant is less than 1 if $\alpha \in (0,\tfrac{2}{1+\delta })$. For such $\alpha $, R is contractive. Since $\alpha >0$, the contraction factor is $(|1-\alpha |+\alpha \delta )$. This concludes the proof.

Appendix B: Proofs to results in Section 5

1.1 Proof to Proposition 5.2

Since B is $\tfrac{1}{\beta }$-cocoercive, it satisfies

$$\begin{aligned} \langle Bu-Bv,u-v\rangle \ge \tfrac{1}{\beta }\Vert Bu-Bv\Vert ^2. \end{aligned}$$

Adding $\Vert u-v\Vert ^2$ to both sides gives

$$\begin{aligned} \langle (B+\mathrm{{Id}})u-(B+\mathrm{{Id}})v,u-v\rangle&\ge \Vert u-v\Vert ^2\\&\quad +\tfrac{1}{\beta }\Vert (B+\mathrm{{Id}})u-(B+\mathrm{{Id}})v-(u-v)\Vert ^2. \end{aligned}$$

Letting $x= (B+\mathrm{{Id}})u$ and $y= (B+\mathrm{{Id}})v$ implies that $u= J_{B}x$ and $v= J_By$. Therefore, we get the equivalent expression

$$\begin{aligned} \langle x-y,J_Bx-J_By\rangle \ge \Vert J_Bx-J_By\Vert ^2+\tfrac{1}{\beta }\Vert x-y-(J_Bx-J_By)\Vert ^2. \end{aligned}$$

Expansion of the second square gives

$$\begin{aligned}&\beta \langle x-y,J_Bx-J_By\rangle \\&\quad \ge \beta \Vert J_Bx-J_By\Vert ^2+\Vert x-y\Vert ^2-2\langle x-y,J_Bx-J_By\rangle +\Vert J_Bx-J_By\Vert ^2, \end{aligned}$$

or equivalently

$$\begin{aligned}&\quad (\beta +2)\langle x-y,J_Bx-J_By\rangle \ge \Vert x-y\Vert ^2+(\beta +1)\Vert J_Bx-J_By\Vert ^2\\ \Leftrightarrow&\quad \tfrac{\beta +2}{\beta +1}\langle x-y,J_Bx-J_By\rangle \ge \tfrac{1}{\beta +1}\Vert x-y\Vert ^2+\Vert J_Bx-J_By\Vert ^2\\ \Leftrightarrow&\quad (2(1-\tfrac{\beta }{2(1+\beta )}))\langle x-y,J_Bx-J_By\rangle \\&\quad \qquad \qquad \qquad \qquad \qquad \ge (1-2\tfrac{\beta }{2(\beta +1)})\Vert x-y\Vert ^2+\Vert J_Bx-J_By\Vert ^2. \end{aligned}$$

This is, by [3, Proposition 4.25], equivalent to that $J_B$ is $\tfrac{\beta }{2(\beta +1)}$-averaged. This concludes the proof.

1.2 Proof to Theorem 5.6

Since $R_{\gamma A}R_{\gamma B}$ is $\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}$-negatively averaged, see Proposition 5.5, theDouglas–Rachford iteration is defined by an $\alpha $-averaged $\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}$-negatively averaged operator. The rate in (5.1) follows directly from Proposition 3.9. The optimal parameters follow from Proposition 3.10. It shows that the rate factor is increasing in $\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}$, which in turn is increasing in $\tfrac{1}{\gamma \sigma }+\gamma \beta $. Therefore, this should be minimized to optimize the rate. The optimal $\gamma =\tfrac{1}{\sqrt{\beta \sigma }}$ gives negative averagedness factor $\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{\tfrac{1}{\gamma \sigma }+\gamma \beta +1}=\tfrac{2\sqrt{\beta /\sigma }}{1+2\sqrt{\beta /\sigma }}$. Proposition 3.10 further gives that the optimal averagedness factor is

$$\begin{aligned} \alpha = \tfrac{1}{2-\tfrac{2\sqrt{\beta /\sigma }}{1+2\sqrt{\beta /\sigma }}}=\tfrac{1+2\sqrt{\beta /\sigma }}{2+2\sqrt{\beta /\sigma }}=\tfrac{1/2+\sqrt{\beta /\sigma }}{1+\sqrt{\beta /\sigma }} \end{aligned}$$

and that the optimal bound on the contraction factor is

$$\begin{aligned} \tfrac{\tfrac{2\sqrt{\beta /\sigma }}{2\sqrt{\beta /\sigma }+1}}{2-\tfrac{2\sqrt{\beta /\sigma }}{2\sqrt{\beta /\sigma }+1}}=\tfrac{2\sqrt{\beta /\sigma }}{2+2\sqrt{\beta /\sigma }}=\tfrac{\sqrt{\beta /\sigma }}{1+\sqrt{\beta /\sigma }}. \end{aligned}$$

This concludes the proof.

1.3 Proof to Proposition 5.7

The proximal and reflected proximal operators of f are trivially given by

$$\begin{aligned} {\mathrm{prox}}_{\gamma f}(y)=(\tfrac{1}{1+\gamma \beta }y_1,y_2), \quad R_{\gamma f}(y)=(\tfrac{1-\gamma \beta }{1+\gamma \beta }y_1,y_2). \end{aligned}$$

(B.1)

Linearity of the proximal operator and Moreau’s decomposition [3, Theorem 14.3] imply that the reflected resolvent of $f^*$ is given by

$$\begin{aligned} R_{\gamma f^*}&= 2{\mathrm{prox}}_{\gamma f^*}-\mathrm{{Id}}=2(\mathrm{{Id}}-\gamma {\mathrm{prox}}_{\gamma ^{-1}f}\circ (\gamma ^{-1}\mathrm{{Id}}))-\mathrm{{Id}}\\&=-(2{\mathrm{prox}}_{\gamma ^{-1}f}-\mathrm{{Id}}) \\&=-R_{\gamma ^{-1}f}. \end{aligned}$$

This gives the following Douglas–Rachford iteration:

$$\begin{aligned} z^{k+1}&=((1-\alpha )\mathrm{{Id}}+\alpha R_{\gamma f}R_{\gamma f^*})z^k\\&=(1-\alpha )z^k-\alpha R_{\gamma f}R_{\gamma ^{-1}f}z^k\\&=(1-\alpha )z^k-\alpha \left( \tfrac{(1-\gamma \beta )(1-\gamma ^{-1}\beta )}{(1+\gamma \beta )(1+\gamma \beta )}z_1^k,z_2^k\right) . \end{aligned}$$

Since we start at a point $z^0=(0,z_2^0)$, we will get $z_1^k=0$ for all $k\ge 1$, and the Douglas–Rachford iteration becomes

$$\begin{aligned} z^{k+1} = \left( 1-2\alpha \right) z^k \end{aligned}$$

with contraction factor given by $|1-2\alpha |$.

When $\alpha \in [c,1)$, the absolute value term in (5.1) is nonpositive since

$$\begin{aligned} (1-2\alpha +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta })&\le 0\Leftrightarrow & {} \alpha&\ge \tfrac{1}{2-\tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }}=\tfrac{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }{2+\tfrac{1}{\gamma \sigma }+\gamma \beta }=c. \end{aligned}$$

Therefore, for such $\alpha $, the rate in (5.1) is $|1-2\alpha |$. This coincides with the rate for the provided example for any $\gamma >0$, and the proof is completed.

Appendix C: Proofs to results in Section 6

1.1 Proof to Proposition 6.2

$\beta $-Lipschitz continuity of A implies that $\beta \mathrm{{Id}}+A$ is $\tfrac{1}{2\beta }$-cocoercive, see Lemma 3.1. That is

$$\begin{aligned} \langle (\beta \mathrm{{Id}}&+A)u-(\beta \mathrm{{Id}}+A)v,u-v\rangle \ge \tfrac{1}{2\beta }\Vert (\beta \mathrm{{Id}}+A)u-(\beta \mathrm{{Id}}+A)v\Vert ^2. \end{aligned}$$

Using $\beta \mathrm{{Id}}= \mathrm{{Id}}+(\beta -1)\mathrm{{Id}}$, this is equivalent to that

$$\begin{aligned} \langle (\mathrm{{Id}}+A)u-(\mathrm{{Id}}+A)v,u-v\rangle \ge&\tfrac{1}{2\beta }\Vert (\mathrm{{Id}}+A)u-(\mathrm{{Id}}+A)v+(\beta -1)(u-v)\Vert ^2\\&\quad +\;(1-\beta )\Vert u-v\Vert ^2. \end{aligned}$$

Using that $x=(\mathrm{{Id}}+A)u$ if and only if $u= (\mathrm{{Id}}+A)^{-1}x$ and $y=(\mathrm{{Id}}+A)v$ if and only if $v= (\mathrm{{Id}}+A)^{-1}y$ (that hold by definition of the inverse and single-valuedness), this is equivalent to

$$\begin{aligned}&\langle x-y,(\mathrm{{Id}}+A)^{-1}x-(\mathrm{{Id}}+A)^{-1}y\rangle \\&\quad \ge \tfrac{1}{2\beta }\Vert x-y+(\beta -1)((\mathrm{{Id}}+A)^{-1}x-(\mathrm{{Id}}+A)^{-1}y)\Vert ^2\\&\qquad +\;(1-\beta )\Vert (\mathrm{{Id}}+A)^{-1}x-(\mathrm{{Id}}+A)^{-1}y\Vert ^2. \end{aligned}$$

Identifying the resolvent $J_A=(\mathrm{{Id}}+A)^{-1}$ and expanding the first square give:

$$\begin{aligned} \langle&J_Ax-J_Ay,x-y\rangle \\&\ge \tfrac{1}{2\beta }\Vert x-y+(\beta -1)(J_Ax-J_Ay)\Vert ^2+(1-\beta )\Vert J_Ax-J_Ay\Vert ^2\\&=\tfrac{1}{2\beta }\left( \Vert x-y\Vert ^2+2(\beta -1)\langle J_Ax-J_Ay,x-y\rangle +(\beta -1)^2\Vert J_Ax-J_Ay\Vert ^2\right) \\&\quad +\;(1-\beta )\Vert J_Ax-J_Ay\Vert ^2 \end{aligned}$$

By rearranging the terms, we conclude that

$$\begin{aligned} (1+\tfrac{1-\beta }{\beta })\langle J_Ax-J_Ay,x-y\rangle&\ge \tfrac{1}{2\beta }\left( \Vert x-y\Vert ^2+(\beta -1)^2\Vert J_Ax-J_Ay\Vert ^2\right) \\&\quad +\;(1-\beta )\Vert J_Ax-J_Ay\Vert ^2\\&=\tfrac{1}{2\beta }\Vert x-y\Vert ^2+\tfrac{(\beta -1)^2+2\beta (1-\beta )}{2\beta }\Vert J_Ax-J_Ay\Vert ^2\\&=\tfrac{1}{2\beta }\Vert x-y\Vert ^2+\tfrac{1-\beta ^2}{2\beta }\Vert J_Ax-J_Ay\Vert ^2. \end{aligned}$$

The result follows by multiplying by $2\beta $, since $1+\tfrac{1-\beta }{\beta }=\tfrac{1}{\beta }$. This concludes the proof.

1.2 Proof to Theorem 6.3

We divide the proof into two cases, $\beta \ge 1$ and $\beta \le 1$.

1.3 Case $\beta \ge 1$

From [3, Proposition 23.11], we get that $J_A$ is $(1+\sigma )$-cocoercive, i.e., that

$$\begin{aligned} \langle J_Ax-J_Ay,x-y\rangle \ge (1+\sigma )\Vert J_Ax-J_Ay\Vert ^2. \end{aligned}$$

(C.1)

Adding $(\beta ^2-1)(\ge 0)$ of (C.1) to $(1+\sigma )$ of (6.1), we get

$$\begin{aligned} (2(1+\sigma )+(\beta ^2-1))\langle J_Ax-&J_Ay,x-y\rangle \ge (1+\sigma )\Vert x-y\Vert ^2 \end{aligned}$$

or equivalently

$$\begin{aligned} \langle J_Ax-&J_Ay,x-y\rangle \ge \tfrac{1+\sigma }{1+2\sigma +\beta ^2}\Vert x-y\Vert ^2 \end{aligned}$$

(C.2)

since the $\Vert J_Ax-J_Ay\Vert $ terms cancel. We get

$$\begin{aligned} \Vert R_Ax-R_Ay\Vert ^2&= \Vert 2J_Ax-2J_Ay-(x-y)\Vert ^2\nonumber \\&=4\Vert J_Ax-J_Ay\Vert ^2-4\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\nonumber \\&\le 4(\tfrac{1}{1+\sigma }-1)\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\nonumber \\&= -\tfrac{4\sigma }{1+\sigma }\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\nonumber \\&\le -\tfrac{4\sigma }{1+\sigma }\tfrac{1+\sigma }{1+2\sigma +\beta ^2}\Vert x-y\Vert ^2+\Vert x-y\Vert ^2\nonumber \\&=(1-\tfrac{4\sigma }{1+2\sigma +\beta ^2})\Vert x-y\Vert ^2 \end{aligned}$$

(C.3)

where (C.1) and (C.2) are used in the inequalities. Thus, the said result holds for $\beta \ge 1$.

1.4 Case $\beta \le 1$

To prove the result for $\beta \le 1$, we define the set $\mathcal {R}$ of pairs of points $(x,y)\in \mathcal {H}\times \mathcal {H}$ as follows:

$$\begin{aligned} \mathcal {R} = \left\{ (x,y)~|~\langle J_Ax-J_Ay,x-y\rangle \ge \tfrac{1+\sigma }{1+2\sigma +\beta ^2}\Vert x-y\Vert ^2\right\} . \end{aligned}$$

(C.4)

We also define the closure of the remaining pairs of points $\mathcal {R}_c=\overline{(\mathcal {H}\times \mathcal {H})\backslash \mathcal {R}}$, i.e.,

$$\begin{aligned} \mathcal {R}_c = \left\{ (x,y)~|~\langle J_Ax-J_Ay,x-y\rangle \le \tfrac{1+\sigma }{1+2\sigma +\beta ^2}\Vert x-y\Vert ^2\right\} . \end{aligned}$$

(C.5)

Obviously, $\mathcal {H}\times \mathcal {H}\subseteq \mathcal {R}+\mathcal {R}_c$ which implies that the contraction factor of the resolvent is the worst case contraction factor for $\mathcal {R}$ and $\mathcal {R}_c$. We first show the contraction factor for $\mathcal {R}$. Since (C.2) is the definition of the set $\mathcal {R}$ in (C.4), the contraction factor for $(x,y)\in \mathcal {R}$ is shown exactly as in (C.3). For $(x,y)\in \mathcal {R}_c$, we have

$$\begin{aligned} \Vert R_Ax-R_Ay\Vert ^2&= \Vert 2J_Ax-2J_Ay-(x-y)\Vert ^2\\&=4\Vert J_Ax-J_Ay\Vert ^2-4\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\\&\le \tfrac{4}{1-\beta ^2}\left( 2\langle J_Ax-J_Ay,x-y\rangle -\Vert x-y\Vert ^2\right) \\&\quad -\;4\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\\&=4\left( \tfrac{2}{1-\beta ^2}-1\right) \langle J_Ax-J_Ay,x-y\rangle +\left( 1-\tfrac{4}{1-\beta ^2}\right) \Vert x-y\Vert ^2\\&\le \left( 1-\tfrac{4}{1-\beta ^2}+4\tfrac{1+\beta ^2}{1-\beta ^2}\tfrac{1+\sigma }{1+2\sigma +\beta ^2}\right) \Vert x-y\Vert ^2\\&=\left( 1-\tfrac{4(1+2\sigma +\beta ^2)-4(1+\beta ^2)(1+\sigma )}{(1-\beta ^2)(1+2\sigma +\beta ^2)}\right) \Vert x-y\Vert ^2\\&=\left( 1-\tfrac{4+8\sigma +4\beta ^2-(4+4\beta ^2+4\sigma +4\sigma \beta ^2)}{(1-\beta ^2)(1+2\sigma +\beta ^2)}\right) \Vert x-y\Vert ^2\\&=\left( 1-\tfrac{4\sigma (1-\beta ^2)}{(1-\beta ^2)(1+2\sigma +\beta ^2)}\right) \Vert x-y\Vert ^2\\&=\left( 1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}\right) \Vert x-y\Vert ^2 \end{aligned}$$

where (6.1) is used in the first inequality and the definition of $\mathcal {R}_c$ in (C.5) in the second; that is, the worst case contraction factor is $\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}$ also for $\beta \le 1$.

It remains to show that the contraction factor is in the interval [0, 1). We show that the square of the contraction factor is in [0, 1). We have $1-\tfrac{4\sigma }{1+2\sigma +\beta ^2} = \tfrac{1-2\sigma +\beta ^2}{1+2\sigma +\beta ^2}<1$. Further, since $\sigma \le \beta $, we have $1-2\sigma +\beta ^2\ge 1-2\sigma +\sigma ^2=(1-\sigma )^2\ge 0$. So the numerator is nonnegative and the denominator is positive, which gives a nonnegative contraction factor. This concludes the proof.

1.5 Proof to Proposition 6.7

First, we compute the resolvent $J_{\gamma A}$. It satisfies

$$\begin{aligned} J_{\gamma A}&= (I+\gamma A)^{-1}= \begin{bmatrix} 1+\gamma d\cos {\psi }&-\gamma d\sin {\psi }\\ \gamma d\sin {\psi }&1+\gamma d\cos {\psi } \end{bmatrix}^{-1}\\&=\frac{1}{1+2\gamma d\cos {\psi }+(\gamma d)^2} \begin{bmatrix} 1+\gamma d\cos {\psi }&\gamma d\sin {\psi }\\ -\gamma d\sin {\psi }&1+\gamma d\cos {\psi } \end{bmatrix}\\&=\frac{1}{1+2\gamma \sigma +(\gamma \beta )^2} \begin{bmatrix} 1+\gamma \sigma&\gamma \beta \sin {\psi }\\ -\gamma \beta \sin {\psi }&1+\gamma \sigma \end{bmatrix} \end{aligned}$$

The reflected resolvent is

$$\begin{aligned} R_{\gamma A}&=2J_{\gamma A}-I\\&=\frac{2}{1+2\gamma \sigma +(\gamma \beta )^2} \begin{bmatrix} 1+\gamma \sigma -\tfrac{1+2\gamma \sigma +(\gamma \beta )^2}{2}&\gamma \beta \sin {\psi }\\ -\gamma \beta \sin {\psi }&1+\gamma \sigma -\tfrac{1+2\gamma \sigma +(\gamma \beta )^2}{2} \end{bmatrix}\\&=\frac{2}{1+2\gamma \sigma +(\gamma \beta )^2} \begin{bmatrix} \tfrac{1}{2}(1-(\gamma \beta )^2)&\gamma \beta \sin {\psi }\\ -\gamma \beta \sin {\psi }&\tfrac{1}{2}(1-(\gamma \beta )^2) \end{bmatrix} \end{aligned}$$

where we have used

$$\begin{aligned} (1+\gamma \sigma )^2+(\gamma \beta )^2\sin ^2{\psi }=(1+\gamma \beta \cos {\psi })^2+(\gamma \beta )^2\sin ^2{\psi }=1+2\gamma \sigma +(\gamma \beta )^2. \end{aligned}$$

Since $\gamma \beta \sin {\psi }$ is nonnegative, this implies

$$\begin{aligned} \gamma \beta \sin {\psi }=\sqrt{1-2\gamma \sigma +(\gamma \beta )^2-(1+\gamma \sigma )^2} = \gamma \sqrt{\beta ^2-\sigma ^2}. \end{aligned}$$

Therefore, the reflected resolvent is

$$\begin{aligned} R_{\gamma A}&=\frac{2}{1+2\gamma \sigma +(\gamma \beta )^2} \begin{bmatrix} \tfrac{1}{2}(1-(\gamma \beta )^2)&\gamma \sqrt{\beta ^2-\sigma ^2}\\ -\gamma \sqrt{\beta ^2-\sigma ^2}&\tfrac{1}{2}(1-(\gamma \beta )^2) \end{bmatrix}. \end{aligned}$$

Now, let us introduce polar coordinates of the elements:

$$\begin{aligned} \delta (\cos {\xi },\sin {\xi }) = \left( \tfrac{1}{2}(1-(\gamma \beta )^2),\gamma \sqrt{\beta ^2-\sigma ^2}\right) , \end{aligned}$$

which gives reflected resolvent

$$\begin{aligned} R_{\gamma A}&=\frac{2\delta }{1+2\gamma \sigma +(\gamma \beta )^2} \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix}. \end{aligned}$$

(C.6)

The angle $\xi $ in the polar coordinate satisfies

$$\begin{aligned} \tan {\xi } = \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2} \end{aligned}$$

and since the numerator is nonnegative, $\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2}\right) $ where $\arctan _2$ is defined in (6.4). For the radius $\delta $ in the polar coordinate, we get

$$\begin{aligned} \delta ^2&=\delta ^2(\cos ^2{\psi }+\sin ^2{\psi })\\&=\tfrac{1}{4}(1-(\gamma \beta )^2)^2+\gamma ^2(\beta ^2-\sigma ^2)\\&=\tfrac{1}{4}(1-2(\gamma \beta )^2+(\gamma \beta )^4)+\gamma ^2(\beta ^2-\sigma ^2)\\&=\tfrac{1}{4}(1-2\gamma \sigma +(\gamma \beta )^2)(1+2\gamma \sigma +(\gamma \beta )^2) \end{aligned}$$

and (since $\delta >0$)

$$\begin{aligned} 2\delta&=\sqrt{(1-2\gamma \sigma +(\gamma \beta )^2)(1+2\gamma \sigma +(\gamma \beta )^2)} \end{aligned}$$

(C.7)

It remains to compute the factor in (C.6). Using (C.7), we get

$$\begin{aligned} \frac{2\delta }{1+2\gamma \sigma +(\gamma \beta )^2}&=\frac{\sqrt{(1-2\gamma \sigma +(\gamma \beta )^2)(1+2\gamma \sigma +(\gamma \beta )^2)}}{1+2\gamma \sigma +(\gamma \beta )^2}\\&=\sqrt{\frac{1-2\gamma \sigma +(\gamma \beta )^2}{1+2\gamma \sigma +(\gamma \beta )^2}}\\&=\sqrt{1-\frac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}} \end{aligned}$$

This completes the proof.

Appendix D: Proofs to results in Section 7

1.1 Proof to Theorem 7.2

We know from Lemma 3.6 and Definition 2.6 that $J_A$ is $(1+\sigma )$-cocoercive, i.e., that it satisfies

$$\begin{aligned} \langle J_Ax-J_Ay,x-y\rangle \ge (1+\sigma )\Vert J_Ax-J_Ay\Vert ^2 \end{aligned}$$

(D.1)

for all $x,y\in \mathcal {H}$. From Proposition 5.2, we know that $J_A$ is $\tfrac{\beta }{2(1+\beta )}$-averaged, i.e., that it satisfies (see [3, Proposition 4.25(iv)])

$$\begin{aligned} 2(1-\tfrac{\beta }{2(1+\beta )})\langle J_Ax-J_Ay,x-y\rangle \ge (1-\tfrac{\beta }{1+\beta })\Vert x-y\Vert ^2+\Vert J_Ax-J_Ay\Vert ^2 \end{aligned}$$

(D.2)

for all $x,y\in \mathcal {H}$. Let $\alpha =\tfrac{\beta }{2(1+\beta )}$ and $\delta =\tfrac{1}{1+\sigma }$ and define the set $\mathcal {R}$ of pairs of points $(x,y)\in \mathcal {H}\times \mathcal {H}$ as:

$$\begin{aligned} \mathcal {R}=\left\{ (x,y)~|~\langle J_Ax-J_Ay,x-y\rangle \ge \left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) \Vert x-y\Vert ^2\right\} . \end{aligned}$$

(D.3)

We also define the closure of set of remaining pairs of points $\mathcal {R}_c=\overline{(\mathcal {H}\times \mathcal {H})\backslash \mathcal {R}}$, i.e.,

$$\begin{aligned} \mathcal {R}_c = \left\{ (x,y)~|~\langle J_Ax-J_Ay,x-y\rangle \le (\tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)})\Vert x-y\Vert ^2\right\} . \end{aligned}$$

(D.4)

Obviously, the contraction factor for $R_A$ is the worst case contraction factor for pairs of points in $\mathcal {R}$ and $\mathcal {R}_c$.

1.2 Contraction factor on $\mathcal {R}$

First, we provide a contraction factor for pairs of points in $\mathcal {R}$. Since $R_A=2J_A-\mathrm{{Id}}$, we have

$$\begin{aligned} \Vert R_Ax-R_Ay\Vert ^2&=4\Vert J_Ax-J_Ay\Vert ^2-4\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\\&\le (4\delta -4)\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\\&\le \left( 4(\delta -1)\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) +1\right) \Vert x-y\Vert ^2 \end{aligned}$$

where $\delta =\tfrac{1}{1+\sigma }\in (0,1)$ and $\alpha =\tfrac{\beta }{2(1+\beta )}$ and the inequalities follow from (D.1) and the definition of $\mathcal {R}$ in (D.3).

1.3 Contraction factor on $\mathcal {R}_c$

Next, we provide a contraction factor for pairs of points in $\mathcal {R}_c$. Since $R_A=2J_A-\mathrm{{Id}}$, we conclude that

$$\begin{aligned}&\Vert R_Ax-R_Ay\Vert ^2\\&\quad =4\Vert J_Ax-J_Ay\Vert ^2-4\langle J_Ax-J_Ay,x-y\rangle +\Vert x-y\Vert ^2\\&\quad \le (4(2(1-\alpha )-1)\langle J_Ax-J_Ay,x-y\rangle +(1-4(1-2\alpha )))\Vert x-y\Vert ^2\\&\quad \le \left( 4(1-2\alpha )\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) +1-4+8\alpha \right) \Vert x-y\Vert ^2, \end{aligned}$$

where we have used that $\alpha \in (0,\tfrac{1}{2})$, (D.2), and the definition of $\mathcal {R}_c$ in (D.4).

1.4 Contraction factor of $R_A$

Here, we show that the contraction factors on $\mathcal {R}$ and $\mathcal {R}_c$ are identical, and we simplify the expression to get a final contraction factor for the reflected resolvent $R_A$. That the contraction factors are identical is shown by verifying that the difference between is zero:

$$\begin{aligned}&4(1-2\alpha )\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) +1-4+8\alpha \\&\qquad -4(\delta -1)\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) -1\\&\quad =4(2-2\alpha -\delta )\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) -4+8\alpha \\&\quad =2(2-2\alpha -\delta )\delta +4\left( (1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2\right) -4+8\alpha \\&\quad =2(2-2\alpha -\delta )\delta +4\left( (1-2\alpha -\delta +\alpha \delta +\alpha ^2+(\delta /2)^2-\alpha ^2+(\delta /2)^2\right) \\&\qquad -4+8\alpha \\&\quad =4\delta -4\alpha \delta -2\delta ^2+4-8\alpha -4\delta +4\alpha \delta +2\delta ^2-4+8\alpha =0. \end{aligned}$$

Next, we simplify this contraction factor by inserting $\delta =\tfrac{1}{1+\sigma }$ and $\alpha =\tfrac{\beta }{2(1+\beta )}$. We get

$$\begin{aligned}&4(\delta -1)\left( \tfrac{\delta }{2}+\tfrac{(1-\alpha -\delta /2)^2-\alpha ^2+(\delta /2)^2}{2(1-\alpha -\delta /2)}\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{1}{2(1+\sigma )}+\tfrac{\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) ^2-\left( \tfrac{\beta }{2(1+\beta )}\right) ^2+\left( \tfrac{1}{2(1+\sigma )}\right) ^2}{2\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\bigg (\tfrac{1}{2(1+\sigma )}\\&\qquad +\;\tfrac{1-\tfrac{\beta }{1+\beta }-\tfrac{1}{1+\sigma }+\tfrac{\beta }{2(1+\sigma )(1+\beta )}+\left( \tfrac{\beta }{2(1+\beta )}\right) ^2+\left( \tfrac{1}{2(1+\sigma )}\right) ^2-\left( \tfrac{\beta }{2(1+\beta )}\right) ^2+\left( \tfrac{1}{2(1+\sigma )}\right) ^2}{2\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\bigg )+1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{1}{2(1+\sigma )}+\tfrac{1-\tfrac{\beta }{1+\beta }-\tfrac{1}{1+\sigma }+\tfrac{\beta }{2(1+\sigma )(1+\beta )}+\tfrac{1}{2(1+\sigma )^2}}{2\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \!\tfrac{2\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) +2(1+\sigma )\left( 1-\tfrac{\beta }{1+\beta }-\tfrac{1}{1+\sigma }+\tfrac{\beta }{2(1+\sigma )(1+\beta )}+\tfrac{1}{2(1+\sigma )^2}\right) }{4(1+\sigma )\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\!\right) \!\!+\!1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) +2\left( (1+\sigma )-\tfrac{\beta (1+\sigma )}{1+\beta }-1+\tfrac{\beta }{2(1+\beta )}+\tfrac{1}{2(1+\sigma )}\right) }{4(1+\sigma )\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2+2\sigma -\tfrac{2\beta (\sigma +1)}{1+\beta }}{4(1+\sigma )\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2\beta +2+2\beta \sigma +2\sigma -2\beta (\sigma +1)}{4(1+\sigma )(1+\beta )\left( 1-\tfrac{\beta }{2(1+\beta )}-\tfrac{1}{2(1+\sigma )}\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2+2\sigma }{2\left( 2(1+\sigma )(1+\beta )-\beta (1+\sigma )-(1+\beta )\right) }\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2+2\sigma }{2(2+2\sigma +2\beta +2\beta \sigma -\beta -\beta \sigma -1-\beta )}\right) +1\\&\quad =4(\tfrac{1}{1+\sigma }-1)\left( \tfrac{2(1+\sigma )}{2(1+2\sigma +\beta \sigma )}\right) +1\\&\quad =4(\tfrac{-\sigma }{1+\sigma })\left( \tfrac{2+2\sigma }{2(1+2\sigma +\beta \sigma )}\right) +1\\&\quad =1-\tfrac{4\sigma }{1+2\sigma +\beta \sigma }. \end{aligned}$$

Taking the square root concludes the proof.

1.5 Proof to Proposition 7.6

We start by computing the resolvent and reflected resolvent of $\gamma A$. The resolvent of $\gamma A$ is given by

$$\begin{aligned} J_{\gamma A}&=(I+\gamma A)^{-1}\\&=\begin{bmatrix} \tfrac{\gamma d(1+c\cos {\psi })}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }}+1&\tfrac{\gamma dc\sin {\psi }}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }}\\ -\tfrac{\gamma dc\sin {\psi }}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }}&\tfrac{\gamma d(1+c\cos {\psi })}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }}+1 \end{bmatrix}^{-1}\\&=\begin{bmatrix} \tfrac{\gamma d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2}+1&\tfrac{\gamma dc\sin {\psi }}{1+2c\cos {\psi }+c^2}\\ -\tfrac{\gamma dc\sin {\psi }}{1+2c\cos {\psi }+c^2}&\tfrac{\gamma d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2}+1 \end{bmatrix}^{-1}\\&=\begin{bmatrix} \gamma \sigma +1&\tfrac{\gamma c\sin {\psi }\sigma }{1+c\cos {\psi }}\\ -\tfrac{\gamma dc\sin {\psi }\sigma }{1+c\cos {\psi }}&\gamma \sigma +1 \end{bmatrix}^{-1}\\&=\begin{bmatrix} \gamma \sigma +1&\gamma c\sin {\psi }\sigma \beta /d\\ -\gamma c\sin {\psi }\sigma \beta /d&\gamma \sigma +1 \end{bmatrix}^{-1}\\&=\tfrac{1}{(\gamma \sigma +1)^2+(\gamma c\sigma \beta \sin {\psi }/d)^2} \begin{bmatrix} \gamma \sigma +1&-\gamma c \sigma \beta \sin {\psi }/d\\ \gamma c \sigma \beta \sin {\psi }/d&\gamma \sigma +1 \end{bmatrix} \end{aligned}$$

where $\sigma $ and $\beta $ are defined in (7.6). The reflected resolvent $R_{\gamma A}$ is given by

$$\begin{aligned} R_{\gamma A}&=2J_{\gamma A}-I\\&=\tfrac{2}{(\gamma \sigma +1)^2+(\gamma c\sigma \beta \sin {\psi }/d)^2}\\&\quad \times \begin{bmatrix} \gamma \sigma +1-\tfrac{(\gamma \sigma +1)^2+(\gamma c\sigma \beta \sin {\psi }/d)^2}{2}&-\gamma c \sigma \beta \sin {\psi }/d\\ \gamma c \sigma \beta \sin {\psi }/d&\gamma \sigma +1-\tfrac{(\gamma \sigma +1)^2+(\gamma c\sigma \beta \sin {\psi }/d)^2}{2} \end{bmatrix}. \end{aligned}$$

To simplify this expression, we note that

$$\begin{aligned} \beta /\sigma&=\tfrac{d(1+2c\cos {\psi }+c^2)}{d(1+c\cos {\phi })(1+c\cos {\psi })}=\tfrac{(1+2c\cos {\psi }+c^2\pm c^2\cos ^2{\psi })}{(1+2c\cos {\phi }+c^2\cos ^2{\psi })}=\\&=1+\tfrac{c^2(1-\cos ^2{\psi })}{(1+c\cos {\psi })^2}=1+\tfrac{c^2\sin ^2{\psi }}{(1+c\cos {\psi })^2}=1+\tfrac{\beta ^2 c^2\sin ^2{\psi }}{d^2}. \end{aligned}$$

This implies that

$$\begin{aligned} (\gamma \sigma +1)^2+(\gamma c\sigma \beta \sin {\psi }/d)^2&=1+2\gamma \sigma +(\gamma \sigma )^2(1+c^2\beta ^2\sin ^2{\psi }/d^2)\nonumber \\&=1+2\gamma \sigma +\sigma \beta \gamma ^2. \end{aligned}$$

(D.5)

Using this equality, we can simplify the reflected resolvent expression:

$$\begin{aligned} R_{\gamma A}&=\tfrac{2}{1+2\gamma \sigma +\sigma \beta \gamma ^2} \begin{bmatrix} \tfrac{1}{2}(1-\beta \sigma \gamma ^2)&-\gamma \sqrt{\sigma (\beta -\sigma )}\\ \gamma \sqrt{\sigma (\beta -\sigma )}&\tfrac{1}{2}(1-\beta \sigma \gamma ^2) \end{bmatrix}, \end{aligned}$$

since $\gamma c\sigma \beta \sin {\psi }/d>0$. Now, let us write the matrix elements using polar coordinates:

$$\begin{aligned} \delta (\cos {\xi },\sin {\xi })=\left( \tfrac{1}{2}(1-\sigma \beta \gamma ^2),\gamma \sqrt{\sigma (\beta -\sigma )}\right) . \end{aligned}$$

This gives the reflected resolvent:

$$\begin{aligned} R_{\gamma A}=\tfrac{2\delta }{1+2\gamma \sigma +\sigma \beta \gamma ^2}\begin{bmatrix} \cos {\xi }&-\sin {\xi }\\ \sin {\xi }&\cos {\xi } \end{bmatrix}. \end{aligned}$$

(D.6)

The angle $\xi $ in the polar coordinates satisfies

$$\begin{aligned} \tan {\xi } = \tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}. \end{aligned}$$

The numerator is always nonnegative, so $\xi $ is given by $\xi \!=\!\arctan _2\!\left( \!\tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}\!\right) $ with $\arctan _2$ defined in (6.4). The radius $\delta $ in the polar coordinates satisfies

$$\begin{aligned} \delta ^2&=\delta ^2(\cos ^2{\xi }+\sin ^2{\xi })\\&=(\tfrac{1-\sigma \beta \gamma ^2}{2})^2+\gamma ^2\sigma (\beta -\sigma )\\&=\tfrac{1}{4}-\tfrac{\sigma \beta \gamma ^2}{2}+\tfrac{(\sigma \beta \gamma ^2)^2}{4}+\sigma \beta \gamma ^2-(\gamma \sigma )^2\\&=\tfrac{1}{4}+\tfrac{\sigma \beta \gamma ^2}{2}+\tfrac{(\sigma \beta \gamma ^2)^2}{4}-(\gamma \sigma )^2\\&=\tfrac{1}{4}(1-2\gamma \sigma +\gamma ^2\sigma \beta )(1+2\gamma \sigma +\gamma ^2\sigma \beta ) \end{aligned}$$

and (since $\delta >0$)

$$\begin{aligned} 2\delta = \sqrt{(1-2\gamma \sigma +\gamma ^2\sigma \beta )(1+2\gamma \sigma +\gamma ^2\sigma \beta )}. \end{aligned}$$

(D.7)

It remains to compute the factor in (D.6). Using (D.7), we conclude

$$\begin{aligned} \frac{2\delta }{1+2\gamma \sigma +\sigma \beta \gamma ^2}&=\frac{\sqrt{(1-2\gamma \sigma +\gamma ^2\sigma \beta )(1+2\gamma \sigma +\gamma ^2\sigma \beta )}}{1+2\gamma \sigma +\gamma ^2\sigma \beta }\\&=\sqrt{\frac{1-2\gamma \sigma +\gamma ^2\sigma \beta }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\\&=\sqrt{1-\frac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}. \end{aligned}$$

This completes the proof.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Giselsson, P. Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. 19, 2241–2270 (2017). https://doi.org/10.1007/s11784-017-0417-1

Download citation

Published: 13 March 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s11784-017-0417-1

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Tight global linear convergence rate bounds for Douglas–Rachford splitting

Abstract

Similar content being viewed by others

Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator

Convergence of Two-Stage Method with Bregman Divergence for Solving Variational Inequalities*

Local Linear Convergence of an ADMM-Type Splitting Framework for Equality Constrained Optimization

1 Introduction

2 Background

2.1 Notation

2.2 Operator properties

Definition 2.1

Definition 2.2

Definition 2.3

Definition 2.4

Definition 2.5

Definition 2.6

3 Preliminaries

3.1 Useful lemmas

Lemma 3.1

Lemma 3.2

Lemma 3.3

Lemma 3.4

Lemma 3.5

Lemma 3.6

3.2 Negatively averaged operators

Definition 3.7

Definition 3.8

Proposition 3.9

Proof

Proposition 3.10

Proof

Remark 3.11

Proposition 3.12

Proof

Remark 3.13

4 Douglas–Rachford splitting

5 A strongly monotone and B cocoercive

Assumption 5.1

Proposition 5.2

Proposition 5.3

Proof

Proposition 5.4

Proof

Proposition 5.5

Proof

Theorem 5.6

5.1 Tightness

Proposition 5.7

5.2 Comparison to other bounds

6 A strongly monotone and Lipschitz continuous

Assumption 6.1

Proposition 6.2

Theorem 6.3

Proposition 6.4

Theorem 6.5

Proof

6.1 Tightness

Proposition 6.6

Proof

Proposition 6.7

Proposition 6.8

Proof

Proposition 6.9

Proof

Remark 6.10

6.2 Comparison to other bounds

7 A strongly monotone and cocoercive

Assumption 7.1

Theorem 7.2

Proposition 7.3

Theorem 7.4

Proof

7.1 Tightness

Proposition 7.5

Proof

Proposition 7.6

Proposition 7.7

7.2 Comparison to other bounds

8 Conclusions

References