1 Introduction

Douglas–Rachford splitting [12, 24] is an algorithm that solves monotone inclusion problems of the form

$$\begin{aligned} 0\in Ax+Bx \end{aligned}$$

where A and B are maximally monotone operators. A class of problems that falls under this category is composite convex optimization problems of the form

$$\begin{aligned} \mathrm{minimize}~&f(x)+g(x) \end{aligned}$$
(1.1)

where f and g are proper, closed, and convex functions. This holds since the subdifferentials of proper, closed, and convex functions are maximally monotone operators, and since, under a suitable qualification condition, Fermat’s rule states that the optimality condition for (1.1) is \(0\in \partial f(x)+\partial g(x)\). The algorithm has shown great potential in many applications such as signal processing [6], image denoising [32], and statistical estimation [5] (where the dual algorithm ADMM is discussed).

It has long been known that Douglas–Rachford splitting converges under quite mild assumptions, see [13, 14, 24]. However, the rate of convergence in the general case has only recently been shown to be \(O(1/\sqrt{k})\) for the fixed-point residual, see [9, 10, 18]. For general maximally monotone operator problems where one of the operators is strongly monotone and Lipschitz continuous, Lions and Mercier showed in [24] that the Douglas–Rachford algorithm enjoys a linear convergence rate. To the author’s knowledge, this was for a long time the sole linear convergence rate result for these methods. Recently, however, many works have shown linear convergence rates for Douglas–Rachford splitting and its dual version, ADMM, see [1, 2, 4, 8, 10, 11, 15,16,17, 19,20,21,22, 28, 30, 33]. The works in [4, 10, 19, 20, 30] concern local linear convergence under different assumptions. The works in [21, 22, 33] consider distributed formulations, while the works in [1, 2, 8, 11, 15,16,17, 24, 27, 28, 31] show global convergence rate bounds under various assumptions. Of these, the works in [1, 2, 16] show tight linear convergence rate bounds. The works in [1, 2] show tight convergence rate results for the problem of finding a point in the intersection of two subspaces. In [16] it is shown that the linear convergence rate bounds in [17] (which are generalizations of the bounds in [15]) are tight for composite convex optimization problems where one function is strongly convex and smooth. All these results, except the one by Lions and Mercier, are stated in the convex optimization setting. In this paper, we provide tight linear convergence rate bounds for monotone inclusion problems.

We consider three different sets of assumptions under which we provide linear convergence rate bounds. In all cases, the properties of Lipschitz continuity or cocoercivity, and strong monotonicity, are attributed to the operators. In the first case, we assume that one operator is strongly monotone and the other is cocoercive. In the second case, we assume that one operator is both strongly monotone and Lipschitz continuous. This is the setting considered by Lions and Mercier in [24], where a non-tight linear convergence rate bound is presented. In the third case, we assume that one operator is both strongly monotone and cocoercive. We show in all these settings that our bounds are tight, meaning that there exist problems from the respective classes that converge exactly with the provided rate bound. In the second and third cases, the rates are tight for all feasible algorithm parameters, while in the first case, the rate is tight for many algorithm parameters.

2 Background

In this section, we introduce some notation and define some operator and function properties.

2.1 Notation

We denote by \(\mathbb {R}\) the set of real numbers and by \(\overline{\mathbb {R}}:=\mathbb {R}\cup \{\infty \}\) the extended real line. Throughout this paper, \(\mathcal {H}\) denotes a separable real Hilbert space. Its inner product is denoted by \(\langle \cdot ,\cdot \rangle \), its induced norm by \(\Vert \cdot \Vert \). We denote by \(\{\phi _i\}_{i=1}^K\) any orthonormal basis in \(\mathcal {H}\), where K is the dimension of \(\mathcal {H}\) (possibly \(\infty \)). The gradient of \(f:~\mathcal {X}\rightarrow \mathbb {R}\) is denoted by \(\nabla f\) and the subdifferential operator of \(f:~\mathcal {X}\rightarrow \overline{\mathbb {R}}\) is denoted by \(\partial f\) and is defined as \(\partial f(x_1):=\{u~|~f(x_2)\ge f(x_1)+\langle u,x_2-x_1\rangle {\hbox { for all }} x_2\in \mathcal {X}\}\). The conjugate function of f is denoted and defined by \(f^{*}(y)\triangleq \sup _{x}\left\{ \langle y,x\rangle -f(x)\right\} \). The power set of a set \(\mathcal {X}\), i.e., the set of all subsets of \(\mathcal {X}\), is denoted by \(2^{\mathcal {X}}\). The graph of a (set-valued) operator \(A:~\mathcal {X}\rightarrow 2^{\mathcal {Y}}\) is defined and denoted by \(\mathrm{{gph}}A = \{(x,y)\in \mathcal {X}\times \mathcal {Y}~|~y\in Ax\}\). The inverse operator \(A^{-1}\) is defined through its graph by \(\mathrm{{gph}}A^{-1} = \{(y,x)\in \mathcal {Y}\times \mathcal {X}~|~y\in Ax\}\). The identity operator is denoted by \(\mathrm{{Id}}\) and the resolvent of a monotone operator A is defined and denoted by \(J_A=(\mathrm{{Id}}+A)^{-1}\). Finally, the class of closed, proper, and convex functions \(f:~\mathcal {H}\rightarrow \overline{\mathbb {R}}\) is denoted by \(\Gamma _0(\mathcal {H})\).

2.2 Operator properties

Definition 2.1

(Strong monotonicity) Let \(\sigma > 0\). An operator \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is \(\sigma \)-strongly monotone if

$$\begin{aligned} \langle u-v,x-y\rangle \ge \sigma \Vert x-y\Vert ^2 \end{aligned}$$

holds for all \((x,u)\in \mathrm{{gph}}(A)\) and \((y,v)\in \mathrm{{gph}}(A)\).

The operator is merely monotone if \(\sigma =0\) in the above definition. In the following definitions, we state some properties for single-valued operators \(T:~\mathcal {H}\rightarrow \mathcal {H}\). We state the properties for operators with full domain, but they can also be stated for operators with any nonempty domain \(\mathcal {D}\subseteq \mathcal {H}\).

Definition 2.2

(Lipschitz continuity) Let \(\beta \ge 0\). A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\beta \)-Lipschitz continuous if

$$\begin{aligned} \Vert Tx-Ty\Vert \le \beta \Vert x-y\Vert \end{aligned}$$

holds for all \(x,y\in \mathcal {H}\).

Definition 2.3

(Nonexpansiveness) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if it is 1-Lipschitz continuous.

Definition 2.4

(Contractiveness) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\delta \)-contractive if it is \(\delta \)-Lipschitz continuous with \(\delta \in [0,1)\).

Definition 2.5

(Averaged mappings) A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged if there exists a nonexpansive mapping \(R:~\mathcal {H}\rightarrow \mathcal {H}\) and \(\alpha \in (0,1)\) such that \(T=(1-\alpha )\mathrm{{Id}}+\alpha R\).

From [3, Proposition 4.25], we know that an operator \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged if and only if it satisfies

$$\begin{aligned} \tfrac{1-\alpha }{\alpha }\Vert (\mathrm{{Id}}-T)x-(\mathrm{{Id}}-T)y\Vert ^2+\Vert Tx-Ty\Vert ^2\le \Vert x-y\Vert ^2 \end{aligned}$$
(2.1)

for all \(x,y\in \mathcal {H}\).

Definition 2.6

(Cocoercivity) Let \(\beta > 0\). A mapping \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{1}{\beta }\)-cocoercive if

$$\begin{aligned} \langle Tx-Ty,x-y\rangle \ge \tfrac{1}{\beta }\Vert Tx-Ty\Vert ^2 \end{aligned}$$

holds for all \(x,y\in \mathcal {H}\).

3 Preliminaries

In this section, we state and show preliminary results that are needed to prove the linear convergence rate bounds. We state some lemmas that describe how cocoercivity, Lipschitz continuity, and averagedness relate to each other. We also introduce negatively averaged operators T, defined by the property that \(-T\) is averaged. We show different properties of such operators, including that averaged maps of negatively averaged operators are contractive. This result will be used to show linear convergence in the case where the strong monotonicity and cocoercivity properties are split between the operators.

3.1 Useful lemmas

Proofs of the following three lemmas are found in Appendix 8.

Lemma 3.1

Assume that \(\beta >0\) and let \(T:~\mathcal {H}\rightarrow \mathcal {H}\). Then \(\tfrac{1}{2\beta }\)-cocoercivity of \(\beta \mathrm{{Id}}+T\) is equivalent to \(\beta \)-Lipschitz continuity of T.

Lemma 3.2

Assume that \(\beta \in (0,1)\). Then \(\tfrac{1}{\beta }\)-cocoercivity of \(R:~\mathcal {H}\rightarrow \mathcal {H}\) is equivalent to \(\tfrac{\beta }{2}\)-averagedness of \(T=R+(1-\beta )\mathrm{{Id}}\).

Lemma 3.3

Let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\delta \)-contractive with \(\delta \in [0,1)\). Then \(R=(1-\alpha )\mathrm{{Id}}+\alpha T\) is contractive for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). The contraction factor is \(|1-\alpha |+\alpha \delta \).

For easier reference, we also record special cases of some results in [3] that will be used later. Specifically, we record, in order, special cases of [3, Proposition 4.33], [3, Proposition 4.28], and [3, Proposition 23.11].

Lemma 3.4

Let \(\beta \in (0,1)\) and let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\tfrac{1}{\beta }\)-cocoercive. Then \((\mathrm{{Id}}-T)\) is \(\tfrac{\beta }{2}\)-averaged.

Lemma 3.5

Let \(T:~\mathcal {H}\rightarrow \mathcal {H}\) be \(\alpha \)-averaged with \(\alpha \in (0,\tfrac{1}{2})\). Then \((2T-\mathrm{{Id}})\) is \(2\alpha \)-averaged.

Lemma 3.6

Let \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) be maximally monotone and \(\sigma \)-strongly monotone with \(\sigma >0\). Then \(J_A=(\mathrm{{Id}}+A)^{-1}\) is \((1+\sigma )\)-cocoercive.

3.2 Negatively averaged operators

In this section we define negatively averaged operators and show various properties for these.

Definition 3.7

An operator \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged with \(\theta \in (0,1)\) if \(-T\) is \(\theta \)-averaged.

This definition implies that an operator T is \(\theta \)-negatively averaged if and only if it satisfies

$$\begin{aligned} T = -((1-\theta )\mathrm{{Id}}+\theta \bar{R})=(\theta -1)\mathrm{{Id}}+\theta R \end{aligned}$$

where \(\bar{R}\) is nonexpansive and \(R:=-\bar{R}\) is, therefore, also nonexpansive. Since \(-T\) is averaged, it is also nonexpansive, and so is T.

Since negatively averaged operators are nonexpansive, we can form averaged maps of them.

Definition 3.8

An \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S:~\mathcal {H}\rightarrow \mathcal {H}\) is defined as \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\) where \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged.

Next, we show that averaged negatively averaged operators are contractive.

Proposition 3.9

An \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S:~\mathcal {H}\rightarrow \mathcal {H}\) is \((|1-2\alpha +\alpha \theta |+\alpha \theta )\)-contractive.

Proof

Let \(T=(\theta -1)\mathrm{{Id}}+\theta R\) (for some nonexpansive R) be the \(\theta \)-negatively averaged operator, which implies that \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\). Then

$$\begin{aligned} \Vert Sx-Sy\Vert&=\Vert ((1-\alpha )\mathrm{{Id}}+\alpha T)x-((1-\alpha )\mathrm{{Id}}+\alpha T)y\Vert \\&= \Vert (1-2\alpha +\alpha \theta )(x-y)+\alpha \theta (Rx-Ry)\Vert \\&\le |1-2\alpha +\alpha \theta |\Vert x-y\Vert +\alpha \theta \Vert x-y\Vert \\&= (|1-2\alpha +\alpha \theta |+\alpha \theta )\Vert x-y\Vert \end{aligned}$$

since R is nonexpansive. It is straightforward to verify that \(|1-2\alpha +\alpha \theta |+\alpha \theta <1\) for all combinations of \(\alpha \in (0,1)\) and \(\theta \in (0,1)\). Hence, S is contractive and the proof is complete. \(\square \)

Next, we optimize the contraction factor w.r.t. \(\alpha \).

Proposition 3.10

Assume that \(T:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged. Then the \(\alpha \) that optimizes the contraction factor for the \(\alpha \)-averaged \(\theta \)-negatively averaged operator \(S=(1-\alpha )\mathrm{{Id}}+\alpha T\) is \(\alpha = \tfrac{1}{2-\theta }\). The corresponding optimal contraction constant is \(\tfrac{\theta }{2-\theta }\).

Proof

Due to the absolute value, Proposition 3.9 states that the contraction factor \(\delta \) of S can be written as

$$\begin{aligned} \delta&={\left\{ \begin{array}{ll} 1-2\alpha +\alpha \theta +\alpha \theta &{} {\hbox {if }} \alpha \le \tfrac{1}{2-\theta }\\ -(1-2\alpha +\alpha \theta )+\alpha \theta &{} {\hbox {if }} \alpha \ge \tfrac{1}{2-\theta }\\ \end{array}\right. } ={\left\{ \begin{array}{ll} 1-2(1-\theta )\alpha &{} {\hbox {if }} \alpha \le \tfrac{1}{2-\theta }\\ 2\alpha -1 &{} {\hbox {if }} \alpha \ge \tfrac{1}{2-\theta } \end{array}\right. } \end{aligned}$$

where the kink in the absolute value term is at \(\alpha =\tfrac{1}{2-\theta }\). Since \(\theta \in (0,1)\), we get negative slope for \(\alpha \le \tfrac{1}{2-\theta }\) and positive slope for \(\alpha \ge \tfrac{1}{2-\theta }\). Therefore, the optimal \(\alpha \) is in the kink at \(\alpha =\tfrac{1}{2-\theta }\), which satisfies \(\alpha \in (\tfrac{1}{2},1)\) since \(\theta \in (0,1)\). Inserting this into the contraction factor expression gives \(\tfrac{\theta }{2-\theta }\). This concludes the proof. \(\square \)
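As a quick sanity check of Proposition 3.10, the following Python sketch (illustrative, with an arbitrarily chosen \(\theta \)) minimizes the contraction factor from Proposition 3.9 over a fine grid of \(\alpha \) values and compares the minimizer and the minimum with \(\tfrac{1}{2-\theta }\) and \(\tfrac{\theta }{2-\theta }\).

```python
import numpy as np

theta = 0.7
alphas = np.linspace(1e-3, 1 - 1e-3, 200_001)
# contraction factor of the alpha-averaged theta-negatively averaged operator
delta = np.abs(1 - 2*alphas + alphas*theta) + alphas*theta

i = delta.argmin()
print(alphas[i], 1/(2 - theta))       # both approximately 0.7692
print(delta[i], theta/(2 - theta))    # both approximately 0.5385
```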

Remark 3.11

The optimal contraction factor \(\tfrac{\theta }{2-\theta }\) is strictly increasing in \(\theta \) on the interval \(\theta \in (0,1)\). Therefore, the contraction factor becomes smaller the smaller \(\theta \) is.

We conclude this section by showing that the composition of an averaged and a negatively averaged operator is negatively averaged. Before we state the result, we need a characterization of \(\theta \)-negatively averaged operators T. This follows directly from the definition of averaged operators in (2.1) since \(-T\) is \(\theta \)-averaged:

$$\begin{aligned} \tfrac{1-\theta }{\theta }\Vert (\mathrm{{Id}}+T)x-(\mathrm{{Id}}+T)y\Vert ^2+\Vert Tx-Ty\Vert ^2\le \Vert x-y\Vert ^2. \end{aligned}$$
(3.1)

Proposition 3.12

Assume that \(T_{\theta }:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\theta \)-negatively averaged and \(T_{\alpha }:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\alpha \)-averaged. Then \(T_{\theta }T_{\alpha }\) is \(\tfrac{\kappa }{\kappa +1}\)-negatively averaged where \(\kappa =\tfrac{\theta }{1-\theta }+\tfrac{\alpha }{1-\alpha }\).

Proof

Let \(\kappa _{\theta }=\tfrac{\theta }{1-\theta }\) and \(\kappa _{\alpha }=\tfrac{\alpha }{1-\alpha }\), then \(\kappa = \kappa _{\theta }+\kappa _{\alpha }\). We have

$$\begin{aligned}&\Vert (\mathrm{{Id}}+T_{\theta }T_{\alpha })x-(\mathrm{{Id}}+T_{\theta } T_{\alpha })y\Vert ^2\nonumber \\&\quad =\Vert (x-y)-(T_{\alpha } x-T_{\alpha } y)+(T_{\alpha }x-T_{\alpha }y)+(T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y)\Vert ^2\nonumber \\&\quad =\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y+(\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\nonumber \\&\quad \le \tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\alpha }}\Vert (\mathrm{{Id}}-T_{\alpha })x-(\mathrm{{Id}}-T_{\alpha })y\Vert ^2\nonumber \\&\qquad +\;\tfrac{\kappa _{\theta }+\kappa _{\alpha }}{\kappa _{\theta }}\Vert (\mathrm{{Id}}+T_{\theta })T_{\alpha }x-(\mathrm{{Id}}+T_{\theta })T_{\alpha }y\Vert ^2\nonumber \\&\quad \le (\kappa _{\theta }+\kappa _{\alpha }) (\Vert x-y\Vert ^2-\Vert T_{\alpha }x-T_{\alpha }y\Vert ^2)\nonumber \\&\qquad +\;(\kappa _{\theta }+\kappa _{\alpha }) (\Vert T_{\alpha }x-T_{\alpha }y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2)\nonumber \\&\quad =(\kappa _{\theta }+\kappa _{\alpha })(\Vert x-y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2)\nonumber \\&\quad =\kappa (\Vert x-y\Vert ^2-\Vert T_{\theta }T_{\alpha }x-T_{\theta }T_{\alpha }y\Vert ^2) \end{aligned}$$
(3.2)

where the first inequality follows from convexity of \(\Vert \cdot \Vert ^2\). More precisely, let \(t\in [0,1]\), then, by convexity of \(\Vert \cdot \Vert ^2\), we conclude that

$$\begin{aligned} \Vert x+y\Vert ^2&=\Vert t\tfrac{1}{t}x+(1-t)\tfrac{1}{1-t}y\Vert ^2\le t\Vert \tfrac{1}{t}x\Vert ^2+(1-t)\Vert \tfrac{1}{1-t}y\Vert ^2\\&=\tfrac{1}{t}\Vert x\Vert ^2+\tfrac{1}{1-t}\Vert y\Vert ^2. \end{aligned}$$

Letting \(t=\tfrac{\kappa _{\alpha }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]\), which implies that \(1-t=\tfrac{\kappa _{\theta }}{\kappa _{\theta }+\kappa _{\alpha }}\in [0,1]\), gives the first inequality in (3.2). The second inequality in (3.2) follows from (2.1) and (3.1). The relation in (3.2) coincides with the definition of negative averagedness in (3.1). Thus, \(T_{\theta }T_{\alpha }\) is \(\phi \)-negatively averaged with \(\phi \) satisfying \(\tfrac{1-\phi }{\phi }=\tfrac{1}{\kappa }\). This gives \(\phi = \tfrac{\kappa }{\kappa +1}\) and the proof is complete. \(\square \)

Remark 3.13

This result can readily be extended to show (negative) averagedness of \(T=T_1T_2\cdots T_N\) where \(T_i\) are \(\alpha _i\)-(negatively) averaged for \(i=1,\ldots ,N\). We get that T is \(\tfrac{\kappa }{1+\kappa }\)-negatively averaged with \(\kappa =\sum _{i=1}^N\tfrac{\alpha _i}{1-\alpha _i}\) if the number of negatively averaged \(T_i\) is odd, and that T is \(\tfrac{\kappa }{1+\kappa }\)-averaged if the number of negatively averaged \(T_i\) is even. Similar results have been presented, e.g., in [3, Proposition 4.32], which is improved in [7]. Our result extends these in that it also allows for negatively averaged operators and reduces to the result in [7] for averaged operators.

4 Douglas–Rachford splitting

Douglas–Rachford splitting can be applied to solve monotone inclusion problems of the form

$$\begin{aligned} 0\in Ax+Bx \end{aligned}$$
(4.1)

where \(A,B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone operators. The algorithm separates A and B by only touching the corresponding resolvents, where the resolvent \(J_A:~\mathcal {H}\rightarrow \mathcal {H}\) is defined as

$$\begin{aligned} J_{A} := (A+\mathrm{{Id}})^{-1}. \end{aligned}$$

The resolvent has full domain since A is assumed maximally monotone, see [26] and [3, Proposition 23.7]. If \(A=\partial f\) where f is a proper, closed, and convex function, then \(J_A = {\mathrm{prox}}_{f}\) where the prox operator \({\mathrm{prox}}_{f}\) is defined as

$$\begin{aligned} {\mathrm{prox}}_{f}(z) = {{\mathrm{argmin}}}_x\left\{ f(x)+\tfrac{1}{2}\Vert x-z\Vert ^2\right\} . \end{aligned}$$
(4.2)

That this holds follows directly from Fermat’s rule [3, Theorem 16.2] applied to the proximal operator definition.

The Douglas–Rachford algorithm is defined by the iteration

$$\begin{aligned} z^{k+1} = ((1-\alpha )\mathrm{{Id}}+\alpha R_AR_B)z^{k} \end{aligned}$$
(4.3)

where \(\alpha \in (0,1)\) (we will see that \(\alpha \ge 1\) can also sometimes be used) and \(R_A:~\mathcal {H}\rightarrow \mathcal {H}\) is the reflected resolvent, which is defined as

$$\begin{aligned} R_A:=2J_A-\mathrm{{Id}}. \end{aligned}$$

(Note that what is traditionally called Douglas–Rachford splitting is when \(\alpha =1/2\) in (4.3). The case with \(\alpha =1\) in (4.3) is often referred to as the Peaceman–Rachford algorithm, see [29]. We will use the term Douglas–Rachford splitting for all feasible choices of \(\alpha \).)

Since the reflected resolvent is nonexpansive in the general case [3, Corollary 23.10], and since compositions of nonexpansive operators are nonexpansive, the Douglas–Rachford algorithm is an averaged iteration of a nonexpansive mapping when \(\alpha \in (0,1)\). Therefore, Douglas–Rachford splitting is a special case of the Krasnosel’skiĭ–Mann iteration [23, 25], which is known to converge to a fixed point of the nonexpansive operator, in this case \(R_AR_B\), see [3, Theorem 5.14]. Since an \(x\in \mathcal {H}\) solves (4.1) if and only if \(x=J_Bz\) where \(z=R_AR_Bz\), see [3, Proposition 25.1], this algorithm can be used to solve monotone inclusion problems of the form (4.1). Note that solving (4.1) is equivalent to solving

$$\begin{aligned} 0\in \gamma Ax+\gamma Bx \end{aligned}$$

for any \(\gamma \in (0,\infty )\). Then we can define \(A_{\gamma } = \gamma A\) and \(B_{\gamma } = \gamma B\), and (4.1) can also be solved by the iteration

$$\begin{aligned} z^{k+1} = ((1-\alpha )\mathrm{{Id}}+\alpha R_{A_\gamma }R_{B_\gamma })z^{k}. \end{aligned}$$
(4.4)

Therefore, \(\gamma \) is an algorithm parameter that affects the progress of the iterations.
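To make the iteration concrete, the following Python sketch (a minimal illustration, not taken from the paper; the quadratic-plus-\(\ell _1\) test problem and all names are ours) runs (4.4) with \(A=\nabla f\) and \(B=\partial g\) for \(f(x)=\tfrac{1}{2}x^TQx+q^Tx\) and \(g(x)=\lambda \Vert x\Vert _1\), using the proximal operators as resolvents and recovering a solution as \(x=J_{\gamma B}z\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)            # strongly convex quadratic part
q = rng.standard_normal(n)
lam = 0.5

def prox_f(z, gamma):              # J_{gamma A}: resolvent of gamma*grad f
    return np.linalg.solve(np.eye(n) + gamma * Q, z - gamma * q)

def prox_g(z, gamma):              # J_{gamma B}: resolvent of gamma*partial g
    return np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)

def refl(prox, z, gamma):          # reflected resolvent R = 2*J - Id
    return 2 * prox(z, gamma) - z

gamma, alpha = 0.1, 0.5            # algorithm parameters (modest illustrative values)
z = np.zeros(n)
for _ in range(2000):
    z = (1 - alpha) * z + alpha * refl(prox_f, refl(prox_g, z, gamma), gamma)

x = prox_g(z, gamma)               # solution recovered through J_{gamma B}
# optimality residual for 0 in grad f(x) + partial g(x); should be near zero
r = Q @ x + q
res = np.where(x != 0, np.abs(r + lam * np.sign(x)),
               np.maximum(np.abs(r) - lam, 0.0))
print(res.max())
```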

The objective of this paper is to provide tight linear convergence rate bounds for the Douglas–Rachford algorithm under various assumptions. Using these bounds, we will show how to select the algorithm parameters \(\gamma \) and \(\alpha \) that optimize these bounds. The first setting we consider is when A is strongly monotone and B is cocoercive.

5 A strongly monotone and B cocoercive

In this section, we show linear convergence for Douglas–Rachford splitting in the case where A and B are maximally monotone, A is strongly monotone, and B is cocoercive; that is, we make the following assumptions.

Assumption 5.1

Suppose that:

  1. (i)

    \(A:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is maximally monotone and \(\sigma \)-strongly monotone.

  2. (ii)

    \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is maximally monotone and \(\tfrac{1}{\beta }\)-cocoercive.

Before we can state the main linear convergence result, we need to characterize the properties of the resolvent, the reflected resolvent, and the composition of reflected resolvents. This is done in the following series of propositions, the first of which is proven in Appendix 8.

Proposition 5.2

The resolvent \(J_B\) of a \(\tfrac{1}{\beta }\)-cocoercive operator \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{\beta }{2(1+\beta )}\)-averaged.

This implies that the reflected resolvent is also averaged.

Proposition 5.3

The reflected resolvent of a \(\tfrac{1}{\beta }\)-cocoercive operator \(B:~\mathcal {H}\rightarrow \mathcal {H}\) is \(\tfrac{\beta }{1+\beta }\)-averaged.

Proof

This follows directly from Proposition 5.2 and Lemma 3.5. \(\square \)

If the operator instead is strongly monotone, the reflected resolvent is negatively averaged.

Proposition 5.4

The reflected resolvent of a \(\sigma \)-strongly monotone and maximal monotone operator \(A:~\mathcal {H}\rightarrow 2^\mathcal {H}\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged.

Proof

From Lemma 3.6, we have that the resolvent \(J_A\) is \((1+\sigma )\)-cocoercive. Using Lemma 3.4, this implies that \(\mathrm{{Id}}-J_A\) is \(\tfrac{1}{2(1+\sigma )}\)-averaged. Then using Lemma 3.5, this implies that \(2(\mathrm{{Id}}-J_A)-\mathrm{{Id}}=\mathrm{{Id}}-2J_A=-R_A\) is \(\tfrac{1}{1+\sigma }\)-averaged, hence \(R_A\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged. This completes the proof. \(\square \)

The composition of the reflected resolvents of a strongly monotone operator and a cocoercive operator is negatively averaged.

Proposition 5.5

Suppose that Assumption 5.1 holds. Then, the composition \(R_AR_B\) is \(\tfrac{\tfrac{1}{\sigma }+\beta }{1+\tfrac{1}{\sigma }+\beta }\)-negatively averaged.

Proof

Since \(R_A\) is \(\tfrac{1}{1+\sigma }\)-negatively averaged and \(R_B\) is \(\tfrac{\beta }{1+\beta }\)-averaged, see Propositions 5.3 and 5.4, we can apply Proposition 3.12. We get that \(\kappa = \tfrac{\tfrac{1}{1+\sigma }}{1-\tfrac{1}{1+\sigma }}+\tfrac{\tfrac{\beta }{1+\beta }}{1-\tfrac{\beta }{1+\beta }}=\tfrac{1}{\sigma }+\beta \) and that the averagedness parameter of the negatively averaged operator \(R_AR_B\) is given by \(\tfrac{\kappa }{\kappa +1}=\tfrac{\tfrac{1}{\sigma }+\beta }{\tfrac{1}{\sigma }+\beta +1}\). This concludes the proof. \(\square \)

With these results, we can now show the following linear convergence rate bounds for Douglas–Rachford splitting under Assumption 5.1. The theorem is proven in Appendix 8.

Theorem 5.6

Suppose that Assumption 5.1 holds, that \(\alpha \in (0,1)\), that \(\gamma \in (0,\infty )\), and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Then the algorithm converges at least with rate factor

$$\begin{aligned} \left| 1-2\alpha +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }\right| +\alpha \tfrac{\tfrac{1}{\gamma \sigma }+\gamma \beta }{1+\tfrac{1}{\gamma \sigma }+\gamma \beta }. \end{aligned}$$
(5.1)

Optimizing this rate bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\gamma = \tfrac{1}{\sqrt{\beta \sigma }}\) and \(\alpha =\tfrac{\sqrt{\beta /\sigma }+1/2}{1+\sqrt{\beta /\sigma }}\). The corresponding optimal rate bound is \(\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}\).
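The following Python sketch (illustrative; \(\sigma \) and \(\beta \) chosen arbitrarily) evaluates the bound (5.1) on a grid of \((\gamma ,\alpha )\) pairs and confirms that the minimum is attained near the parameters stated in Theorem 5.6, with the stated optimal value.

```python
import numpy as np

sigma, beta = 0.5, 2.0

def rate(gamma, alpha):            # the bound (5.1)
    theta = (1/(gamma*sigma) + gamma*beta) / (1 + 1/(gamma*sigma) + gamma*beta)
    return np.abs(1 - 2*alpha + alpha*theta) + alpha*theta

gammas = np.linspace(0.01, 10, 1000)
alphas = np.linspace(0.01, 0.999, 1000)
G, Al = np.meshgrid(gammas, alphas, indexing="ij")
R = rate(G, Al)
i, j = np.unravel_index(R.argmin(), R.shape)

s = np.sqrt(beta/sigma)
print(G[i, j], 1/np.sqrt(beta*sigma))   # ~1.0
print(Al[i, j], (s + 0.5)/(1 + s))      # ~0.8333
print(R[i, j], s/(s + 1))               # ~0.6667
```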

5.1 Tightness

In this section, we present an example that shows tightness of the linear convergence rate bounds in Theorem 5.6 for many algorithm parameters. We consider a two-dimensional Euclidean example, which is given by the following convex optimization problem:

$$\begin{aligned} \mathrm{minimize}&f(x)+f^*(x) \end{aligned}$$
(5.2)

where

$$\begin{aligned} f(x)=\tfrac{\beta }{2}x_1^2, \end{aligned}$$
(5.3)

and \(x=(x_1,x_2)\), and \(\beta >0\). The gradient is \(\nabla f(x)=(\beta x_1,0)\), so it is cocoercive with factor \(\tfrac{1}{\beta }\). According to [3, Theorem 18.15], this is equivalent to \(f^*\) being \(\tfrac{1}{\beta }\)-strongly convex and, therefore, \(\partial f^*\) is \(\sigma \)-strongly monotone with \(\sigma :=\tfrac{1}{\beta }\).

The following proposition shows that when solving (5.2) with f defined in (5.3) using Douglas–Rachford splitting, the upper linear convergence rate bound is exactly attained. The result is proven in Appendix 8.

Proposition 5.7

Suppose that the Douglas–Rachford algorithm (4.3) is applied to solve (5.2) with f in (5.3). Further suppose that the parameters \(\gamma \) and \(\alpha \) satisfy \(\gamma \in (0,\infty )\) and \(\alpha \in [c,1)\) where \(c=\tfrac{1+\gamma \sigma +\gamma ^2\sigma \beta }{1+2\gamma \sigma +\gamma ^2\sigma \beta }\) and that \(z^0=(0,z_2^0)\) with \(z_2^0\ne 0\). Then the \(z^k\) sequence in (4.3) converges exactly with rate (5.1) in Theorem 5.6.

So, for all \(\gamma \) parameters and some \(\alpha \) parameters, the provided bound is tight. In particular, the optimal parameter choices \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) and \(\alpha =\tfrac{1+2\sqrt{\beta /\sigma }}{2(1+\sqrt{\beta /\sigma })}\) give a tight bound.
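The tightness can also be observed numerically. The following Python sketch (illustrative; the closed-form proximal operators are computed by us for this specific f) runs (4.3) on (5.2)–(5.3) with \(A=\partial f^*\) and \(B=\nabla f\), starting from \(z^0=(0,z_2^0)\), and prints the per-iteration contraction factor together with the bound (5.1); for the optimal parameters the two agree in every iteration.

```python
import numpy as np

beta = 4.0
sigma = 1.0 / beta

def prox_f(z, gamma):        # J_{gamma B} for f(x) = (beta/2)*x1^2
    return np.array([z[0] / (1 + gamma*beta), z[1]])

def prox_fconj(z, gamma):    # J_{gamma A} for f*(y) = y1^2/(2*beta) + indicator(y2 = 0)
    return np.array([beta * z[0] / (beta + gamma), 0.0])

def refl(prox, z, gamma):
    return 2 * prox(z, gamma) - z

gamma = 1.0 / np.sqrt(beta*sigma)
s = np.sqrt(beta/sigma)
alpha = (s + 0.5) / (1 + s)                 # optimal parameters of Theorem 5.6

theta = (1/(gamma*sigma) + gamma*beta) / (1 + 1/(gamma*sigma) + gamma*beta)
bound = abs(1 - 2*alpha + alpha*theta) + alpha*theta    # the bound (5.1)

z = np.array([0.0, 1.0])                    # z^0 = (0, z2^0) with z2^0 != 0
for _ in range(15):
    z_new = (1 - alpha)*z + alpha*refl(prox_fconj, refl(prox_f, z, gamma), gamma)
    print(np.linalg.norm(z_new) / np.linalg.norm(z), bound)   # match up to rounding
    z = z_new
```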

It is interesting to note that although we have considered a more general class of problems than convex optimization problems, a convex optimization problem is used to attain the worst case rate.

5.2 Comparison to other bounds

In [17], it was shown that Douglas–Rachford splitting converges linearly with rate \(\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}\) when solving composite optimization problems of the form \(0\in \gamma \nabla f+\gamma \partial g\), where \(\nabla f\) is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive and the algorithm parameters are chosen as \(\alpha =1\) and \(\gamma = \tfrac{1}{\sqrt{\beta \sigma }}\). In our setting, with \(\partial f\) being \(\sigma \)-strongly monotone and \(\partial g\) being \(\tfrac{1}{\beta }\)-cocoercive, we can instead pose the equivalent problem \(0\in \gamma \partial \hat{f}(x)+\gamma \partial \hat{g}(x)\) where \(\hat{f} = f-\tfrac{\sigma }{2}\Vert \cdot \Vert ^2\) and \(\hat{g}=g+\tfrac{\sigma }{2}\Vert \cdot \Vert ^2\). Then \(\partial \hat{f}\) is merely monotone and \(\partial \hat{g}\) is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta +\sigma }\)-cocoercive. For that problem, [17] shows a linear convergence rate of at least \(\tfrac{\sqrt{(\beta +\sigma )/\sigma }-1}{\sqrt{(\beta +\sigma )/\sigma }+1}\) (when optimal parameters are used). This rate turns out to be better than the rate provided in Theorem 5.6, i.e., \(\tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}\), which assumes that the strong monotonicity and cocoercivity properties are split between the two operators. This follows from the sub-additivity of the square root, i.e., from \(\sqrt{\beta +\sigma }\le \sqrt{\sigma }+\sqrt{\beta }\) for \(\beta ,\sigma \ge 0\); a verification is sketched below.
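One way to carry out this verification is the following chain (a sketch; the implication in the second step uses \(\sqrt{\beta /\sigma }\ge 0\)):

$$\begin{aligned} \sqrt{\beta +\sigma }\le \sqrt{\beta }+\sqrt{\sigma }&\Leftrightarrow \sqrt{(\beta +\sigma )/\sigma }\le \sqrt{\beta /\sigma }+1\\ &\Rightarrow \sqrt{(\beta +\sigma )/\sigma }+1\le 2\big (\sqrt{\beta /\sigma }+1\big )\\ &\Leftrightarrow \tfrac{2}{\sqrt{(\beta +\sigma )/\sigma }+1}\ge \tfrac{1}{\sqrt{\beta /\sigma }+1}\\ &\Leftrightarrow 1-\tfrac{2}{\sqrt{(\beta +\sigma )/\sigma }+1}\le 1-\tfrac{1}{\sqrt{\beta /\sigma }+1}\\ &\Leftrightarrow \tfrac{\sqrt{(\beta +\sigma )/\sigma }-1}{\sqrt{(\beta +\sigma )/\sigma }+1}\le \tfrac{\sqrt{\beta /\sigma }}{\sqrt{\beta /\sigma }+1}. \end{aligned}$$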

This implies that, from a worst case perspective, it is better to shift both properties into one operator. This is also always possible, without increasing the computational cost in the algorithm, since the prox-operator is just shifted slightly:

$$\begin{aligned} {\mathrm{prox}}_{\gamma \hat{f}}(z)&= {{\mathrm{argmin}}}_x\left\{ \hat{f}(x)+\tfrac{1}{2\gamma }\Vert x-z\Vert ^2\right\} \\&= {{\mathrm{argmin}}}_x\left\{ f(x)-\tfrac{\sigma }{2}\Vert x\Vert ^2+\tfrac{1}{2\gamma }\Vert x-z\Vert ^2\right\} \\&= {{\mathrm{argmin}}}_x\left\{ f(x)+\tfrac{1-\gamma \sigma }{2\gamma }\Vert x-\tfrac{1}{1-\gamma \sigma }z\Vert ^2\right\} \\&= {\mathrm{prox}}_{\tfrac{\gamma }{1-\gamma \sigma } f}(\tfrac{1}{1-\gamma \sigma }z). \end{aligned}$$

A similar relation holds for \({\mathrm{prox}}_{\gamma \hat{g}}\) with the sign in front of \(\gamma \sigma \) flipped.
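This prox identity requires \(\gamma \sigma <1\) for the shifted subproblem to remain strongly convex. The following Python sketch (illustrative; it uses \(f=|\cdot |\) in one dimension, for which the right-hand side is soft-thresholding) checks the identity numerically by brute-force minimization of the left-hand side.

```python
import numpy as np

sigma, gamma, z = 0.3, 2.0, 3.0       # note gamma*sigma = 0.6 < 1
f = np.abs                            # f(x) = |x|

# left-hand side: prox_{gamma*fhat}(z) with fhat = f - (sigma/2)*x^2, by grid search
xs = np.linspace(-10, 10, 2_000_001)
lhs = xs[np.argmin(f(xs) - sigma/2*xs**2 + (xs - z)**2/(2*gamma))]

# right-hand side: prox of (gamma/(1-gamma*sigma))*f at z/(1-gamma*sigma)
tau = gamma / (1 - gamma*sigma)
w = z / (1 - gamma*sigma)
rhs = np.sign(w) * max(abs(w) - tau, 0.0)   # soft-thresholding

print(lhs, rhs)   # agree up to the grid resolution (both ~2.5)
```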

6 A strongly monotone and Lipschitz continuous

In this section, we consider the case where one of the operators is \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous. This assumption is stated next.

Assumption 6.1

Suppose that:

  1. (i)

    The operators \(A:~\mathcal {H}\rightarrow \mathcal {H}\) and \(B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone.

  2. (ii)

    A is \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous.

First, we state a result that characterizes the resolvent of A. It is proven in Appendix 8.

Proposition 6.2

Assume that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a maximal monotone \(\beta \)-Lipschitz continuous operator. Then the resolvent \(J_{A}=(\mathrm{{Id}}+A)^{-1}\) satisfies

$$\begin{aligned} 2\langle J_Ax-J_Ay,x-y\rangle&\ge \Vert x-y\Vert ^2 +(1-\beta ^2)\Vert J_Ax-J_Ay\Vert ^2. \end{aligned}$$
(6.1)

This resolvent property is used when proving the following contraction factor of the reflected resolvent. The result is proven in Appendix 8.

Theorem 6.3

Suppose that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a \(\sigma \)-strongly monotone and \(\beta \)-Lipschitz continuous operator. Then the reflected resolvent \(R_{A} = 2J_A-\mathrm{{Id}}\) is \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\beta ^2}}\)-contractive.

The parameter \(\gamma \) that optimizes the contraction factor for \(R_{\gamma A}\) is the minimizer of \(h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}\) (\(\gamma A\) is \(\gamma \sigma \)-strongly monotone and \(\gamma \beta \)-Lipschitz continuous). The gradient \(\nabla h(\gamma ) =\tfrac{4\sigma (\beta ^2\gamma ^2-1)}{(\beta ^2\gamma ^2+2\sigma \gamma +1)^2}\), which implies that the stationary points are given by \(\gamma =\pm \tfrac{1}{\beta }\). Since \(\gamma >0\) and the gradient is positive for \(\gamma >\tfrac{1}{\beta }\) and negative for \(\gamma \in (0,\tfrac{1}{\beta })\), \(\gamma =\tfrac{1}{\beta }\) optimizes the contraction factor. The corresponding rate is

$$\begin{aligned} \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}} = \sqrt{1-\tfrac{2\sigma /\beta }{1+\sigma /\beta }}=\sqrt{\tfrac{1-\sigma /\beta }{1+\sigma /\beta }}=\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}. \end{aligned}$$

This is summarized in the following proposition.

Proposition 6.4

The parameter \(\gamma \) that optimizes the contraction factor of \(R_{\gamma A}\) is given by \(\gamma = \tfrac{1}{\beta }\). The corresponding contraction factor is \(\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}\).
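A quick numerical confirmation of Proposition 6.4 (illustrative; \(\sigma \) and \(\beta \) chosen arbitrarily with \(\sigma \le \beta \)) is to minimize \(h\) over a grid of \(\gamma \) values:

```python
import numpy as np

sigma, beta = 1.0, 3.0
gammas = np.linspace(1e-4, 5, 1_000_001)
h = 1 - 4*gammas*sigma / (1 + 2*gammas*sigma + (gammas*beta)**2)

i = h.argmin()
print(gammas[i], 1/beta)                                           # ~0.3333
print(np.sqrt(h[i]), np.sqrt((beta/sigma - 1)/(beta/sigma + 1)))   # ~0.7071
```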

Now, we are ready to state the convergence rate results for Douglas–Rachford splitting.

Theorem 6.5

Suppose that Assumption 6.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Let \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\); then the algorithm converges at least with rate factor

$$\begin{aligned} |1-\alpha |+\alpha \delta \end{aligned}$$
(6.2)

for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). Optimizing this bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\alpha =1\) and \(\gamma =\tfrac{1}{\beta }\) and corresponding optimal rate bound \(\sqrt{\tfrac{\beta /\sigma -1}{\beta /\sigma +1}}\).

Proof

It follows immediately from Theorem 6.3, Lemma 3.3, and Proposition 6.4 by noting that \(\alpha =1\) minimizes (6.2). \(\square \)

In the following section, we will see that there exists a problem from the considered class that converges exactly with the provided rate.

6.1 Tightness

We consider a problem where A is a scaled rotation operator, i.e.,

$$\begin{aligned} A=d\begin{bmatrix} \cos {\psi }&\quad -\sin {\psi } \\ \sin {\psi }&\quad \cos {\psi } \end{bmatrix} \end{aligned}$$
(6.3)

where \(0\le \psi <\tfrac{\pi }{2}\) and \(d\in (0,\infty )\). First, we show that A is strongly monotone and Lipschitz continuous.

Proposition 6.6

The operator A in (6.3) is \(d\cos {\psi }\)-strongly monotone and d-Lipschitz continuous.

Proof

We first show that A is \(d\cos {\psi }\)-strongly monotone. Since A is linear, we have

$$\begin{aligned} \langle Av,v\rangle&=d\langle (\cos {\psi }v_1-\sin {\psi }v_2,\sin {\psi }v_1+\cos {\psi }v_2),(v_1,v_2)\rangle \\&=d\cos {\psi }(v_1^2+v_2^2)=d\cos {\psi }\Vert v\Vert ^2. \end{aligned}$$

That is, A is \(d\cos {\psi }\)-strongly monotone. Since A is a scaled (with d) rotation operator, its singular values all equal d, and hence A is d-Lipschitz continuous. This concludes the proof. \(\square \)

We need an explicit form of the reflected resolvent of A to show that the rate is tight. To state it, we define the following variant of the arctan function, valid when \(\tan {\xi }=\tfrac{x}{y}\) and \(x\ge 0\):

$$\begin{aligned} \arctan _2\left( \tfrac{x}{y}\right) ={\left\{ \begin{array}{ll} \arctan (\tfrac{x}{y}) &{} {\hbox {if }} x\ge 0, y>0 \\ \arctan (\tfrac{x}{y})+\pi &{} {\hbox {if }} x\ge 0, y<0 \\ \tfrac{\pi }{2}&{} {\hbox {if }} x \ge 0, y=0 \end{array}\right. } \end{aligned}$$
(6.4)

This arctan is defined for nonnegative numerators x only, and outputs an angle in the interval \([0,\pi ]\).

Next, we provide the expression for the reflected resolvent. To simplify its notation, we let \(\sigma \) denote the strong monotonicity modulus and \(\beta \) the Lipschitz constant of A, i.e.,

$$\begin{aligned} \sigma&= d\cos {\psi },&\beta&=d. \end{aligned}$$
(6.5)

The following result is proven in Appendix 8.

Proposition 6.7

The reflected resolvent of \(\gamma A\), with A in (6.3) and \(\gamma \in (0,\infty )\), is

$$\begin{aligned} R_{\gamma A} = \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}} \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \end{aligned}$$

where \(\sigma \) and \(\beta \) are defined in (6.5), and \(\xi \) satisfies \(\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\beta ^2-\sigma ^2}}{1-(\gamma \beta )^2}\right) \) with \(\arctan _2\) defined in (6.4).
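Proposition 6.7 is easy to verify numerically. The following Python sketch (illustrative parameter values) builds A from (6.3), forms \(R_{\gamma A}=2(\mathrm{{Id}}+\gamma A)^{-1}-\mathrm{{Id}}\) directly, and compares it with the scaled-rotation expression above; np.arctan2 plays the role of \(\arctan _2\) in (6.4).

```python
import numpy as np

d, psi, gamma = 2.0, 0.6, 0.8
A = d * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
sigma, beta = d*np.cos(psi), d                        # the constants (6.5)

R = 2*np.linalg.inv(np.eye(2) + gamma*A) - np.eye(2)  # R_{gamma A} = 2*J - Id

delta = np.sqrt(1 - 4*gamma*sigma / (1 + 2*gamma*sigma + (gamma*beta)**2))
xi = np.arctan2(2*gamma*np.sqrt(beta**2 - sigma**2), 1 - (gamma*beta)**2)
R_formula = delta * np.array([[ np.cos(xi), np.sin(xi)],
                              [-np.sin(xi), np.cos(xi)]])

print(np.max(np.abs(R - R_formula)))   # ~1e-16: the two expressions coincide
```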

That is, the reflected resolvent is first a rotation then a contraction. The contraction factor is exactly the upper bound on the contraction factor in Theorem 6.5. Therefore, the A in (6.3) can be used to show tightness of the results in Theorem 6.5. To do so, we need another operator B that cancels the rotation introduced by A. For \(\alpha \in (0,1]\), we will need \(R_{\gamma A}R_{\gamma B}=\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I\) and for \(\alpha >1\), we will need \(R_{\gamma A}R_{\gamma B}=-\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}I\). This is clearly achieved if \(R_{\gamma B}\) is another rotation operator. Using the following straightforward consequence of Minty’s theorem (see [26]), we conclude that any rotation operator (being nonexpansive) is the reflected resolvent of a maximally monotone operator.

Proposition 6.8

An operator \(R:~\mathcal {H}\rightarrow \mathcal {H}\) is nonexpansive if and only if it is the reflected resolvent of a maximally monotone operator.

Proof

It follows immediately from [3, Corollary 23.8] and [3, Proposition 4.2]. \(\square \)

With this in mind, we can state the tightness claim.

Proposition 6.9

Let \(\gamma \in (0,\infty )\), \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\), and \(\xi \) be defined as in Proposition 6.7. Suppose that A is as in (6.3) and B is maximally monotone and satisfies either of the following:

  1. (i)

    if \(\alpha \in (0,1]\): \(B=B_1\) with \(R_{\gamma B_1}=\left[ \begin{matrix} \cos {\xi }&{}-\sin {\xi }\\ \sin {\xi }&{}\cos {\xi }\end{matrix}\right] \),

  2. (ii)

    \(\alpha \in (1,\tfrac{2}{1+\delta })\): \(B=B_2\) with \(R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}\sin {(\pi -\xi )}\\ -\sin {(\pi -\xi )}&{}\cos {(\pi -\xi )}\end{matrix}\right] \).

Then the \(z^k\) sequence for solving \(0\in \gamma Ax+\gamma Bx\) using (4.3) converges exactly with the rate \(|1-\alpha |+\alpha \delta \).

Proof

Case (i): Using the reflected resolvent \(R_{\gamma A}\) in Proposition 6.7 and that \(\alpha \in (0,1]\), we conclude that

$$\begin{aligned} z^{k+1}&= (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B}z^k\\&=(1-\alpha )z^k+\alpha \delta \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \begin{bmatrix} \cos {\xi }&-\sin {\xi }\\ \sin {\xi }&\cos {\xi } \end{bmatrix}z^k\\&=(1-\alpha )z^k+\alpha \delta z^k\\&=|1-\alpha |z^k+\alpha \delta z^k \end{aligned}$$

Case (ii): Using the reflected resolvent \(R_{\gamma A}\) in Proposition 6.7 and that \(\alpha \ge 1\), we conclude that

$$\begin{aligned} z^{k+1}&= (1-\alpha )z^k+\alpha R_{\gamma A}R_{\gamma B}z^k\\&=(1-\alpha )z^k+\alpha \delta \begin{bmatrix} \cos {\xi }&\sin {\xi }\\ -\sin {\xi }&\cos {\xi } \end{bmatrix} \begin{bmatrix} \cos {(\pi -\xi )}&\sin {(\pi -\xi )}\\ -\sin {(\pi -\xi )}&\cos {(\pi -\xi )} \end{bmatrix}z^k\\&=(1-\alpha )z^k-\alpha \delta z^k\\&=-(|1-\alpha |+\alpha \delta )z^k. \end{aligned}$$

In both cases, the convergence rate is exactly \(|1-\alpha |+\alpha \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +(\gamma \beta )^2}}\). This completes the proof. \(\square \)

Remark 6.10

It can be shown that the maximally monotone operator \(B_1\) that gives \(R_{\gamma B_1}\) satisfies \(B_1=\tfrac{1}{\gamma (1+\cos {\xi })} \left[ \begin{matrix} 0&{}-\sin {\xi }\\ \sin {\xi }&{}0 \end{matrix}\right] \) if \(\xi \in [0,\pi )\) and \(B_1=\partial \iota _0\) (that is, \(B_1\) is the subdifferential operator of the indicator function \(\iota _0\) of the origin) if \(\xi =\pi \). Similarly, the maximally monotone operator \(B_2\) that gives \(R_{\gamma B_2}\) satisfies \(B_2=\tfrac{1}{\gamma (1-\cos {\xi })} \left[ \begin{matrix} 0&{}-\sin {\xi }\\ \sin {\xi }&{}0 \end{matrix}\right] \) if \(\xi \in (0,\pi ]\) and \(B_2=0\) if \(\xi =0\).

We have shown that the rate provided in Theorem 6.5 is tight for all feasible \(\alpha \) and \(\gamma \).
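The worst-case behavior can also be reproduced numerically. The following Python sketch (illustrative parameter values, case (i) with \(\alpha \in (0,1]\)) iterates (4.3) with \(R_{\gamma A}\) and \(R_{\gamma B_1}\) applied as matrices and prints the per-iteration contraction factor next to \(|1-\alpha |+\alpha \delta \).

```python
import numpy as np

d, psi, gamma, alpha = 2.0, 0.6, 0.5, 0.8
A = d * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
sigma, beta = d*np.cos(psi), d

RA = 2*np.linalg.inv(np.eye(2) + gamma*A) - np.eye(2)
delta = np.sqrt(1 - 4*gamma*sigma / (1 + 2*gamma*sigma + (gamma*beta)**2))
xi = np.arctan2(2*gamma*np.sqrt(beta**2 - sigma**2), 1 - (gamma*beta)**2)
RB1 = np.array([[np.cos(xi), -np.sin(xi)],   # R_{gamma B_1}: cancels the rotation in R_{gamma A}
                [np.sin(xi),  np.cos(xi)]])

z = np.array([1.0, -0.5])
for _ in range(10):
    z_new = (1 - alpha)*z + alpha*(RA @ (RB1 @ z))
    print(np.linalg.norm(z_new)/np.linalg.norm(z), abs(1 - alpha) + alpha*delta)
    z = z_new
```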

6.2 Comparison to other bounds

In Fig. 1, we compare the linear convergence rate result in Theorem 6.5 to the convergence rate result in [24]. The comparison is made with optimal \(\gamma \)-parameters for both bounds. The result in [24] is provided in the standard Douglas–Rachford setting, i.e., with \(\alpha =1/2\). By instead letting \(\alpha =1\), this rate can be improved, see [8] (which shows an improved rate in the composite convex optimization case, but the same rate can be shown to hold also for monotone inclusion problems). This improved rate is also added to the comparison in Fig. 1. We see that both rates that follow from [24] are suboptimal and worse than the rate bound in Theorem 6.5.

Fig. 1

Convergence rate comparison for general monotone inclusion problems where one operator is strongly monotone and Lipschitz continuous. We compare Theorem 6.5 to [24], and an improvement to [24] which holds when \(\alpha =1\)

7 A strongly monotone and cocoercive

In this section, we consider the case where A is strongly monotone and cocoercive; that is, we assume the following.

Assumption 7.1

Suppose that:

  1. (i)

    The operators \(A:~\mathcal {H}\rightarrow \mathcal {H}\) and \(B:~\mathcal {H}\rightarrow 2^{\mathcal {H}}\) are maximally monotone.

  2. (ii)

    A is \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive.

The linear convergence result for Douglas–Rachford splitting will follow from the contraction factor of the reflected resolvent of A. The contraction factor is provided in the following theorem, which is proven in Appendix 8.

Theorem 7.2

Suppose that \(A:~\mathcal {H}\rightarrow \mathcal {H}\) is a \(\sigma \)-strongly monotone and \(\tfrac{1}{\beta }\)-cocoercive operator. Then its reflected resolvent \(R_{A} = 2J_A-\mathrm{{Id}}\) is contractive with factor \(\sqrt{1-\tfrac{4\sigma }{1+2\sigma +\sigma \beta }}\).

When considering the reflected resolvent of \(\gamma A\) where \(\gamma \in (0,\infty )\), the \(\gamma \)-parameter can be chosen to optimize the contraction factor of \(R_{\gamma A}\). The operator \(\gamma A\) is \(\gamma \sigma \)-strongly monotone and \(\tfrac{1}{\gamma \beta }\)-cocoercive, so the optimal \(\gamma >0\) minimizes \(h(\gamma ):=1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }\). The gradient of h satisfies \(\nabla h(\gamma ) = \tfrac{4\sigma (\beta \sigma \gamma ^2-1)}{(\beta \sigma \gamma ^2+2\sigma \gamma +1)^2}\), so the extreme points of h are given by \(\gamma =\pm \tfrac{1}{\sqrt{\beta \sigma }}\). Since \(\gamma >0\) and the gradient is negative for \(\gamma \in (0,\tfrac{1}{\sqrt{\beta \sigma }})\) and positive for \(\gamma >\tfrac{1}{\sqrt{\beta \sigma }}\), the parameter \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) minimizes the contraction factor. The corresponding contraction factor is

$$\begin{aligned} \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }} =\sqrt{1-\tfrac{2\sqrt{\sigma /\beta }}{1+\sqrt{\sigma /\beta }}}=\sqrt{\tfrac{1-\sqrt{\sigma /\beta }}{1+\sqrt{\sigma /\beta }}}=\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}. \end{aligned}$$

This is summarized in the following proposition.

Proposition 7.3

The parameter \(\gamma \in (0,\infty )\) that optimizes the contraction factor for \(R_{\gamma A}\) is \(\gamma \!=\!\tfrac{1}{\sqrt{\beta \sigma }}\). The corresponding contraction factor is \(\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}\).

Now we are ready to state the linear convergence rate result for the Douglas–Rachford algorithm.

Theorem 7.4

Suppose that Assumption 7.1 holds and that the Douglas–Rachford algorithm (4.3) is applied to solve \(0\in \gamma Ax+\gamma Bx\). Let \(\delta = \sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\); then the algorithm converges at least with rate factor

$$\begin{aligned} |1-\alpha |+\alpha \delta \end{aligned}$$
(7.1)

for all \(\alpha \in (0,\tfrac{2}{1+\delta })\). Optimizing this bound w.r.t. \(\alpha \) and \(\gamma \) gives \(\alpha =1\) and \(\gamma =\tfrac{1}{\sqrt{\beta \sigma }}\) and corresponding optimal rate bound \(\sqrt{\tfrac{\sqrt{\beta /\sigma }-1}{\sqrt{\beta /\sigma }+1}}\).

Proof

It follows immediately from Theorem 7.2, Lemma 3.3, and Proposition 7.3 by noting that \(\alpha =1\) minimizes (7.1). \(\square \)

7.1 Tightness

In this section, we provide a two-dimensional example that shows that the provided bounds are tight. To achieve this, we let A be a scaled version of the resolvent of a scaled rotation operator. Let C be that scaled rotation operator, i.e.,

$$\begin{aligned} C=c\begin{bmatrix} \cos {\psi }&\quad -\sin {\psi }\\ \sin {\psi }&\quad \cos {\psi } \end{bmatrix} \end{aligned}$$
(7.2)

with \(c\in (1,\infty )\) and \(\psi \in [0,\tfrac{\pi }{2})\). We will let A satisfy \(A=dJ_{C}\) for some \(d\in (0,\infty )\); that is

$$\begin{aligned} A=d(C+I)^{-1}=\tfrac{d}{(1+c\cos {\psi })^2+c^2\sin ^2{\psi }} \begin{bmatrix} c\cos {\psi }+1&c\sin {\psi }\\ -c\sin {\psi }&c\cos {\psi }+1 \end{bmatrix}. \end{aligned}$$
(7.3)

In the following proposition, we state the strong monotonicity and cocoercivity properties of A.

Proposition 7.5

The operator A in (7.3) is \(\tfrac{1+c\cos {\psi }}{d}\)-cocoercive and strongly monotone with modulus \(\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2}\).

Proof

The matrix C in (7.2) is \(c\cos {\psi }\)-strongly monotone (see Proposition 6.6), so \(J_C\) is \((1+c\cos {\psi })\)-cocoercive (see [3, Definition 4.4]) and the operator \(A=d(I+C)^{-1}\) is \(\tfrac{1+c\cos {\psi }}{d}\)-cocoercive. Further, since C is monotone and c-Lipschitz continuous (see Proposition 6.6), the following holds (see Proposition 6.2):

$$\begin{aligned} 2\langle J_{C}x-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2+(1-c^2)\Vert J_Cx-J_Cy\Vert ^2. \end{aligned}$$
(7.4)

Since \(J_C\) is \((1+c\cos {\psi })\)-cocoercive, we have

$$\begin{aligned} \langle J_Cx-J_Cy,x-y\rangle \ge (1+c\cos {\psi })\Vert J_Cx-J_Cy\Vert ^2. \end{aligned}$$
(7.5)

We add (7.5) multiplied by \(-\tfrac{1-c^2}{1+c\cos {\psi }}\) (which is positive since \(c\in (1,\infty )\)) to (7.4) to get

$$\begin{aligned} (2-\tfrac{1-c^2}{1+c\cos {\psi }})\langle J_{C}x-J_Cy,x-y\rangle \ge \Vert x-y\Vert ^2; \end{aligned}$$

that is, \(J_C\) is \(\sigma \)-strongly monotone with

$$\begin{aligned} \sigma = \frac{1}{2-\tfrac{1-c^2}{1+c\cos {\psi }}}=\frac{1+c\cos {\psi }}{2+2c\cos {\psi }-1+c^2}=\frac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}, \end{aligned}$$

so A is strongly monotone with parameter \(d\tfrac{1+c\cos {\psi }}{1+2c\cos {\psi }+c^2}\). This concludes the proof. \(\square \)

This shows that the assumptions needed for the linear convergence rate result in Theorem 7.4 hold. To prove the tightness claim, we need an expression for the reflected resolvent of A. This is more easily expressed in terms of the strong monotonicity modulus, which we denote by \(\sigma \), and the inverse cocoercivity constant, which we denote by \(\beta \), i.e.,

$$\begin{aligned} \sigma&=\tfrac{d(1+c\cos {\psi })}{1+2c\cos {\psi }+c^2},&\beta&=\tfrac{d}{1+c\cos {\psi }}. \end{aligned}$$
(7.6)

The following result is proven in Appendix 8.

Proposition 7.6

The reflected resolvent \(R_{\gamma A}\) of \(\gamma A\), where A is defined in (7.3) and \(\gamma \in (0,\infty )\), is given by

$$\begin{aligned} R_{\gamma A}=\sqrt{1-\frac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }} \begin{bmatrix} \cos {\xi }&-\sin {\xi }\\ \sin {\xi }&\cos {\xi } \end{bmatrix} \end{aligned}$$

where \(\sigma \) and \(\beta \) are defined in (7.6), and \(\xi \) satisfies \(\xi =\arctan _2\left( \tfrac{2\gamma \sqrt{\sigma (\beta -\sigma )}}{1-\sigma \beta \gamma ^2}\right) \) with \(\arctan _2\) defined in (6.4).
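As for Proposition 6.7, the expression can be checked numerically. The following Python sketch (illustrative parameter values) builds A from (7.3), forms \(R_{\gamma A}\) directly, and compares it with the formula above.

```python
import numpy as np

c, psi, d, gamma = 3.0, 0.5, 2.0, 0.7
C = c * np.array([[np.cos(psi), -np.sin(psi)],
                  [np.sin(psi),  np.cos(psi)]])
A = d * np.linalg.inv(C + np.eye(2))                           # the operator (7.3)

sigma = d*(1 + c*np.cos(psi)) / (1 + 2*c*np.cos(psi) + c**2)   # constants (7.6)
beta = d / (1 + c*np.cos(psi))

R = 2*np.linalg.inv(np.eye(2) + gamma*A) - np.eye(2)           # R_{gamma A}

delta = np.sqrt(1 - 4*gamma*sigma / (1 + 2*gamma*sigma + gamma**2*sigma*beta))
xi = np.arctan2(2*gamma*np.sqrt(sigma*(beta - sigma)), 1 - sigma*beta*gamma**2)
R_formula = delta * np.array([[np.cos(xi), -np.sin(xi)],
                              [np.sin(xi),  np.cos(xi)]])

print(np.max(np.abs(R - R_formula)))   # ~1e-16: the two expressions coincide
```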

Based on this reflected resolvent, we can show that the rate bound in Theorem 7.4 is indeed tight. The proof of the following result is the same as the proof of Proposition 6.9.

Proposition 7.7

Let \(\gamma \in (0,\infty )\), \(\delta =\sqrt{1-\tfrac{4\gamma \sigma }{1+2\gamma \sigma +\gamma ^2\sigma \beta }}\), and let \(\xi \) be as defined in Proposition 7.6. Suppose that A is as in (7.3) and B is maximally monotone and satisfies either of the following:

  1. (i)

    if \(\alpha \in (0,1]\): \(B=B_1\) with \(R_{\gamma B_1}= \left[ \begin{matrix} \cos {\xi }&{}\sin {\xi }\\ -\sin {\xi }&{}\cos {\xi } \end{matrix}\right] \),

  2. (ii)

    \(\alpha \in (1,\tfrac{2}{1+\delta })\): \(B=B_2\) with \(R_{\gamma B_2}=\left[ \begin{matrix} \cos {(\pi -\xi )}&{}-\sin {(\pi -\xi )}\\ \sin {(\pi -\xi )}&{}\cos {(\pi -\xi )} \end{matrix}\right] \).

Then the \(z^k\) sequence for solving \(0\in \gamma Ax+\gamma Bx\) using (4.3) converges exactly with the rate \(|1-\alpha |+\alpha \delta \).

So, we have shown that the rate in Theorem 7.4 is tight for all feasible algorithm parameters \(\alpha \) and \(\gamma \).

Fig. 2

Convergence rate comparison between Theorems 6.5,  7.4, and [17]. In all cases, one operator has both regularity properties. It is strongly monotone in all examples and Lipschitz continuous in Theorem 6.5, cocoercive in Theorem 7.4 (which is stronger than Lipschitz continuity), and a cocoercive subdifferential operator in [17] (which is the strongest property). The worst case rate improves when the class of problems becomes more restricted

7.2 Comparison to other bounds

We have shown tight convergence rate estimates for Douglas–Rachford splitting when the monotone operator A is cocoercive and strongly monotone (Theorem 7.4). In Sect. 6, we showed tight estimates when A is Lipschitz continuous and strongly monotone (Theorem 6.5). In [17], tight convergence rate estimates are proven for the case when A and B are subdifferential operators of proper, closed, and convex functions and A is strongly monotone and Lipschitz continuous (which in this case is equivalent to cocoercive). The class of problems considered in [17] is a subclass of the problems considered in this section, which in turn is a subclass of the problems considered in Sect. 6. The optimal rates for these classes of problems are shown in Fig. 2. As the problem class becomes more restricted, the optimal rate bound improves. This is in contrast to the case in Sect. 5, where a convex optimization problem achieved the worst case estimate.

8 Conclusions

We have shown linear convergence rate bounds for Douglas–Rachford splitting for monotone inclusion problems under three different sets of assumptions. One setting is the one used by Lions and Mercier [24], for which we provided a tighter bound. We also stated linear convergence rate bounds under two other sets of assumptions, for which no other linear rate bounds were previously available. In addition, we have shown that our rate bounds are tight: in two cases for all feasible algorithm parameters, and in the remaining case for many algorithm parameters.