1 Introduction and Main Results

In [23], Wang and Zhang studied existence and uniqueness for a class of stochastic differential equations (SDEs) with Hölder–Dini continuous drifts; Wang [22] also investigated the strong Feller property, log-Harnack inequality and gradient estimates for SDEs with Dini-continuous drifts. So far, no numerical schemes are available for SDEs with Hölder–Dini continuous drifts. The aim of this paper is therefore to prove the convergence of the Euler–Maruyama (EM) scheme and to obtain its rate of convergence for these equations under reasonable conditions.

It is well known that the convergence rate of the EM scheme for SDEs with regular coefficients is one-half; see, e.g., [11]. With regard to the convergence rate of the EM scheme under various settings, we refer to, e.g., [1] for stochastic differential delay equations (SDDEs) with polynomial growth with respect to (w.r.t.) the delay variables, [4] for SDDEs under local Lipschitz and monotonicity conditions, [14] for SDEs with discontinuous coefficients, [25] for SDEs under a log-Lipschitz condition, and [2, 6, 7, 8] for SDEs with non-globally Lipschitz continuous coefficients, to name a few. On the other hand, Hairer et al. [5] have established the first result in the literature that Euler’s method converges to the solution of an SDE with smooth coefficients in the strong and numerical weak sense without any arbitrarily small polynomial rate of convergence, and Jentzen et al. [9] have further given a counterexample for which no approximation method converges to the true solution in the mean-square sense with a polynomial rate.

The rate of convergence of EM scheme for SDEs with irregular coefficients has also gained much attention. For instance, adopting the Yamada–Watanabe approximation approach, [3] discussed strong convergence rate in \(L^p\)-norm sense; using the Yamada–Watanabe approximation trick and heat kernel estimate, [16] studied strong convergence rate in \(L^1\)-norm sense for a class of non-degenerate SDEs, where the bounded drift term satisfies a weak monotonicity and is of bounded variation w.r.t. a Gaussian measure and the diffusion term is Hölder continuous; applying the Zvonkin transformation, [18] discussed strong convergence rate in \(L^p\)-norm sense for SDEs with additive noises, where the drift coefficient is bounded and Hölder continuous.

It is worth pointing out that [16, 18] focused on the convergence rate of the EM scheme for SDEs with Hölder continuous and bounded drifts, which rules out Hölder–Dini continuous and unbounded drifts. On the other hand, most of the existing literature on the convergence rate of the EM scheme is concerned with non-degenerate SDEs, whereas, to the best of our knowledge, results for degenerate SDEs are scarce. So, in this work, we not only investigate the convergence of the EM scheme for SDEs with Hölder–Dini continuous drifts, but also study the degenerate setup. For the wellposedness of SDEs with singular coefficients, we refer to, e.g., [13, 22, 23, 27] for more details.

Throughout the paper, the following notation will be used. Let n, m be positive integers, \((\mathbb {R}^n, \left\langle \cdot ,\cdot \right\rangle ,|\cdot |)\) the n-dimensional Euclidean space, and \(\mathbb {R}^n\otimes \mathbb {R}^m\) the family of all \(n\times m\) matrices. Let \(\Vert \cdot \Vert \) and \(\Vert \cdot \Vert _{\mathrm {HS}}\) stand for the usual operator norm and the Hilbert–Schmidt norm, respectively. Fix \(T>0\) and set \(\Vert f\Vert _{T,\infty }:=\sup _{t\in [0,T],x\in \mathbb {R}^m}\Vert f(t,x)\Vert \) for an operator-valued map f on \([0,T]\times \mathbb {R}^m\). Let \(C(\mathbb {R}^m;\mathbb {R}^n)\) denote the family of all continuous functions \(f:\mathbb {R}^m\rightarrow \mathbb {R}^n\), and let \(C^2(\mathbb {R}^n;\mathbb {R}^n\otimes \mathbb {R}^n)\) be the family of all twice continuously differentiable functions \(f:\mathbb {R}^n\rightarrow \mathbb {R}^n\otimes \mathbb {R}^n\). Denote by \(\mathbb {M}_\mathrm{non}^n\) the collection of all nonsingular \(n\times n\) matrices. Let \(\mathscr {S}_0\) be the collection of all functions \(\phi :\mathbb {R}_+\rightarrow \mathbb {R}_+\) that are slowly varying at zero in Karamata’s sense (i.e., \(\lim _{t\rightarrow 0}\frac{\phi (\lambda t)}{\phi (t)}=1\) for any \(\lambda >0\)) and bounded away from 0 and \(\infty \) on \([\varepsilon ,\infty )\) for any \(\varepsilon >0\). Let \(\mathscr {D}_0\) be the family of Dini functions, i.e.,

$$\begin{aligned} \begin{aligned} \mathscr {D}_0:&= \left\{ \phi \Big |\phi : \mathbb {R}_+\rightarrow \mathbb {R}_+ \text{ is } \text{ increasing } \text{ and } \int _0^1{\frac{\phi (s)}{s}{\mathrm{d}}s}<\infty \right\} . \end{aligned} \end{aligned}$$

A function \(f:\mathbb {R}^m\rightarrow \mathbb {R}^n\) is called Dini continuous if there exists \(\phi \in \mathscr {D}_0\) such that \(|f(x)-f(y)|\le \phi (|x-y|)\) for any \(x,y\in \mathbb {R}^m.\) We remark that every Dini-continuous function is continuous and every Lipschitz continuous function is Dini continuous; moreover, if f is Hölder continuous, then f is Dini continuous. Nevertheless, there are numerous Dini-continuous functions which are not Hölder continuous at all; see, e.g.,

$$\begin{aligned} \phi (x)={\left\{ \begin{array}{ll} \frac{1}{(\log (c+x^{-1}))^{(1+\delta )}}, &{} \quad x>0\\ 0, &{} \quad x=0 \end{array}\right. } \end{aligned}$$

for some constants \(\delta >0\) and \(c\ge {\mathrm{e}}^{3+2\delta }\). Set

$$\begin{aligned} \begin{aligned} \mathscr {D}:= \{\phi \in \mathscr {D}_0| \phi ^{2} \text { is concave}\}\quad { \text{ and } }\quad \mathscr {D}^\varepsilon := \{\phi \in \mathscr {D}|\phi ^{2(1+\varepsilon )} \text { is concave} \} \end{aligned} \end{aligned}$$

for some \(\varepsilon \in (0,1)\) sufficiently small. Clearly, \(\phi \) constructed above belongs to \(\mathscr {D}^\varepsilon .\) A function \(f:\mathbb {R}^m\rightarrow \mathbb {R}^n\) is called Hölder–Dini continuous of order \(\alpha \in [0,1)\) if

$$\begin{aligned} |f(x)-f(y)|\le |x-y|^\alpha \phi (|x-y|),\quad |x-y|\le 1 \end{aligned}$$

for some \(\phi \in \mathscr {D}_0;\) see, for instance,

$$\begin{aligned} f(x)={\left\{ \begin{array}{ll} \frac{1}{(1+x)^\alpha (\log (c+x^{-1}))^{(1+\delta )}}, &{} \quad x>0\\ 0, &{} \quad x=0 \end{array}\right. } \end{aligned}$$

for some constants \(c,\delta >0\) and \(\alpha \in (0,1).\)
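As a rough numerical illustration (a sketch only, not part of the analysis), the snippet below evaluates the logarithmic Dini function \(\phi \) given earlier in this section, with the hypothetical parameter choices \(\delta =1\) and \(c={\mathrm{e}}^{5}\). After the substitution \(s={\mathrm{e}}^{-u}\), the Dini integral becomes \(\int _0^\infty \phi ({\mathrm{e}}^{-u}){\mathrm{d}}u\), and its tail beyond U is at most \(U^{-\delta }/\delta \) because \(\log (c+{\mathrm{e}}^{u})\ge u\). All function names are ours.

```python
import math

C = math.exp(5.0)   # assumed constant c = e^{3 + 2*delta} with delta = 1
DELTA = 1.0         # assumed exponent delta > 0


def phi(x, c=C, delta=DELTA):
    """Logarithmic Dini function: phi(x) = log(c + 1/x)^(-(1+delta)), phi(0) = 0."""
    if x == 0.0:
        return 0.0
    return math.log(c + 1.0 / x) ** (-(1.0 + delta))


def dini_integral_estimate(U, steps=20000, delta=DELTA):
    """Estimate int_0^1 phi(s)/s ds after the substitution s = e^{-u}.

    Returns (partial, partial_plus_tail), where partial is a midpoint-rule
    approximation of int_0^U phi(e^{-u}) du and the second entry adds the
    analytic tail bound U^{-delta} / delta.
    """
    h = U / steps
    partial = h * sum(phi(math.exp(-(i + 0.5) * h)) for i in range(steps))
    return partial, partial + U ** (-delta) / delta


def holder_ratio(x, beta):
    """phi(x) / x^beta; an unbounded ratio as x -> 0 rules out beta-Hölder continuity at 0."""
    return phi(x) / x ** beta


if __name__ == "__main__":
    for U in (50.0, 500.0):
        print(U, dini_integral_estimate(U))
    for x in (1e-10, 1e-50, 1e-100):
        print(x, holder_ratio(x, beta=0.1))
```

The truncated integrals stabilise as U grows, consistent with \(\phi \in \mathscr {D}_0\), while the ratio \(\phi (x)/x^{\beta }\) keeps growing as \(x\rightarrow 0\), consistent with \(\phi \) failing to be Hölder continuous of order \(\beta \) at the origin.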

Before proceeding further, a few words about the notation are in order. Generic constants will be denoted by c; we use the shorthand notation \(a\lesssim b\) to mean \(a\le c\,b\). If the constant c depends on a parameter p, we shall also write \(c_p\) and \(a\lesssim _p b\). Throughout the paper, for fixed \(T>0\), \(C_T>0\), dependent on the quantity T, is a generic constant which may change from line to line.

1.1 Non-degenerate SDEs with Bounded Coefficients

In this subsection, we consider an SDE on \((\mathbb {R}^n,\left\langle \cdot ,\cdot \right\rangle , |\cdot |)\)

$$\begin{aligned} {\mathrm{d}}X_t=b_t(X_t){\mathrm{d}}t+\sigma _t(X_t){\mathrm{d}}W_t, \quad t>0,\; X_0=x, \end{aligned}$$
(1.1)

where \(b: \mathbb {R}_+\times \mathbb {R}^n\rightarrow \mathbb {R}^n\), \(\sigma : \mathbb {R}_+\times \mathbb {R}^n\rightarrow \mathbb {R}^n\otimes \mathbb {R}^n\), and \((W_t)_{t\ge 0}\) is an n-dimensional Brownian motion defined on a complete filtered probability space \((\Omega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P})\).

With regard to (1.1), we suppose that there exists \(\phi \in \mathscr {D}\) such that, for any \(s,t\in [0,T]\) and \(x,y\in \mathbb {R}^n,\)

(A1) :

\(\sigma _t\in C^2(\mathbb {R}^n;\mathbb {R}^n\otimes \mathbb {R}^n)\), \(\sigma _t(x)\in \mathbb {M}_\mathrm{non}^n\), and

$$\begin{aligned} \Vert b\Vert _{T,\infty }+\sum _{i=0}^{2}\Vert \nabla ^{i} \sigma \Vert _{T,\infty }+\Vert \nabla \sigma ^{-1}\Vert _{T,\infty }+\Vert \sigma ^{-1}\Vert _{T,\infty }<\infty , \end{aligned}$$
(1.2)

where \(\nabla ^i\) means the ith order gradient operator;

(A2) :

(Regularity of b w.r.t. spatial variables)

$$\begin{aligned} |b_t(x)-b_t(y)|\le \phi (|x-y|); \end{aligned}$$
(A3) :

(Regularity of b and \(\sigma \) w.r.t. time variables)

$$\begin{aligned} |b_{s}(x)-b_{t}(x)|+\Vert \sigma _{s}(x)-\sigma _{t}(x)\Vert _{\mathrm {HS}}\le \phi (|s-t|). \end{aligned}$$

Without loss of generality, we take an integer \(N>0\) sufficiently large such that the stepsize \(\delta :=T/N\in (0,1)\). The continuous-time EM scheme corresponding to (1.1) is

$$\begin{aligned} {\mathrm{d}}Y_t=b_{t_\delta }( Y_{t_\delta }){\mathrm{d}}t+\sigma _{t_\delta } ( Y_{t_\delta }){\mathrm{d}}W_t,\quad t>0,\; Y_0=X_0=x. \end{aligned}$$
(1.3)

Herein, \(t_\delta :=\lfloor t/\delta \rfloor \delta \) with \(\lfloor t/\delta \rfloor \) the integer part of \(t/\delta \).
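At the grid points \(k\delta \), the continuous-time scheme (1.3) coincides with the usual EM recursion. The following is a minimal sketch of that recursion for a scalar equation (\(n=1\)), using only the Python standard library; the function names and the sample coefficients in the usage note are illustrative assumptions, not taken from the paper.

```python
import math
import random


def t_delta(t, delta):
    """Grid projection t_delta = floor(t / delta) * delta."""
    return math.floor(t / delta) * delta


def euler_maruyama(b, sigma, x0, T, N, seed=0):
    """Scalar EM recursion on the grid k*delta, delta = T/N.

    b(t, x) and sigma(t, x) are user-supplied scalar coefficients.
    Returns (ts, ys, ws): the grid times, the EM values at the grid
    times, and the Brownian path at the grid times that drove the scheme.
    """
    rng = random.Random(seed)
    delta = T / N
    ts, ys, ws = [0.0], [x0], [0.0]
    for k in range(N):
        t, y = ts[-1], ys[-1]
        dw = rng.gauss(0.0, math.sqrt(delta))  # Brownian increment ~ N(0, delta)
        # Coefficients are frozen at the left grid point, as in (1.3).
        ys.append(y + b(t, y) * delta + sigma(t, y) * dw)
        ws.append(ws[-1] + dw)
        ts.append((k + 1) * delta)
    return ts, ys, ws
```

For instance, `euler_maruyama(lambda t, x: min(1.0, math.sqrt(abs(x))), lambda t, x: 1.0 + 0.5 * math.sin(x), 0.0, 1.0, 1000)` runs the scheme with a bounded drift that is Hölder continuous of order one-half and a bounded diffusion coefficient with bounded inverse; these coefficients are chosen purely for illustration.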

The first contribution in this paper is stated as follows.

Theorem 1.1

Let (A1)–(A3) hold. Then

$$\begin{aligned} \left( \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \right) ^{1/2}\lesssim _T\phi (C_T\sqrt{\delta }) \end{aligned}$$

for some constant \(C_T\ge 1\).

Under (A1) and (A2), (1.1) admits a unique non-explosive strong solution \((X_t)_{t\in [0,T]}\); see, e.g., [22, Theorem 1.1]. In Theorem 1.1, by taking \(\phi (x)=x^\beta \) for \(x\ge 0\) and \(\beta \in (0,1],\) and inspecting closely the argument of Theorem 1.1, the concave property of \(\phi ^2\) can be dropped. Moreover, we have

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \lesssim _T\delta ^\beta . \end{aligned}$$

So, our present result covers [18, Theorem 2.13], where the drift is Hölder continuous. In particular, for the setting \(\beta =1,\) it reduces to the classical result on strong convergence of EM scheme for SDEs with regular coefficients; see, e.g., [11] for more details.
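To see how slow the rate can be beyond the Hölder case, one may, as an illustration, insert the logarithmic Dini function from the beginning of this section into Theorem 1.1. We write \(\delta _0>0\) for its exponent (denoted \(\delta \) there) to avoid a clash with the stepsize \(\delta \), and we assume that b and \(\sigma \) satisfy the hypotheses of Theorem 1.1 with this \(\phi \). Then

$$\begin{aligned} \left( \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \right) ^{1/2}\lesssim _T\phi (C_T\sqrt{\delta })=\left( \log \left( c+\frac{1}{C_T\sqrt{\delta }}\right) \right) ^{-(1+\delta _0)}, \end{aligned}$$

which behaves like \(\left( \frac{1}{2}\log (1/\delta )\right) ^{-(1+\delta _0)}\) as \(\delta \rightarrow 0\) and hence decays more slowly than any power \(\delta ^\beta \), \(\beta >0\).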

1.2 Non-degenerate SDEs with Unbounded Coefficients

As we see, in Theorem 1.1 the coefficients are uniformly bounded and the drift term b satisfies the global Dini-continuity condition [see (A2) above], which seems somewhat stringent. Therefore, concerning the coefficients, it is quite natural to replace uniform boundedness by local boundedness and global Dini continuity by local Dini continuity, respectively.

In lieu of (A1)–(A3), as for (1.1) we assume that, for any \(s,t\in [0,T]\) and \(k\ge 1\),

(A1’) :

\(\sigma _t\in C^2(\mathbb {R}^n;\mathbb {R}^n\otimes \mathbb {R}^n)\), for every \(x\in \mathbb {R}^n,\) \(\sigma _t(x)\in \mathbb {M}_\mathrm{non}^n\), and

$$\begin{aligned}&|b_t(x)|+\sum _{i=0}^{2}\Vert \nabla ^{i} \sigma _t(x)\Vert _{\mathrm {HS}}+\Vert \nabla \sigma _t^{-1}(x)\Vert _{\mathrm {HS}}\\&\quad +\Vert \sigma _t^{-1}(x)\Vert _{\mathrm {HS}}\le K_T (1+|x|),\quad x\in \mathbb {R}^n \end{aligned}$$

for some constant \(K_T>0\);

(A2’) :

(Regularity of b w.r.t. spatial variables) There exists \(\phi _k\in \mathscr {D}\) such that

$$\begin{aligned} |b_{t}(x)-b_{t}(y)|\le \phi _k(|x-y|),\quad |x|\vee |y|\le k; \end{aligned}$$
(A3’) :

(Regularity of b and \(\sigma \) w.r.t. time variables) For \(\phi _k\in \mathscr {D}\) as in (A2’),

$$\begin{aligned} |b_{s}(x)-b_{t}(x)|+\Vert \sigma _{s}(x)-\sigma _{t}(x)\Vert _\mathrm{HS}\le \phi _k(|s-t|),\quad |x| \le k. \end{aligned}$$

By employing the cutoff approach, Theorem 1.1 can be extended to include SDEs with local Dini-continuous coefficients, which is presented as below.

Theorem 1.2

Assume that (A1’)–(A3’) hold. Then

$$\begin{aligned} \lim _{\delta \rightarrow 0} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) =0. \end{aligned}$$
(1.4)

In particular, if \(\phi _k(s)={\mathrm{e}}^{{\mathrm{e}}^{c_0k^4}}s^{\alpha }, s\ge 0,\) for some \(\alpha \in (0,1]\) and \(c_0>0\), then

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \lesssim \inf _{\varepsilon \in (0,1)}\left\{ (\log \log (\delta ^{-\alpha \varepsilon }))^{-\frac{1}{4}}+\delta ^{\alpha (1-\varepsilon )}\right\} . \end{aligned}$$
(1.5)

Moreover, if \(\sigma _{\cdot }(\cdot )\) is uniformly bounded (i.e., \(\Vert \sigma \Vert _{T,\infty }<\infty \)), then

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \lesssim \inf _{\varepsilon \in (0,1)}\left\{ \exp \left( -\frac{1}{C_T\Vert \sigma \Vert _{T,\infty }^2}(\log \log (\delta ^{-\alpha \varepsilon }))^{\frac{1}{2}}\right) +\delta ^{\alpha (1-\varepsilon )}\right\} \end{aligned}$$
(1.6)

for some constant \(C_T>0\), where \(\Vert \sigma \Vert _{T,\infty }:=\sup _{0\le t\le T,x\in \mathbb {R}^n}\Vert \sigma _t(x)\Vert _\mathrm{HS}\).

Under (A1’) and (A2’), (1.1) enjoys a unique strong solution \((X_t)_{t\in [0,T]}\); see, for instance, [22, Theorem 1.1]. Theorem 1.2 improves the result in [17], since the drift here is allowed to be unbounded and locally Dini continuous, while the drift in [17] is bounded and Hölder continuous. Furthermore, by comparing (1.5) with (1.6), we infer that the convergence rate of the EM scheme is better whenever \(\sigma _{\cdot }(\cdot )\) is uniformly bounded.
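To get a feeling for the size of the right-hand sides of (1.5) and (1.6), the following sketch evaluates them numerically, taking the infimum over \(\varepsilon \) on a finite grid. It ignores the implicit multiplicative constants hidden in \(\lesssim \), and the parameter `const`, which stands in for \(C_T\Vert \sigma \Vert _{T,\infty }^2\), is a hypothetical value chosen for illustration.

```python
import math


def _loglog(delta, alpha, eps):
    """log log(delta^{-alpha*eps}), or None when the inner log is <= 1."""
    inner = -alpha * eps * math.log(delta)  # equals log(delta^{-alpha*eps})
    return math.log(inner) if inner > 1.0 else None


def _inf_over_eps(first_term, delta, alpha, grid):
    """Minimise first_term(loglog) + delta^{alpha(1-eps)} over eps = i/grid."""
    best = math.inf
    for i in range(1, grid):
        eps = i / grid
        ll = _loglog(delta, alpha, eps)
        if ll is not None:  # skip eps for which log log is not positive
            best = min(best, first_term(ll) + delta ** (alpha * (1.0 - eps)))
    return best


def rate_unbounded(delta, alpha, grid=200):
    """Right-hand side of (1.5), up to constants."""
    return _inf_over_eps(lambda ll: ll ** -0.25, delta, alpha, grid)


def rate_bounded(delta, alpha, const=1.0, grid=200):
    """Right-hand side of (1.6), up to constants; const is an assumed value."""
    return _inf_over_eps(lambda ll: math.exp(-math.sqrt(ll) / const),
                         delta, alpha, grid)


if __name__ == "__main__":
    for d in (1e-8, 1e-16):
        print(d, rate_unbounded(d, 1.0), rate_bounded(d, 1.0))
```

Both quantities decrease very slowly as \(\delta \) decreases, which reflects the iterated logarithm in the bounds; the functions return `math.inf` when no grid value of \(\varepsilon \) makes the iterated logarithm positive.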

Remark 1.3

In fact, in terms of [10, Theorem D], (1.4) holds under (A1’)–(A3’) as well as the pathwise uniqueness of (1.1), whereas in Sect. 4 we provide an alternative proof of (1.4) in order to reveal the convergence rate of the EM scheme.

1.3 Degenerate SDEs

So far, most of the existing literature on convergence of the EM scheme for SDEs with irregular coefficients is concerned with non-degenerate SDEs; see, e.g., [16,17,18] for SDEs driven by Brownian motions, and [18] for SDEs driven by jump processes. To the best of our knowledge, the corresponding issue for degenerate SDEs has not yet been considered. In this subsection, we make an attempt to discuss the topic for degenerate SDEs with Hölder–Dini continuous drift.

For notational simplicity, we shall write \(\mathbb {R}^{2n}\) instead of \(\mathbb {R}^n\times \mathbb {R}^n\). Consider the following degenerate SDE on \(\mathbb {R}^{2n}\)

$$\begin{aligned} {\left\{ \begin{array}{ll} {\mathrm{d}}X_t^{(1)}=b^{(1)}_t(X_t^{(1)},X_t^{(2)}){\mathrm{d}}t,&{} \quad X_0^{(1)}=x^{(1)}\in \mathbb {R}^n,\\ {\mathrm{d}}X_t^{(2)}=b^{(2)}_t(X_t^{(1)},X_t^{(2)}){\mathrm{d}}t+\sigma _t(X_t^{(1)},X_t^{(2)}){\mathrm{d}}W_t, &{} \quad X_0^{(2)}=x^{(2)}\in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$
(1.7)

where \(b^{(1)}_t,b^{(2)}_t:\mathbb {R}^{2n}\rightarrow \mathbb {R}^{n}\), \(\sigma _t: \mathbb {R}^{2n} \rightarrow \mathbb {R}^n\otimes \mathbb {R}^n\), and \((W_t)_{t\ge 0}\) is an n-dimensional Brownian motion defined on the complete filtered probability space \((\Omega , \mathscr {F}, (\mathscr {F}_t)_{t\ge 0}, \mathbb {P})\). (1.7) is also called the stochastic Hamiltonian system, which has been investigated extensively in [24, 26] on Bismut formulae, in [15] on ergodicity, in [21] on hypercontractivity, and in [23] on wellposedness, to name a few. For applications of the model (1.7), we refer to, e.g., Soize [20].

Write the gradient operator on \(\mathbb {R}^{2n}\) as \(\nabla =(\nabla ^{(1)},\nabla ^{(2)})\), where \(\nabla ^{(1)}\) and \( \nabla ^{(2)}\) stand for the gradient operators w.r.t. the first and the second components, respectively.

We assume that there exists \(\phi \in \mathscr {D}^\varepsilon \cap \mathscr {S}_0\) such that for any \(x=(x^{(1)},x^{(2)}),y=(y^{(1)},y^{(2)})\in \mathbb {R}^{2n}\) and \(s,t\in [0,T]\),

(C1) :

(Hypoellipticity) \((\nabla ^{(2)}b_t^{(1)})(x), \sigma _t(x)\in \mathbb {M}_\mathrm{non}^{n }\), and

$$\begin{aligned}&\Vert b^{(1)}\Vert _{T,\infty }+\Vert b^{(2)}\Vert _{T,\infty }+ \Vert \nabla ^{(2)}b^{(1)}\Vert _{T,\infty }+\left\| (\nabla ^{(2)}b^{(1)})^{-1}\right\| _{T,\infty }\\&\quad +\Vert \sigma \Vert _{T,\infty }+\Vert \nabla \sigma \Vert _{T,\infty }+\Vert \sigma ^{-1}\Vert _{T,\infty }<\infty ; \end{aligned}$$
(C2) :

(Regularity of \(b^{(1)}\) w.r.t. spatial variables)

$$\begin{aligned} |b_t^{(1)}(x)-b_t^{(1)}(y)|\le |x^{(1)}-y^{(1)}|^{\frac{2}{3}}\phi (|x^{(1)}-y^{(1)}|)&\quad \text{ if } x^{(2)}=y^{(2)},\\ \Vert (\nabla ^{(2)}b_t^{(1)})(x)-(\nabla ^{(2)}b_t^{(1)})(y)\Vert _{\mathrm {HS}}\le \phi (|x^{(2)}-y^{(2)}|)&\quad \text{ if } x^{(1)}=y^{(1)}; \end{aligned}$$
(C3) :

(Regularity of \(b^{(2)}\) w.r.t. spatial variables)

$$\begin{aligned} |b_t^{(2)}(x)-b_t^{(2)}(y)|\le |x^{(1)}-y^{(1)}|^{\frac{2}{3}}\phi (|x^{(1)}-y^{(1)}|)+ \phi ^{\frac{7}{2}}(|x^{(2)}-y^{(2)}|); \end{aligned}$$
(C4) :

(Regularity of \(b^{(1)}, b^{(2)} \) and \(\sigma \) w.r.t. time variables)

$$\begin{aligned} |b_t^{(1)}(x)-b_s^{(1)}(x)| +|b_t^{(2)}(x)-b_s^{(2)}(x)| +\Vert \sigma _t(x)-\sigma _s(x)\Vert _{\mathrm {HS}}\le \phi (|t-s|). \end{aligned}$$

Observe from (C2) and (C3) that \(b^{(1)}(\cdot ,x^{(2)})\) and \(b^{(2)}(\cdot ,x^{(2)})\) with fixed \(x^{(2)}\) are locally Hölder–Dini continuous of order \(\frac{2}{3}\), and \((\nabla ^{(2)}b^{(1)})(x^{(1)},\cdot )\) and \(b^{(2)}(x^{(1)},\cdot )\) with fixed \(x^{(1)}\) are merely Dini-continuous.

The continuous-time EM scheme associated with (1.7) is as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\mathrm{d}}Y_t^{(1)}=b^{(1)}_{t_\delta }(Y_{t_\delta }^{(1)},Y_{t_\delta }^{(2)}){\mathrm{d}}t,&{} \quad Y_0^{(1)}=x^{(1)}\in \mathbb {R}^n,\\ {\mathrm{d}}Y_t^{(2)}=b^{(2)}_{t_\delta }(Y_{t_\delta }^{(1)},Y_{t_\delta }^{(2)}){\mathrm{d}}t+\sigma _{t_\delta }(Y_{t_\delta }^{(1)},Y_{t_\delta }^{(2)}){\mathrm{d}}W_t,&{} \quad Y_0^{(2)}=x^{(2)}\in \mathbb {R}^n. \end{array}\right. } \end{aligned}$$
(1.8)
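At the grid points, the scheme (1.8) again reduces to a simple recursion in which the noise enters only the second component. A minimal sketch for scalar components (\(n=1\)) is given below; the function name and the sample coefficients in the usage note are illustrative assumptions and have not been checked against (C1) to (C4).

```python
import math
import random


def em_hamiltonian(b1, b2, sigma, x1, x2, T, N, seed=0):
    """EM recursion for a scalar-component version of (1.7) on the grid k*delta.

    b1, b2 and sigma are called as f(t, y1, y2). Only the second component
    is driven by the Brownian increment. Returns the lists (y1s, y2s) of
    grid values of the two components.
    """
    rng = random.Random(seed)
    delta = T / N
    y1s, y2s = [x1], [x2]
    for k in range(N):
        t, y1, y2 = k * delta, y1s[-1], y2s[-1]
        dw = rng.gauss(0.0, math.sqrt(delta))  # Brownian increment ~ N(0, delta)
        # First component: drift only (degenerate direction).
        y1s.append(y1 + b1(t, y1, y2) * delta)
        # Second component: drift plus noise, coefficients frozen at the grid point.
        y2s.append(y2 + b2(t, y1, y2) * delta + sigma(t, y1, y2) * dw)
    return y1s, y2s
```

A typical call is `em_hamiltonian(lambda t, p, q: q, lambda t, p, q: -math.sin(p), lambda t, p, q: 1.0, 0.0, 1.0, 1.0, 1000)`, a position and velocity pair in which the first component is the time integral of the second.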

Another contribution in this paper reads as below.

Theorem 1.4

Let (C1)–(C4) hold. Then

$$\begin{aligned} \left( \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \right) ^{1/2}\lesssim _T\phi (C_T\sqrt{\delta }) \end{aligned}$$

for some constant \(C_T\ge 1\), in which \(X_t:=(X_t^{(1)},X_t^{(2)})\) and \(Y_t:=(Y_t^{(1)},Y_t^{(2)})\).

According to [23, Theorem 1.2], (1.7) admits a unique strong solution under the assumptions (C1)–(C3). In fact, (1.7) is wellposed under (C1)–(C3) with \(\phi \in \mathscr {D}_0\cap \mathscr {S}_0\) in lieu of \(\phi \in \mathscr {D}^\varepsilon \cap \mathscr {S}_0\). Nevertheless, the requirement \(\phi \in \mathscr {D}^\varepsilon \cap \mathscr {S}_0\) is imposed in order to reveal the order of convergence for the EM scheme above. By applying the cutoff approach and refining the argument of [23, Theorem 2.3] (see also Lemma 5.1 below), the boundedness of coefficients can be removed. We do not go into details here, since the corresponding argument is quite similar to the proof of Theorem 1.2.

The remainder of this paper is organized as follows. In Sect. 2, we elaborate on the regularity of the non-degenerate Kolmogorov equation, which plays an important role in dealing with the convergence rate of the EM scheme for non-degenerate SDEs with Hölder–Dini continuous and unbounded drifts; in Sects. 3, 4 and 5, we complete the proofs of Theorems 1.1, 1.2 and 1.4, respectively.

2 Regularity of Non-degenerate Kolmogorov Equation

Let \((e_i)_{1\le i\le n}\) be an orthonormal basis of \(\mathbb {R}^n.\) For any \(\lambda >0\), consider the following \(\mathbb {R}^n\)-valued parabolic equation:

$$\begin{aligned} \begin{aligned} \partial _tu_t^\lambda +L_tu_t^\lambda +b_t+\nabla _{b_t}u_t^\lambda =\lambda u_t^\lambda , \quad u_T^\lambda =\mathbf{0_n}, \end{aligned} \end{aligned}$$
(2.1)

where \(\nabla _{b_t}u_t^\lambda \) means the directional derivative along the direction \(b_t\), \(\mathbf{0_n}\) is the zero vector in \(\mathbb {R}^n\) and

$$\begin{aligned} L_{t}:= \frac{1}{2}\sum _{i,j}{\left\langle (\sigma _t\sigma _t^*)(\cdot )e_{i},e_{j}\right\rangle }\nabla _{e_{i}}\nabla _{e_{j}} \end{aligned}$$

with \(\sigma _t^*\) standing for the transpose of \(\sigma _t.\) Let \((P_{s,t}^0)_{0\le s\le t}\) be the semigroup generated by \((Z_t^{s,x})_{0\le s\le t}\) which solves an SDE below

$$\begin{aligned} {\mathrm{d}}Z_t^{s,x}=\sigma _t(Z_t^{s,x}){\mathrm{d}}W_t, \quad t>s,\; Z_s^{s,x}=x. \end{aligned}$$
(2.2)

By the chain rule, it follows from (2.1) that

$$\begin{aligned} \begin{aligned} \partial _t\left( {\mathrm{e}}^{-\lambda (t-s)}P_{s,t}^0u_t^\lambda \right)&={\mathrm{e}}^{-\lambda (t-s)} \left\{ -\lambda P_{s,t}^0u_t^\lambda +P_{s,t}^0L_tu_t^\lambda +P_{s,t}^0 \partial _tu_t^\lambda \right\} \\&=-{\mathrm{e}}^{-\lambda (t-s)}P_{s,t}^0\left\{ b_t+\nabla _{b_t}u_t^\lambda \right\} . \end{aligned} \end{aligned}$$

Thus, integrating from s to T and taking advantage of \(u_T^\lambda =\mathbf{0_n}\), we arrive at

$$\begin{aligned} u_s^\lambda =\int _s^T{\mathrm{e}}^{-\lambda ( t-s)}P_{s,t}^0\left\{ b_t+\nabla _{b_t}u_t^\lambda \right\} {\mathrm{d}}t. \end{aligned}$$
(2.3)

For notational simplicity, let

$$\begin{aligned} \Lambda _{T,\sigma }={\mathrm{e}}^{\frac{T}{2}\Vert \nabla \sigma \Vert _{T,\infty }^2}\Vert \sigma ^{-1}\Vert _{T,\infty } \end{aligned}$$
(2.4)

and

$$\begin{aligned} \begin{aligned} \tilde{\Lambda }_{T,\sigma }&=48{\mathrm{e}}^{288\,T^2\Vert \nabla \sigma \Vert _{T,\infty }^4}\left\{ 6\sqrt{2}{\mathrm{e}}^{T\Vert \nabla \sigma \Vert _{T,\infty }^2}\Vert \sigma ^{-1}\Vert _{T,\infty }^4+T\Vert \nabla \sigma ^{-1}\Vert _{T,\infty }^2\right. \\&\left. \quad +\,2T^2\Vert \nabla ^2\sigma \Vert _{T,\infty }^2\Vert \sigma ^{-1}\Vert _{T,\infty }^2 {\mathrm{e}}^{2T\Vert \nabla \sigma \Vert _{T,\infty }^2} \right\} . \end{aligned} \end{aligned}$$
(2.5)

Moreover, set

$$\begin{aligned} \Upsilon _{T,\sigma }:=\sqrt{\tilde{\Lambda }_{T,\sigma }}\left\{ 3+2\Vert b\Vert _{T,\infty }+ 28\left( \Lambda _{T,\sigma }+ \sqrt{\tilde{\Lambda }_{T,\sigma }}\right) \Vert b\Vert _{T,\infty }^2\right\} . \end{aligned}$$
(2.6)

The lemma below plays a crucial role in investigating error analysis.

Lemma 2.1

Under (A1) and (A2), for any \( \lambda \ge 9\pi \Lambda _{T,\sigma }^2\Vert b\Vert _{T,\infty }^2+4(\Vert b\Vert _{T,\infty }+\Lambda _{T,\sigma })^2, \)

(i):

(2.1) (i.e., (2.3)) enjoys a unique strong solution \(u^\lambda \in C([0,T];C_b^1(\mathbb {R}^n;\mathbb {R}^n))\);

(ii):

\(\Vert \nabla u^\lambda \Vert _{T,\infty }\le \frac{1}{2}\);

(iii):

\( \Vert \nabla ^2 u^\lambda \Vert _{T,\infty }\le \Upsilon _{T,\sigma }\int _0^T\frac{{\mathrm{e}}^{-\lambda t} }{t}\tilde{\phi }(\Vert \sigma \Vert _{T,\infty }\sqrt{t} ){\mathrm{d}}t, \) where \( \tilde{\phi }(s):=\sqrt{\phi ^2(s)+s},\, s\ge 0. \)
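The integral in (iii) is finite because of the Dini condition. As a short check (a sketch, writing \(\kappa :=\Vert \sigma \Vert _{T,\infty }>0\) for brevity), the substitution \(r=\kappa \sqrt{t}\), for which \({\mathrm{d}}t/t=2\,{\mathrm{d}}r/r\), together with \(\tilde{\phi }(r)\le \phi (r)+\sqrt{r}\), gives

$$\begin{aligned} \int _0^T\frac{{\mathrm{e}}^{-\lambda t}}{t}\tilde{\phi }(\kappa \sqrt{t}){\mathrm{d}}t\le 2\int _0^{\kappa \sqrt{T}}\frac{\tilde{\phi }(r)}{r}{\mathrm{d}}r\le 2\int _0^{\kappa \sqrt{T}}\frac{\phi (r)}{r}{\mathrm{d}}r+4\kappa ^{\frac{1}{2}}T^{\frac{1}{4}}<\infty . \end{aligned}$$

Moreover, by dominated convergence, the left-hand side tends to zero as \(\lambda \rightarrow \infty \).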

Proof

To show (i)–(iii), it suffices to refine the argument of [22, Lemma 2.1]. (i) holds for any \(\lambda \ge 4(\Vert b\Vert _{T,\infty }+\Lambda _{T,\sigma })^2\) via the Banach fixed-point theorem.

In what follows, we aim to show that (ii) and (iii) hold true, one by one. Observe from [12, Theorem 3.1, p.218] that

$$\begin{aligned} {\mathrm{d}}\nabla _\eta Z_t^{s,x}=\left( \nabla _{\nabla _{\eta }Z_t^{s,x}}\sigma _t\right) \left( Z_t^{s,x}\right) {\mathrm{d}}W_t, \quad t\ge s,\quad \nabla _\eta Z_s^{s,x}=\eta \in \mathbb {R}^n. \end{aligned}$$
(2.7)

Using Itô’s isometry and Gronwall’s inequality, one has

$$\begin{aligned} \mathbb {E}|\nabla _{\eta }Z_t^{s,x}|^2\le |\eta |^2{\mathrm{e}}^{T\Vert \nabla \sigma \Vert _{T,\infty }^2}. \end{aligned}$$
(2.8)

Utilizing the BDG inequality, we deduce that

$$\begin{aligned} \begin{aligned} \mathbb {E}|\nabla _{\eta }Z_t^{s,x}|^4&\le 8\left\{ |\eta |^4+36(t-s)\Vert \nabla \sigma \Vert _{T,\infty }^4\int _s^t\mathbb {E}|\nabla _{\eta }Z_{u}^{s,x}|^4{\mathrm{d}}u\right\} , \end{aligned} \end{aligned}$$

which, combining with Gronwall’s inequality, yields that

$$\begin{aligned} \begin{aligned} \mathbb {E}|\nabla _{\eta }Z_t^{s,x}|^4&\le 8 |\eta |^4{\mathrm{e}}^{288\,T^2\Vert \nabla \sigma \Vert _{T,\infty }^4}. \end{aligned} \end{aligned}$$
(2.9)
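For the reader's convenience, the Gronwall step can be spelled out as follows; this is only a sketch, taking the constants from the preceding display. Write \(g(t):=\mathbb {E}|\nabla _{\eta }Z_t^{s,x}|^4\) and \(B:=288\,T\Vert \nabla \sigma \Vert _{T,\infty }^4\). Since \(8\cdot 36=288\) and \(t-s\le T\), the preceding inequality reads \(g(t)\le 8|\eta |^4+B\int _s^tg(u){\mathrm{d}}u\), so that

$$\begin{aligned} g(t)\le 8|\eta |^4{\mathrm{e}}^{B(t-s)}\le 8|\eta |^4{\mathrm{e}}^{288\,T^2\Vert \nabla \sigma \Vert _{T,\infty }^4},\quad s\le t\le T. \end{aligned}$$

The bound (2.8) follows in the same way from \(\mathbb {E}|\nabla _{\eta }Z_t^{s,x}|^2\le |\eta |^2+\Vert \nabla \sigma \Vert _{T,\infty }^2\int _s^t\mathbb {E}|\nabla _{\eta }Z_u^{s,x}|^2{\mathrm{d}}u\), which is a consequence of (2.7) and Itô's isometry.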

Recall from [22, (2.8)] that the following Bismut formula

$$\begin{aligned} \nabla _\eta P_{s,t}^0f(x)=\mathbb {E}\left( \frac{f\left( Z_t^{s,x}\right) }{t-s}\int _s^t\left\langle \sigma _r^{-1}\left( Z_r^{s,x}\right) \nabla _\eta Z_r^{s,x},{\mathrm{d}}W_r\right\rangle \right) ,\quad f\in \mathscr {B}_b(\mathbb {R}^n) \end{aligned}$$
(2.10)

holds. By the Cauchy–Schwarz inequality, the Itô isometry and (2.8), we obtain that

$$\begin{aligned} \begin{aligned} |\nabla _\eta P_{s,t}^0f|^2(x)&\le \frac{\Lambda _{T,\sigma }^2|\eta |^2P_{s,t}^{0}f^{2}(x)}{t-s},\quad f\in \mathscr {B}_b(\mathbb {R}^n), \end{aligned} \end{aligned}$$
(2.11)

where \( \Lambda _{T,\sigma }>0\) is defined in (2.4). So, one infers from (2.3) and (2.11) that

$$\begin{aligned} \begin{aligned} \Vert \nabla u_s^\lambda \Vert&\le \int _s^T{\mathrm{e}}^{-\lambda ( t-s)}\Vert \nabla P_{s,t}^0\{b_t+\nabla _{b_t}u_t^\lambda \}\Vert {\mathrm{d}}t\\&\le \Lambda _{T,\sigma }\left( 1+\Vert \nabla u^\lambda \Vert _{T,\infty }\right) \Vert b\Vert _{T,\infty } \int _0^T\frac{{\mathrm{e}}^{-\lambda t}}{\sqrt{t}}{\mathrm{d}}t\\&\le \,\lambda ^{-\frac{1}{2}}\sqrt{\pi }\Lambda _{T,\sigma }\Vert b\Vert _{T,\infty }(1+\Vert \nabla u^\lambda \Vert _{T,\infty }). \end{aligned} \end{aligned}$$

Thus, (ii) follows by taking \(\lambda \ge 9\pi \Lambda _{T,\sigma }^2\Vert b\Vert _{T,\infty }^2\).
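In more detail (a short worked step under the hypotheses of the lemma), the last inequality in the display above uses \(\int _0^T\frac{{\mathrm{e}}^{-\lambda t}}{\sqrt{t}}{\mathrm{d}}t\le \int _0^\infty \frac{{\mathrm{e}}^{-\lambda t}}{\sqrt{t}}{\mathrm{d}}t=\Gamma (\frac{1}{2})\lambda ^{-\frac{1}{2}}=\sqrt{\pi }\lambda ^{-\frac{1}{2}}\). Setting \(\kappa _\lambda :=\lambda ^{-\frac{1}{2}}\sqrt{\pi }\Lambda _{T,\sigma }\Vert b\Vert _{T,\infty }\), the choice \(\lambda \ge 9\pi \Lambda _{T,\sigma }^2\Vert b\Vert _{T,\infty }^2\) gives \(\kappa _\lambda \le \frac{1}{3}\). Since \(\Vert \nabla u^\lambda \Vert _{T,\infty }<\infty \) by (i), taking the supremum over \(s\in [0,T]\) yields

$$\begin{aligned} \Vert \nabla u^\lambda \Vert _{T,\infty }\le \kappa _\lambda \left( 1+\Vert \nabla u^\lambda \Vert _{T,\infty }\right) \quad \Longrightarrow \quad \Vert \nabla u^\lambda \Vert _{T,\infty }\le \frac{\kappa _\lambda }{1-\kappa _\lambda }\le \frac{1/3}{2/3}=\frac{1}{2}. \end{aligned}$$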

In the sequel, we intend to verify (iii). Set \(\gamma _{s,t}:=\nabla _{\eta }\nabla _{\eta ^{\prime }}Z_t^{s,x}\) for any \(\eta ,\eta '\in \mathbb {R}^n\). Notice from (2.7) that

$$\begin{aligned} {\mathrm{d}}\gamma _{s,t}=\left\{ \left( \nabla _{\gamma _{s,t}}\sigma _t\right) \left( Z_{t}^{s,x}\right) + \left( \nabla _{\nabla _{\eta }Z_t^{s,x}}\nabla _{\nabla _{\eta ^{\prime }} Z_t^{s,x}}\sigma _t\right) \left( Z_t^{s,x}\right) \right\} {\mathrm{d}}W_t, \quad t\ge s,\; \gamma _{s,s}=\mathbf{0_n}. \end{aligned}$$

By the Doob submartingale inequality and the Itô isometry, besides the Gronwall inequality and (2.9), we derive that

$$\begin{aligned} \begin{aligned} \sup _{s\le t\le T}\mathbb {E}|\gamma _{s,t}|^2&\le 16T\Vert \nabla ^2\sigma \Vert _{T,\infty }^2 {\mathrm{e}}^{288T^2 \Vert \nabla \sigma \Vert _{T,\infty }^4+2T\Vert \nabla \sigma \Vert _{T,\infty }^2}|\eta |^2|\eta ^{\prime }|^2. \end{aligned} \end{aligned}$$
(2.12)

From (2.10) and the Markov property, we have

$$\begin{aligned} \begin{aligned}&\nabla _\eta P_{s,t}^0f(x) =\mathbb {E}\left( \frac{\left( P_{\frac{t+s}{2},t}^0f\right) \left( Z^{s,x}_{\frac{t+s}{2}}\right) }{(t-s)/2}\int _s^{\frac{t+s}{2}}\left\langle \sigma ^{-1}_r\left( Z_r^{s,x}\right) \nabla _\eta Z_r^{s,x},{\mathrm{d}}W_r\right\rangle \right) . \end{aligned} \end{aligned}$$

This further gives that

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\left( \nabla _{\eta '}\nabla _\eta P_{s,t}^0f\right) (x)\\&\quad =\mathbb {E}\left( \frac{\left( \nabla _{\nabla _{\eta '}Z^{s,x}_{\frac{t+s}{2}}} P_{\frac{t+s}{2},t}^0f\right) \left( Z^{s,x}_{\frac{t+s}{2}}\right) }{t-s}\int _s^{\frac{t+s}{2}}\left\langle \sigma _r^{-1}\left( Z_r^{s,x}\right) \nabla _\eta Z_r^{s,x},{\mathrm{d}}W_r\right\rangle \right) \\&\quad +\,\mathbb {E}\left( \frac{\left( P_{\frac{t+s}{2},t}^0f\right) \left( Z^{s,x}_{\frac{t+s}{2}}\right) }{t-s}\int _s^{\frac{t+s}{2}} \left\langle \left( \nabla _{\nabla _{\eta '}Z^{s,x}_r} \sigma _r^{-1}\right) \left( Z_r^{s,x}\right) \nabla _\eta Z_r^{s,x},{\mathrm{d}}W_r\right\rangle \right) \\&\quad +\mathbb {E}\left( \frac{\left( P_{\frac{t+s}{2},t}^0f\right) \left( Z^{s,x}_{\frac{t+s}{2}}\right) }{t-s}\int _s^{\frac{t+s}{2}}\left\langle \sigma _r^{-1}\left( Z_r^{s,x}\right) \nabla _{\eta '}\nabla _\eta Z_r^{s,x},{\mathrm{d}}W_r\right\rangle \right) . \end{aligned} \end{aligned}$$

Thus, applying the Cauchy–Schwarz inequality and Itô’s isometry and taking (2.9), (2.11) and (2.12) into consideration, we derive that

$$\begin{aligned}&|\nabla _{\eta '}\nabla _\eta P_{s,t}^0f|^2(x)\nonumber \\&\quad \le 12\left\{ 6\Vert \sigma ^{-1}\Vert _{T,\infty }^2\frac{\mathbb {E}\left| \nabla P_{\frac{t+s}{2},t}^0f\right| ^2\left( Z^{s,x}_{\frac{t+s}{2}}\right) }{(t-s)^{5/2}}\right. \nonumber \\&\quad \quad \times \left( \mathbb {E}\left| {\nabla _{\eta '}Z^{s,x}_{\frac{t+s}{2}}}\right| ^4\right) ^{1/2} \left( \int _s^{\frac{t+s}{2}}\mathbb {E}\left| \nabla _\eta Z_r^{s,x}\right| ^4{\mathrm{d}}r\right) ^{1/2}\nonumber \\&\quad \quad +\,\frac{P_{s,t}^{0}f^{2}(x)}{(t-s)^2}\Vert \nabla \sigma ^{-1}\Vert _{T,\infty }^2 \int _s^{\frac{t+s}{2}}\left( \mathbb {E}|\nabla _{\eta '}Z^{s,x}_r|^4\right) ^{1/2} \left( \mathbb {E}|\nabla _{\eta }Z^{s,x}_r|^4\right) ^{1/2}{\mathrm{d}}r\nonumber \\&\left. \quad \quad +\,\frac{P_{s,t}^{0}f^{2}(x)}{(t-s)^2}\Vert \sigma ^{-1}\Vert _{T,\infty }^2 \int _s^{\frac{t+s}{2}}\mathbb {E}\left| \nabla _{\eta '}\nabla _\eta Z_r^{s,x}\right| ^2{\mathrm{d}}r\right\} \nonumber \\&\quad \le \tilde{\Lambda }_{T,\sigma }|\eta |^2|\eta ^{\prime }|^2\frac{P_{s,t}^{0} f^{2}(x)}{(t-s)^2}, \end{aligned}$$
(2.13)

where \(\tilde{\Lambda }_{T,\sigma }>0\) is defined as in (2.5).

Set \(\tilde{f}(\cdot ):=f(\cdot )-f(x)\) for fixed \(x\in \mathbb {R}^n\) and \(f\in \mathscr {B}_b(\mathbb {R}^n)\) which verifies

$$\begin{aligned} |f(x)-f(y)|\le \phi (|x-y|),\quad x,y\in \mathbb {R}^n \end{aligned}$$
(2.14)

for some \(\phi \in {\mathscr {D}}.\) For \(f\in \mathscr {B}_b(\mathbb {R}^n)\) satisfying (2.14), the estimate (2.13) implies that

$$\begin{aligned} \begin{aligned} |\nabla _{\eta '}\nabla _\eta P_{s,t}^0f|^2(x)=|\nabla _{\eta '}\nabla _\eta P_{s,t}^0\tilde{f}|^2(x)&\le \frac{\tilde{\Lambda }_{T,\sigma }|\eta |^{2}|\eta ^{\prime }|^{2}}{(t-s)^2}\mathbb {E}|f(Z^{s,x}_t)-f(x)|^2\\&\le \frac{\tilde{\Lambda }_{T,\sigma }|\eta |^{2}|\eta ^{\prime }|^{2}}{(t-s)^2}\phi ^2(\Vert \sigma \Vert _{T,\infty }(t-s)^{1/2}), \end{aligned} \end{aligned}$$
(2.15)

where in the second inequality we have used that

$$\begin{aligned} Z^{s,x}_t-x=\int _s^t\sigma _r(Z^{s,x}_r){\mathrm{d}}W_r, \end{aligned}$$

and utilized Jensen’s inequality as well as Itô’s isometry.
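Spelled out, and reading \(\Vert \sigma \Vert _{T,\infty }\) as a uniform bound on \(\Vert \sigma _r(\cdot )\Vert _{\mathrm {HS}}\) (an assumption made here for the sketch), this step uses that \(\phi ^2\) is concave and \(\phi \) is increasing:

$$\begin{aligned} \mathbb {E}|f(Z^{s,x}_t)-f(x)|^2\le \mathbb {E}\phi ^2(|Z^{s,x}_t-x|)\le \phi ^2\left( \mathbb {E}|Z^{s,x}_t-x|\right) \le \phi ^2\left( \left( \mathbb {E}|Z^{s,x}_t-x|^2\right) ^{1/2}\right) \le \phi ^2\left( \Vert \sigma \Vert _{T,\infty }(t-s)^{1/2}\right) , \end{aligned}$$

where the last inequality follows from \(\mathbb {E}|Z^{s,x}_t-x|^2=\int _s^t\mathbb {E}\Vert \sigma _r(Z^{s,x}_r)\Vert _{\mathrm {HS}}^2{\mathrm{d}}r\le \Vert \sigma \Vert _{T,\infty }^2(t-s)\).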

Let \(f_t=b_t+\nabla _{b_t}u_t^\lambda \). For any \( \lambda \ge 9\pi \Lambda _{T,\sigma }^2\Vert b\Vert _{T,\infty }^2+4(\Vert b\Vert _{T,\infty }+\Lambda _{T,\sigma })^2, \) note from (ii), (2.11) and (2.13) that

$$\begin{aligned} |f_t(x)-f_t(y)|&\le (1+\Vert \nabla u^\lambda \Vert _{T,\infty })\phi (|x-y|)\\&\quad +\Vert b\Vert _{T,\infty }\Vert \nabla u_t^\lambda (x)-\nabla u_t^\lambda (y)\Vert \mathbf{1}_{\{|x-y|\ge 1\}}\\&\quad +\Vert b\Vert _{T,\infty }\Vert \nabla u_t^\lambda (x)-\nabla u_t^\lambda (y)\Vert \mathbf{1}_{\{|x-y|\le 1\}}\\&\le \frac{3}{2}\phi (|x-y|)+\Vert b\Vert _{T,\infty } \sqrt{|x-y|}\,\mathbf{1}_{\{|x-y|\ge 1\}}\\&\quad +10\left( \Lambda _{T,\sigma }+\sqrt{\tilde{\Lambda }_{T,\sigma }}\right) \Vert b\Vert _{T,\infty }^2\sqrt{|x-y|}\sqrt{|x-y|}\\&\quad \times \log \left( {\mathrm{e}}+\frac{1}{|x-y|}\right) \mathbf{1}_{\{|x-y|\le 1\}}\\&\le \left\{ 3+2\Vert b\Vert _{T,\infty }+ 28\left( \Lambda _{T,\sigma }+ \sqrt{\tilde{\Lambda }_{T,\sigma }}\right) \Vert b\Vert _{T,\infty }^2\right\} \tilde{\phi }(|x-y|) \end{aligned}$$

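with \( \tilde{\phi }(s):=\sqrt{\phi ^2(s)+s},\,s\ge 0, \) where in the second inequality we have used [22, Lemma 2.2 (1)], and the fact that the function \([0,1]\ni x\mapsto \sqrt{x}\log ({\mathrm{e}}+\frac{1}{x})\) is non-decreasing. As a result, (iii) follows by applying (2.15) to \(f_t\), with \(\tilde{\phi }\) (up to the multiplicative constant above) in place of \(\phi \), and inserting the resulting bound into (2.3). \(\square \)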

3 Proof of Theorem 1.1

With Lemma 2.1 in hand, we are now in a position to complete the

Proof of Theorem 1.1

Throughout the whole proof, we assume \( \lambda \ge 9\pi \Lambda _{T,\sigma }^2\Vert b\Vert _{T,\infty }^2+4(\Vert b\Vert _{T,\infty }+\Lambda _{T,\sigma })^2\) so that (i)–(iii) in Lemma 2.1 hold. For any \(t\in [0,T]\), applying Itô’s formula to \(x+u_t^\lambda (x),x\in \mathbb {R}^n\), we deduce from (2.1) that

$$\begin{aligned} \begin{aligned} X_t+u_t^\lambda (X_t)= x+u_0^\lambda (x)+\lambda \int _0^tu_s^\lambda (X_s){\mathrm{d}}s+\int _0^t\{\mathbf{I_{n\times n}}+(\nabla u_s^\lambda )(\cdot )\}(X_s)\sigma _s(X_s){\mathrm{d}}W_s, \end{aligned} \end{aligned}$$
(3.1)

where \(\mathbf{I_{n\times n}}\) is an \(n\times n\) identity matrix, and that

$$\begin{aligned} \begin{aligned} Y_t+u_t^\lambda (Y_t)&=x+u_0^\lambda (x)+\lambda \int _0^tu_s^\lambda (Y_s){\mathrm{d}}s +\int _0^t\{\mathbf{I_{n\times n}}+(\nabla u_s^\lambda )(\cdot )\}(Y_s){\sigma _{s_\delta } (Y_{s_\delta }){\mathrm{d}}W_s}\\&\quad +\int _0^t\{\mathbf{I_{n\times n}}+(\nabla u_s^\lambda )(\cdot )\}(Y_s)\{b_{s_\delta }(Y_{s_\delta })-b_s(Y_s)\}{\mathrm{d}}s\\&\quad +\frac{1}{2}\int _0^t\sum _{k,j}{\left\langle \{ (\sigma _{s_\delta }\sigma ^{*}_{s_\delta })(Y_{s_\delta })-(\sigma _s\sigma ^{*}_s)( Y_s)\}e_{k},e_{j}\right\rangle } (\nabla _{e_{k}}\nabla _{e_{j}}u_s^\lambda )(Y_s){\mathrm{d}}s. \end{aligned} \end{aligned}$$
(3.2)

For notational simplicity, set

$$\begin{aligned} M^\lambda _t:=X_t-Y_t+u^\lambda _t(X_t)-u^\lambda _t(Y_t). \end{aligned}$$
(3.3)

Using the elementary inequality: \( (a+b)^2\le (1+\varepsilon )(a^2+\varepsilon ^{-1}b^2)\) for arbitrary \(\varepsilon ,a,b>0, \) we derive from (ii) that

$$\begin{aligned} \begin{aligned} |X_t-Y_t|^2&\le (1+\varepsilon )(|M_t^\lambda |^2+\varepsilon ^{-1}|u^\lambda _t(X_t)-u^\lambda _t(Y_t)|^2)\\&\le (1+\varepsilon )\left( |M_t^\lambda |^2+\frac{\varepsilon ^{-1}}{4}|X_t-Y_t|^2\right) . \end{aligned} \end{aligned}$$

In particular, taking \(\varepsilon =1\) leads to

$$\begin{aligned} |X_t-Y_t|^2 \le \frac{1}{2}|X_t-Y_t|^2+2|M_t^\lambda |^2. \end{aligned}$$

As a consequence,

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|X_s-Y_s|^2\right) \le 4\mathbb {E}\left( \sup _{0\le s\le t}|M_s^\lambda |^2\right) . \end{aligned}$$
(3.4)

In what follows, our goal is to estimate the term on the right-hand side of (3.4). Observe from the definition of the Hilbert–Schmidt norm that

$$\begin{aligned}&\int _0^t\mathbb {E}\left| \sum _{k,j}{\langle \{ (\sigma _{s_\delta }\sigma ^{*}_{s_\delta })( Y_{s_\delta })-(\sigma _s\sigma ^{*}_s)( Y_s)\}e_{k},e_{j}\rangle } (\nabla _{e_{k}}\nabla _{e_{j}}u_s^\lambda )(Y_s)\right| ^2{\mathrm{d}}s\nonumber \\&\quad \lesssim _T\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert (\sigma _{s_\delta }\sigma ^{*}_{s_\delta })( Y_{s_\delta })-(\sigma _s\sigma ^{*}_s)( Y_s)\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s. \end{aligned}$$
(3.5)

Thus, by Hölder’s inequality, Doob’s submartingale inequality and Itô’s isometry, it follows from (3.1), (3.2) and (3.5) that

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|M_s^\lambda |^2\right)&\le C_T\left\{ \lambda ^2\int _0^t\mathbb {E}|u_s^\lambda (X_s)-u_s^\lambda (Y_s)|^2{\mathrm{d}}s\right. \\&\quad +(1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2)\int _0^t\mathbb {E}|b_{s_\delta }(Y_s)-b_{s_\delta }(Y_{s_\delta })|^2{\mathrm{d}}s\\&\quad +(1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2)\int _0^t\mathbb {E}|b_s(Y_s)-b_{s_\delta }(Y_s)|^2{\mathrm{d}}s\\&\quad +\int _0^t\mathbb {E}\Vert \{(\nabla u_s^\lambda )(X_s)-(\nabla u_s^\lambda )(Y_s)\}\sigma _s(X_s)\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +(1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2)\int _0^t\mathbb {E}\Vert \sigma _{s_\delta }(X_s)-\sigma _{s_\delta } ( Y_{s_\delta })\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \{ \sigma _{s_\delta }(Y_s)-\sigma _{s_\delta }( Y_{s_\delta } )\}\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \sigma _s(Y_s)\{\sigma ^{*}_{s_\delta }(Y_s)-\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\}\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +(1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2)\int _0^t\mathbb {E}\Vert \sigma _s(X_s)-\sigma _{s_\delta } ( X_s)\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \sigma _s(Y_s)\{\sigma ^{*}_s(Y_s)-\sigma ^{*}_{s_\delta }( Y_s )\}\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\left. \quad +\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \{ \sigma _s(Y_s)-\sigma _{s_\delta }( Y_s )\}\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\right\} \\&=:C_T\left( \sum _{i=1}^{10}I_i(t)\right) \end{aligned}$$

for some constant \(C_T>0.\) Also, applying Hölder’s inequality and Itô’s isometry, we deduce from (A1) that

$$\begin{aligned} \mathbb {E}|Y_t- Y_{t_\delta }|^2\le \beta _T\delta \end{aligned}$$
(3.6)

for some constant \(\beta _T\ge 1.\) By Taylor’s expansion, it is easy to see that

$$\begin{aligned} I_1(t)+I_4(t)\lesssim \{\lambda ^2\Vert \nabla u^\lambda \Vert _{T,\infty }^2+\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\Vert \sigma \Vert _{T,\infty }^2\}\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s. \end{aligned}$$
(3.7)
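As a brief sketch of why the one-step estimate (3.6) holds, assume (as the norms \(\Vert b\Vert _{T,\infty }\) and \(\Vert \sigma \Vert _{T,\infty }\) used above suggest) that \(b\) and \(\sigma \) are bounded, and that \(t_\delta \) is the last grid point not exceeding \(t\), so that \(0\le t-t_\delta \le \delta \). The constant obtained below is only illustrative and need not coincide with the authors' \(\beta _T\). Since the coefficients of the EM scheme are frozen on each step,

$$\begin{aligned} Y_t-Y_{t_\delta }=b_{t_\delta }(Y_{t_\delta })(t-t_\delta )+\sigma _{t_\delta }(Y_{t_\delta })(W_t-W_{t_\delta }), \end{aligned}$$

so that \((a+b)^2\le 2a^2+2b^2\) and Itô’s isometry give

$$\begin{aligned} \mathbb {E}|Y_t-Y_{t_\delta }|^2&\le 2\Vert b\Vert _{T,\infty }^2(t-t_\delta )^2+2\,\mathbb {E}\Vert \sigma _{t_\delta }(Y_{t_\delta })\Vert _{\mathrm {HS}}^2(t-t_\delta )\\&\le 2\left( \Vert b\Vert _{T,\infty }^2+\sup _{s\in [0,T],\,x\in \mathbb {R}^n}\Vert \sigma _s(x)\Vert _{\mathrm {HS}}^2\right) \delta ,\quad \delta \in (0,1). \end{aligned}$$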

From (A3) and due to the fact that \(\phi (\cdot ) \) is increasing and \(\delta \in (0,1),\) one has

$$\begin{aligned} I_3(t)+\sum _{i=8}^{10}I_i(t)\lesssim _T\{1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2+\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\Vert \sigma \Vert _{T,\infty }^2\}\phi ^2(\sqrt{\delta }). \end{aligned}$$
(3.8)

In view of (A2), we derive that

$$\begin{aligned} \begin{aligned}&I_2(t)+\sum _{i=5}^7I_i(t)\\&\quad \lesssim \{1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2\}\int _0^t\mathbb {E}\phi (|Y_s-Y_{s_\delta }|)^2{\mathrm{d}}s\\&\quad +\{1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2\}\Vert \nabla \sigma \Vert _{T,\infty }^2\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s\\&\quad +\{1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2+\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\Vert \sigma \Vert _{T,\infty }^2\}\Vert \nabla \sigma \Vert _{T,\infty }^2\int _0^t\mathbb {E}|Y_s-Y_{s_\delta }|^2{\mathrm{d}}s. \end{aligned} \end{aligned}$$
(3.9)

Thus, taking (3.6)–(3.9) into account and applying Jensen’s inequality yields

$$\begin{aligned}\begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|M_s^\lambda |^2\right)&\lesssim _T C_{T,\sigma ,\lambda }\{\delta +\phi ^2(\beta _T\sqrt{\delta })\}+ C_{T,\sigma ,\lambda }\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} C_{T,\sigma ,\lambda }:=\{1+\Vert \nabla \sigma \Vert _{T,\infty }^2\}\left\{ \frac{5}{4} +(1+\lambda ^2)\Vert \nabla ^2 u^\lambda \Vert _{T,\infty }^2\Vert \sigma \Vert _{T,\infty }^2\right\} . \end{aligned} \end{aligned}$$
(3.10)

Owing to \(\phi \in \mathscr {D},\) we conclude that \(\phi (0)=0,\) \(\phi '>0\) and \(\phi ''<0\) so that, for any \(c>0\) and \(\delta \in (0,1),\)

$$\begin{aligned} \phi (c\delta )=\phi (0)+\phi '(\xi )c\delta \ge \phi '(c)c\delta , \end{aligned}$$

where \(\xi \in (0,c\delta ).\) This further implies that

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|M_s^\lambda |^2\right)&\lesssim _T C_{T,\sigma ,\lambda }\phi ^2(\beta _T\sqrt{\delta })+ C_{T,\sigma ,\lambda }\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s. \end{aligned} \end{aligned}$$
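To spell out how the term \(\delta \) is absorbed here (a short remark, not part of the original argument): applying the mean value estimate above with \(c=\beta _T\) and with \(\sqrt{\delta }\in (0,1)\) in place of \(\delta \), and using that \(\phi '\) is decreasing, one obtains

$$\begin{aligned} \phi (\beta _T\sqrt{\delta })\ge \phi '(\beta _T)\beta _T\sqrt{\delta },\quad \text{ hence }\quad \delta \le \frac{\phi ^2(\beta _T\sqrt{\delta })}{(\phi '(\beta _T)\beta _T)^2}. \end{aligned}$$

Consequently \(\delta +\phi ^2(\beta _T\sqrt{\delta })\le \{1+(\phi '(\beta _T)\beta _T)^{-2}\}\phi ^2(\beta _T\sqrt{\delta })\), and the factor in braces, which depends only on \(\beta _T\) and \(\phi \), can be absorbed into the implicit constant.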

Substituting this into (3.4) gives that

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|X_s-Y_s|^2\right)&\lesssim _T C_{T,\sigma ,\lambda }\phi ^2(\beta _T\sqrt{\delta })+ C_{T,\sigma ,\lambda }\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s. \end{aligned} \end{aligned}$$

Thus, Gronwall’s inequality implies that there exists \(\tilde{C}_T>0\) such that

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|X_s-Y_s|^2\right)&\le \tilde{C}_T C_{T,\sigma ,\lambda }{\mathrm{e}}^{\tilde{C}_T C_{T,\sigma ,\lambda }}\phi ^2(\beta _T\sqrt{\delta }). \end{aligned} \end{aligned}$$
(3.11)
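For the reader's convenience, the form of Gronwall’s inequality used here is the standard one; the symbols \(f\), \(A\), \(B\) and \(C\) below are shorthand introduced only for this remark. If \(f:[0,T]\rightarrow [0,\infty )\) is bounded and measurable and \(A,B\ge 0\), then

$$\begin{aligned} f(t)\le A+B\int _0^tf(s){\mathrm{d}}s,\quad t\in [0,T]\quad \Longrightarrow \quad f(t)\le A{\mathrm{e}}^{Bt},\quad t\in [0,T]. \end{aligned}$$

It is applied with \(f(t)=\mathbb {E}\left( \sup _{0\le s\le t}|X_s-Y_s|^2\right) \), noting that \(\mathbb {E}|X_s-Y_s|^2\le f(s)\), and with \(A=C\,C_{T,\sigma ,\lambda }\phi ^2(\beta _T\sqrt{\delta })\) and \(B=C\,C_{T,\sigma ,\lambda }\), where \(C\) denotes the implicit constant in the preceding display. This gives (3.11) for any \(\tilde{C}_T\ge \max \{C,CT\}\).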

So the desired assertion holds immediately. \(\square \)

4 Proof of Theorem 1.2

We shall adopt the cutoff approach to finish the

Proof of Theorem 1.2

Take \(\psi \in C_b^\infty (\mathbb {R}_+)\) such that \(0\le \psi \le 1\), \(\psi (r)=1\) for \(r\in [0,1]\) and \(\psi (r)=0\) for \(r\ge 2\). For any \(t\in [0,T]\) and \(k\ge 1\), define the cutoff functions

$$\begin{aligned} b^{(k)}_t(x)=b_t(x)\psi (|x|/k)\quad \text{ and }\quad \sigma ^{(k)}_t(x)=\sigma _t(\psi (|x|/k)x),\quad x\in \mathbb {R}^n. \end{aligned}$$
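The cutoff construction above can be sketched numerically as follows. This is only an illustration under stated assumptions: the particular \(\psi \) below (built from \(s\mapsto {\mathrm{e}}^{-1/s}\)) is one possible choice of a smooth function with the required properties, and the names `psi` and `cutoff_coefficients` are hypothetical helpers, not taken from the paper.

```python
import math
import numpy as np


def _g(s):
    # Smooth on the real line: exp(-1/s) for s > 0 and identically 0 otherwise.
    return math.exp(-1.0 / s) if s > 0 else 0.0


def psi(r):
    """One possible smooth cutoff: 1 on [0, 1], 0 on [2, inf), values in [0, 1]."""
    s = r - 1.0
    # The denominator is never zero, since s > 0 or 1 - s > 0 always holds.
    return _g(1.0 - s) / (_g(s) + _g(1.0 - s))


def cutoff_coefficients(b, sigma, k):
    """Return (b_k, sigma_k) with
    b_k(t, x) = b(t, x) * psi(|x|/k) and sigma_k(t, x) = sigma(t, psi(|x|/k) * x)."""

    def b_k(t, x):
        x = np.asarray(x, dtype=float)
        return b(t, x) * psi(np.linalg.norm(x) / k)

    def sigma_k(t, x):
        x = np.asarray(x, dtype=float)
        return sigma(t, psi(np.linalg.norm(x) / k) * x)

    return b_k, sigma_k
```

Since \(\psi (|x|/k)\) vanishes for \(|x|\ge 2k\), the argument \(\psi (|x|/k)x\) never leaves the ball of radius \(2k\), so coefficients of linear growth become bounded after the cutoff.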

It is easy to see that \(b^{(k)}\) and \(\sigma ^{(k)}\) satisfy (A1). For fixed \(k\ge 1,\) consider the following SDE

$$\begin{aligned} {\mathrm{d}}X^{(k)}_t=b^{(k)}_t(X^{(k)}_t){\mathrm{d}}t+\sigma ^{(k)}_t(X^{(k)}_t){\mathrm{d}}W_t,\quad t>0,\; X^{(k)}_0=X_0=x. \end{aligned}$$
(4.1)

The corresponding continuous-time EM scheme for (4.1) is defined by

$$\begin{aligned} {\mathrm{d}}Y_t^{(k)}=b_{t_\delta }^{(k)}( Y_{t_\delta }^{(k)}){\mathrm{d}}t+\sigma _{t_\delta }^{(k)} ( Y_{t_\delta }^{(k)}){\mathrm{d}}W_t,\quad t>0,\; Y_0^{(k)}=X_0=x. \end{aligned}$$
(4.2)
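At the grid points \(i\delta \), a scheme of the form (4.2) reduces to the familiar EM recursion, which can be sketched as follows. This is a minimal illustration, assuming a uniform grid with \(\delta =T/\text{steps}\) and pre-sampled Brownian increments; the function name `euler_maruyama` and its signature are hypothetical, and the coefficients `b` and `sigma` are passed as callables `(t, x)` returning a vector in \(\mathbb {R}^n\) and an \(n\times m\) matrix, respectively.

```python
import numpy as np


def euler_maruyama(b, sigma, x0, T, dW):
    """Evaluate the EM recursion on a uniform grid.

    dW has shape (steps, m) and holds the Brownian increments
    W_{(i+1)delta} - W_{i delta}; the result has shape (steps + 1, n).
    """
    dW = np.asarray(dW, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    steps = dW.shape[0]
    delta = T / steps
    y = np.empty((steps + 1, x0.shape[0]))
    y[0] = x0
    for i in range(steps):
        t = i * delta
        # Coefficients are frozen at the left endpoint of each step.
        y[i + 1] = y[i] + b(t, y[i]) * delta + sigma(t, y[i]) @ dW[i]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    steps, T = 1000, 1.0
    increments = rng.normal(0.0, np.sqrt(T / steps), size=(steps, 1))
    # Example with a bounded, Hölder continuous (exponent 1/2) drift.
    drift = lambda t, x: np.minimum(np.sqrt(np.abs(x)), 1.0)
    path = euler_maruyama(drift, lambda t, x: np.eye(1), [0.0], T, increments)
    print(path[-1])
```

Passing the increments explicitly makes it possible to drive a coarse and a fine discretization with the same Brownian path, which is how strong errors such as \(\mathbb {E}(\sup _{0\le t\le T}|X_t-Y_t|^2)\) are usually estimated in simulations.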

Applying the BDG inequality, Hölder’s inequality and Gronwall’s inequality, we deduce from (A1’) that

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t|^4\right) \,+\,\mathbb {E}\left( \sup _{0\le t\le T}|Y_t|^4\right) \,+\,\mathbb {E}\left( \sup _{0\le t\le T}|X_t^{(k)}|^4\right) \,+\,\mathbb {E}\left( \sup _{0\le t\le T}|Y_t^{(k)}|^4\right) \le C_T \end{aligned}$$
(4.3)

for some constant \(C_T>0\). Note that

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right)&\le 3\,\mathbb {E}\left( \sup _{0\le t\le T}|X_t-X_t^{(k)}|^2\right) +3\,\mathbb {E}\left( \sup _{0\le t\le T}|X_t^{(k)}-Y_t^{(k)}|^2\right) \\&\quad +3\,\mathbb {E}\left( \sup _{0\le t\le T}|Y_t-Y_t^{(k)}|^2\right) \\&=:I_1+I_2+I_3. \end{aligned} \end{aligned}$$

For the terms \(I_1\) and \(I_3\), by the Chebyshev inequality we find from (4.3) that

$$\begin{aligned} \begin{aligned} I_1+I_3&\lesssim \mathbb {E}\left( \sup _{0\le t\le T}|X_t-X_t^{(k)}|^2\mathbf{{1}}_{\{\sup _{0\le t\le T}|X_t|\ge k\}}\right) \\&\quad +\mathbb {E}\left( \sup _{0\le t\le T}|Y_t-Y_t^{(k)}|^2\mathbf{{1}}_{\{\sup _{0\le t\le T}|Y_t|\ge k\}}\right) \\&\lesssim \sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|X_t|^4\right) +\mathbb {E}\left( \sup _{0\le t\le T}|X_t^{(k)}|^4\right) }\frac{\sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|X_t|^2\right) }}{k}\\&\quad +\sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|Y_t|^4\right) +\mathbb {E}\left( \sup _{0\le t\le T}|Y_t^{(k)}|^4\right) }\frac{\sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|Y_t|^2\right) }}{k}\\&\lesssim _T\frac{1}{k}, \end{aligned} \end{aligned}$$
(4.4)

where in the first display we have used the facts that \(\{X_t\ne X_t^{(k)}\}\subset \{\sup _{0\le s\le t}|X_s|\ge k\}\) and \(\{Y_t\ne Y_t^{(k)}\}\subset \{\sup _{0\le s\le t}|Y_s|\ge k\}\). Observe from (A1’) that \( 9\pi \Lambda _{T,\sigma ^{(k)}}^2\Vert b^{(k)}\Vert _{T,\infty }^2+4(\Vert b^{(k)}\Vert _{T,\infty }+\Lambda _{T,\sigma ^{(k)}})^2\le {\mathrm{e}}^{c\,k^2} \) for some \(c>0.\) Next, according to (3.11), by taking \(\lambda ={\mathrm{e}}^{c\,k^2}\) there exists \(C_T>0\) such that

$$\begin{aligned} I_2\le {\mathrm{e}}^{C_T C_{T,\sigma ^{(k)},\lambda }}\phi _k^2(\beta _T\sqrt{\delta }). \end{aligned}$$
(4.5)

Herein, \( C_{T,\sigma ^{(k)},\lambda }>0\) is defined as in (3.10) with \(\sigma \) and \(u^\lambda \) replaced by \(\sigma ^{(k)}\) and \(u^{\lambda ,k}\), respectively, where \(u^{\lambda ,k}\) solves (2.3) by writing \(b^{(k)}\) instead of b. Consequently, we conclude that

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \le \frac{\bar{c}_0}{k}+\bar{c}_0{\mathrm{e}}^{C_T C_{T,\sigma ^{(k)},\lambda }}\phi _k^2(\beta _T\sqrt{\delta }) \end{aligned}$$
(4.6)

for some \(\bar{c}_0>0.\) For any \(\varepsilon >0\), taking \(k=\lfloor 2\bar{c}_0/\varepsilon \rfloor \) and letting \(\delta \) go to zero implies that

$$\begin{aligned} \lim _{\delta \rightarrow 0}\mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \le \varepsilon . \end{aligned}$$

Thus, (1.4) follows due to the arbitrariness of \(\varepsilon \).

For \(\phi _k(s)={\mathrm{e}}^{{\mathrm{e}}^{c_0k^4}}s^{\alpha },s\ge 0,\) with \(\alpha \in (0,1],\) we deduce from Lemma 2.1 (iii) that

$$\begin{aligned} \Vert \nabla ^2 u^{\lambda ,k}\Vert _{T,\infty }\le \frac{1}{2} \end{aligned}$$
(4.7)

whenever

$$\begin{aligned} \begin{aligned} \lambda&\ge \left\{ 2\Upsilon _{T,\sigma ^{(k)}}\left( {\mathrm{e}}^{{\mathrm{e}}^{c_0k^4}}\Vert \sigma ^{(k)}\Vert _{T,\infty }^\alpha \Gamma (\alpha /2)+\Vert \sigma ^{(k)}\Vert _{T,\infty }^{1/2}\Gamma (1/4)\right) \right\} ^{2/\alpha }\\&\quad +9\pi (\Lambda _{T,\sigma ^{(k)}})^2\Vert b^{(k)}\Vert _{T,\infty }^2+4(\Vert b^{(k)}\Vert _{T,\infty }+\Lambda _{T,\sigma ^{(k)}})^2. \end{aligned} \end{aligned}$$
(4.8)

Since the right-hand side of (4.8) can be bounded by \({\mathrm{e}}^{{\mathrm{e}}^{\bar{C}_T\,k^4}}\) for some constant \(\bar{C}_T>0\) due to (A1’), we can take \(\lambda ={\mathrm{e}}^{{\mathrm{e}}^{\bar{C}_T\,k^4}}\) so that (4.7) holds. Thus, (4.6), together with (4.7) and (A1’), yields that

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \le \frac{\hat{C}_T}{k}+\hat{C}_T{\mathrm{e}}^{{\mathrm{e}}^{ \tilde{C}_Tk^4}}\delta ^\alpha \end{aligned}$$

for some constants \(\hat{C}_T,\tilde{C}_T>0\). Thus, (1.5) follows immediately by taking

$$\begin{aligned} k=\left\lfloor \left( \frac{1}{\tilde{C}_T}\log \log \delta ^{-\alpha \varepsilon }\right) ^{\frac{1}{4}}\right\rfloor . \end{aligned}$$
(4.9)
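To get a feeling for how slowly the cutoff level in (4.9) grows as \(\delta \rightarrow 0\), the following sketch evaluates it numerically. It is an illustration only: the function name `cutoff_level` is hypothetical, the value used for \(\tilde{C}_T\) is an arbitrary assumption, the input is \(\log (1/\delta )\) rather than \(\delta \) to avoid floating-point underflow, and returning 0 when \(\log \log \delta ^{-\alpha \varepsilon }\) is not positive is a convention of this sketch.

```python
import math


def cutoff_level(log_inv_delta, alpha, eps, c_tilde):
    """Evaluate k = floor(((1/c_tilde) * log log delta^{-alpha*eps})^{1/4}).

    log_inv_delta is log(1/delta); c_tilde plays the role of the constant
    in (4.9), whose actual value is not specified here (assumption).
    """
    inner = alpha * eps * log_inv_delta  # equals log(delta^{-alpha*eps})
    if inner <= 1.0:
        # log log is non-positive: the formula yields no k >= 1 (sketch convention).
        return 0
    return math.floor((math.log(inner) / c_tilde) ** 0.25)


if __name__ == "__main__":
    for log_inv_delta in (math.log(1e10), math.log(1e300), 2.0e7):
        print(log_inv_delta, cutoff_level(log_inv_delta, alpha=1.0, eps=0.5, c_tilde=1.0))
```

With the hypothetical values \(\tilde{C}_T=1\), \(\alpha =1\) and \(\varepsilon =1/2\), the level stays at \(k=1\) for \(\delta =10^{-10}\) and even for \(\delta =10^{-300}\), and reaches \(k=2\) only once \(\log (1/\delta )\) is of order \(10^7\). By construction \(\tilde{C}_Tk^4\le \log \log \delta ^{-\alpha \varepsilon }\), so the doubly exponential factor is at most \(\delta ^{-\alpha \varepsilon }\).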

Next, we aim to show that (1.6) holds true. In view of (4.3) and (4.4), it follows from Hölder’s inequality that

$$\begin{aligned} \begin{aligned} I_1+I_3&\lesssim \sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|X_t-X_t^{(k)}|^4\right) }\sqrt{\mathbb {P}\left( \sup _{0\le t\le T}|X_t|\ge k \right) }\\&\quad +\sqrt{\mathbb {E}\left( \sup _{0\le t\le T}|Y_t-Y_t^{(k)}|^4\right) }\sqrt{\mathbb {P}\left( \sup _{0\le t\le T}|Y_t|\ge k\right) }\\&\lesssim _T\sqrt{\mathbb {P}\left( \sup _{0\le t\le T}|X_t|\ge k \right) }+\sqrt{\mathbb {P}\left( \sup _{0\le t\le T}|Y_t|\ge k\right) }. \end{aligned} \end{aligned}$$
(4.10)

By (A1’), we infer that

$$\begin{aligned} \begin{aligned} \sup _{0\le s\le t}|Y_s|&\le |x|+K_T T+\sup _{0\le s\le t}|N_s|+K_T\int _0^t| Y_{s_\delta }|{\mathrm{d}}s\\&\le |x|+K_T T+\sup _{0\le s\le t}|N_s|+K_T\int _0^t\sup _{0\le r\le s}| Y_r|{\mathrm{d}}s \end{aligned} \end{aligned}$$
(4.11)

where

$$\begin{aligned} N_t:=\int _0^t\sigma _{s_\delta }(Y_{s_\delta }){\mathrm{d}}W_s. \end{aligned}$$

Thus, Gronwall’s inequality enables us to get that

$$\begin{aligned} \sup _{0\le s\le t}|Y_s|\le (|x|+K_T T){\mathrm{e}}^{K_TT}+{\mathrm{e}}^{K_TT}\sup _{0\le s\le t} |N_s |. \end{aligned}$$

For any integer \(k\ge 1\) such that

$$\begin{aligned} \rho := k {\mathrm{e}}^{-K_TT} -|x|-K_T T>0, \end{aligned}$$

we derive from (4.11) that

$$\begin{aligned} \mathbb {P}\left( \sup _{0\le t\le T}|Y_t|\ge k\right) \le \mathbb {P}\left( \sup _{0\le t\le T} |N_t |\ge \rho \right) . \end{aligned}$$

This, by taking advantage of [19, Proposition 6.8], yields that

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{0\le t\le T}|Y_t|\ge k\right)&\le \mathbb {P}\left( \left\langle N\right\rangle _T\le \Vert \sigma \Vert _{T,\infty }^2T, \sup _{0\le t\le T} |N_t |\ge \rho \right) \\&\le 2n\exp \left( -\frac{\rho ^2}{2n\Vert \sigma \Vert _{T,\infty }^2T}\right) , \end{aligned} \end{aligned}$$
(4.12)

where \(\left\langle N\right\rangle _t\) stands for the quadratic variation process of \(N_t\). Next, by using the inequality: \((a-b)^2\ge \frac{1}{2}a^2-b^2,a,b\in \mathbb {R},\) we deduce from (4.12) that

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{0\le t\le T}|Y_t|\ge k\right) \le 2n\exp \left( \frac{(|x|+K_T T)^2}{2n\Vert \sigma \Vert _{T,\infty }^2T}\right) \exp \left( -\frac{k^2}{4n\Vert \sigma \Vert _{T,\infty }^2T{\mathrm{e}}^{2K_TT}}\right) . \end{aligned} \end{aligned}$$
(4.13)
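In more detail, the elementary inequality holds because \((a-b)^2-\frac{1}{2}a^2+b^2=\left( \frac{a}{\sqrt{2}}-\sqrt{2}\,b\right) ^2\ge 0\). Taking \(a=k{\mathrm{e}}^{-K_TT}\) and \(b=|x|+K_TT\) there (here \(b\) is the real number from that inequality, not the drift), so that \(\rho =a-b\), gives

$$\begin{aligned} \rho ^2\ge \frac{k^2}{2{\mathrm{e}}^{2K_TT}}-(|x|+K_TT)^2,\quad \text{ hence }\quad -\frac{\rho ^2}{2n\Vert \sigma \Vert _{T,\infty }^2T}\le \frac{(|x|+K_T T)^2}{2n\Vert \sigma \Vert _{T,\infty }^2T}-\frac{k^2}{4n\Vert \sigma \Vert _{T,\infty }^2T{\mathrm{e}}^{2K_TT}}, \end{aligned}$$

and exponentiating both sides turns the bound in (4.12) into the one in (4.13).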

Similarly, one can obtain that

$$\begin{aligned} \begin{aligned} \mathbb {P}\left( \sup _{0\le t\le T}|X_t|\ge k\right) \le 2n\exp \left( \frac{(|x|+K_T T)^2}{2n\Vert \sigma \Vert _{T,\infty }^2T}\right) \exp \left( -\frac{k^2}{4n\Vert \sigma \Vert _{T,\infty }^2T{\mathrm{e}}^{2K_TT}}\right) . \end{aligned} \end{aligned}$$
(4.14)

Inserting (4.13) and (4.14) back into (4.10) leads to

$$\begin{aligned} I_1+I_3 \lesssim _T\exp \left( -\frac{k^2}{2n\Vert \sigma \Vert _{T,\infty }^2T{\mathrm{e}}^{2K_TT}}\right) . \end{aligned}$$

This, together with (4.5), (4.7) and (A1’), gives that

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le t\le T}|X_t-Y_t|^2\right) \le \hat{C}_T\exp \left( -\frac{k^2}{2n\Vert \sigma \Vert _{T,\infty }^2T{\mathrm{e}}^{2K_TT}}\right) +\hat{C}_T{\mathrm{e}}^{{\mathrm{e}}^{ \tilde{C}_Tk^4}}\delta ^\alpha \end{aligned}$$

for some constants \(\hat{C}_T,\tilde{C}_T>0\). As a consequence, (1.6) follows by taking k given in (4.9). \(\square \)

5 Proof of Theorem 1.4

For simplicity, for any \(f:\mathbb {R}^{m_1}\rightarrow \mathbb {R}^{m_2}\), let

$$\begin{aligned} {[}f]_{L}=\sup _{x\ne y}\frac{|f(x)-f(y)|}{|x-y|}, \quad \Vert f\Vert _\infty =\sup _{x\in \mathbb {R}^{m_1}}|f(x)|. \end{aligned}$$

The proof of Theorem 1.4 relies on regularization properties of the following \(\mathbb {R}^{2n}\)-valued degenerate parabolic equation

$$\begin{aligned} \partial _t u_t^\lambda +\mathscr {L}_t^{b,\sigma } u_t^\lambda +b_t=\lambda u_t^\lambda ,\quad u_T^\lambda =\mathbf{0_{2n}},\quad t\in [0,T],\quad \lambda >0, \end{aligned}$$
(5.1)

where \(\mathbf{0_{2n}}\) is the zero vector in \(\mathbb {R}^{2n}\),

The following lemma on regularity estimates for the solution to (5.1) is taken from [23, Theorem 3.10, (4.4)] and is an essential ingredient in analyzing the numerical approximation.

Lemma 5.1

Under (C1)–(C3), (5.1) has a unique solution \(u^\lambda \in C([0,T];C_b^1(\mathbb {R}^{2n};\mathbb {R}^{2n}))\) such that for all \(t\in [0,T],\)

$$\begin{aligned} \begin{aligned} \Vert \nabla u_t^\lambda \Vert _{\infty }&+\Vert \nabla ^{(2)}\nabla ^{(2)}u_t^\lambda \Vert _{\infty } +[\nabla ^{(2)}u_t^\lambda ]_{L} \le C\int _0^T{\mathrm{e}}^{-\lambda t}\frac{\phi (t^{\frac{1}{2}})}{t}{\mathrm{d}}t, \end{aligned} \end{aligned}$$
(5.2)

where \(C>0\) is a constant.

From now on, we move forward to complete the

Proof of Theorem 1.4

For notational simplicity, set

Then (1.7) and (1.8) can be reformulated, respectively, as

where \(\mathbf{0_{n\times n}}\) is an \(n\times n\) zero matrix, and

Note from (5.2) that there exists \(\lambda _0>0\) sufficiently large such that for any \(t\in [0,T]\),

$$\begin{aligned} \Vert \nabla u_t^\lambda \Vert _{\infty }+\Vert \nabla ^{(2)}\nabla ^{(2)}u_t^\lambda \Vert _{\infty }+[\nabla ^{(2)}u_t^\lambda ]_{L}\le \frac{1}{2},\quad \lambda \ge \lambda _0. \end{aligned}$$
(5.3)

Applying Itô’s formula to \(x+u_t^\lambda (x)\) for any \( x\in \mathbb {R}^{2n}\), we deduce that

(5.4)

and that

(5.5)

where \(\mathbf{I}_{\mathbf{2n\times 2n}}\) is the \(2n\times 2n\) identity matrix. Thus, using Hölder’s inequality, Doob’s submartingale inequality and Itô’s isometry, and taking (3.5) into consideration, we obtain

$$\begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|M_s^\lambda |^2\right)&\le C_{0,T}\left\{ \int _0^t\mathbb {E}|u_s^\lambda (X_s)-u_s^\lambda (Y_s)|^2{\mathrm{d}}s\right. \\&\quad +\left( 1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2\right) \int _0^t\mathbb {E}|b_{s_\delta }(Y_s)-b_{s_\delta }(Y_{s_\delta })|^2{\mathrm{d}}s\\&\quad +\left( 1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2\right) \int _0^t\mathbb {E}|b_s(Y_s)-b_{s_\delta }(Y_s)|^2{\mathrm{d}}s\\&\quad +\int _0^t\mathbb {E}\Vert \{(\nabla ^{(2)} u_s^\lambda )(X_s)-\nabla ^{(2)} u_s^\lambda (Y_s)\}\sigma _s(X_s)\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\left( 1+\Vert \nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2\right) \int _0^t\mathbb {E}\Vert \sigma _{s_\delta }(X_s)-\sigma _{s_\delta } ( Y_{s_\delta })\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +(1+\Vert \nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2)\int _0^t\mathbb {E}\Vert \sigma _s(X_s)-\sigma _{s_\delta } ( X_s)\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^{(2)}\nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \{ \sigma _{s_\delta }(Y_s)-\sigma _{s_\delta }( Y_{s_\delta } )\}\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^{(2)}\nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \sigma _s(Y_s)\{\sigma ^{*}_{s_\delta }(Y_s)-\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\}\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\quad +\Vert \nabla ^{(2)}\nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \sigma _s(Y_s)\{\sigma ^{*}_s(Y_s)-\sigma ^{*}_{s_\delta }( Y_s )\}\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s\\&\left. \quad +\Vert \nabla ^{(2)}\nabla ^{(2)}u^\lambda \Vert _{T,\infty }^2\int _0^t\mathbb {E}\Vert \{ \sigma _s(Y_s)-\sigma _{s_\delta }( Y_s )\}\sigma ^{*}_{s_\delta }( Y_{s_\delta } )\Vert ^2_{\mathrm {HS}}{\mathrm{d}}s \right\} \\&=:C_{0,T}\left( \sum _{i=1}^{10}J_i(t)\right) \end{aligned}$$

for some constant \(C_{0,T}>0\), where \(M_t^\lambda \) is defined as in (3.3). By using Hölder’s inequality and the BDG inequality, (C1) implies that

$$\begin{aligned} \mathbb {E}|Y_t-Y_{t_\delta }|^p\lesssim \delta ^{\frac{p}{2}},\quad p\ge 1. \end{aligned}$$
(5.6)

Utilizing Taylor’s expansion, one gets from (3.6), (5.3) and (5.6) that

$$\begin{aligned} \begin{aligned} J_1(t)+J_4(t)+J_5(t)&\lesssim \left\{ 1+\Vert \nabla u^\lambda \Vert _{T,\infty }^2+\Vert \nabla \nabla ^{(2)}u^\lambda \Vert _{T,\infty }^2\Vert \sigma \Vert _{T,\infty }^2\right\} \int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s\\&\quad +\{1+\Vert \nabla ^{(2)} u^\lambda \Vert _{T,\infty }^2\}\int _0^t\mathbb {E}|Y_s-Y_{s_\delta }|^2{\mathrm{d}}s\\&\lesssim \delta +\int _0^t\mathbb {E}|X_s-Y_s|^2{\mathrm{d}}s. \end{aligned} \end{aligned}$$

Next, (C1), (C5) and (5.3) yield that

$$\begin{aligned} J_3(t)+J_6(t)+J_9(t)+J_{10}(t)\lesssim \phi ^2(\sqrt{\delta }), \end{aligned}$$

where we have also used that \(\phi (\cdot )\) is increasing and \(\delta \in (0,1).\) Additionally, by virtue of (C1), (C2), and (5.3), we infer from (C3) that

$$\begin{aligned} J_2(t)+J_7(t)+J_8(t)&\lesssim \delta +\int _0^t\mathbb {E}|b_{s_\delta }\left( Y_s^{(1)},Y_s^{(2)}\right) -b_{s_\delta }\left( Y_{s_\delta }^{(1)},Y_s^{(2)}\right) |^2{\mathrm{d}}s\\&\quad + \int _0^t\mathbb {E}\left| b_{s_\delta }\left( Y_{s_\delta }^{(1)},Y_s^{(2)}\right) -b_{s_\delta }\left( Y_{s_\delta }^{(1)},Y_{s_\delta }^{(2)}\right) \right| ^2{\mathrm{d}}s\\&\le C_{1,T}\left\{ \delta +\int _0^t\mathbb {E}\left| b_{s_\delta }^{(1)}(Y_s^{(1)},Y_s^{(2)})-b_{s_\delta }^{(1)}(Y_{s_\delta }^{(1)},Y_s^{(2)})\right| ^2{\mathrm{d}}s\right. \\&\quad +\int _0^t\mathbb {E}\left| b_{s_\delta }^{(2)}(Y_s^{(1)},Y_s^{(2)})-b_{s_\delta }^{(2)}(Y_{s_\delta }^{(1)},Y_s^{(2)})\right| ^2{\mathrm{d}}s\\&\quad + \int _0^t\mathbb {E}\left| b_{s_\delta }^{(1)}(Y_{s_\delta }^{(1)},Y_s^{(2)})-b_{s_\delta }^{(1)}(Y_{s_\delta }^{(1)},Y_{s_\delta }^{(2)})\right| ^2{\mathrm{d}}s\\&\left. \quad + \int _0^t\mathbb {E}\left| b_{s_\delta }^{(2)}(Y_{s_\delta }^{(1)},Y_s^{(2)})-b_{s_\delta }^{(2)}(Y_{s_\delta }^{(1)},Y_{s_\delta }^{(2)})\right| ^2{\mathrm{d}}s\right\} \\&=:C_{1,T}\left( \delta +\sum _{i=1}^4\Lambda _i(t)\right) \end{aligned}$$

for some constant \(C_{1,T}>0\). From (C2), (C3), (5.6) and \(\phi \in \mathscr {D}^\varepsilon \), we derive from Hölder’s inequality and Jensen’s inequality that

$$\begin{aligned} \begin{aligned} \Lambda _1(t)+\Lambda _2(t)&\lesssim \sum _{i=1}^2\int _0^t \mathbb {E}\left( \frac{\left| b_{s_\delta }^{(i)}(Y_s^{(1)},Y_s^{(2)}) -b_{s_\delta }^{(i)}(Y_{s_\delta }^{(1)},Y_s^{(2)})\right| }{\left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| ^{\frac{2}{3}}\phi \left( \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| \right) }\mathbf {1}_{\{Y_s^{(1)}\ne Y_{s_\delta }^{(1)}\}}\right. \\&\left. \quad \times \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| ^{\frac{2}{3}}\phi \left( \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| \right) \right) ^2{\mathrm{d}}s\\&\lesssim \int _0^t\mathbb {E}\left( \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| ^{\frac{2}{3}}\phi \left( \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| \right) \right) ^2{\mathrm{d}}s\\&\lesssim \int _0^t\left( \mathbb {E}\phi \left( \left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| \right) ^{2(1+\varepsilon )}\right) ^{\frac{1}{1+\varepsilon }}\left( \mathbb {E}\left| Y_s^{(1)}-Y_{s_\delta }^{(1)}\right| ^{\frac{4(1+\varepsilon )}{3\varepsilon }}\right) ^{\frac{\varepsilon }{1+\varepsilon }}{\mathrm{d}}s\\&\lesssim \delta ^{\frac{2}{3}} \phi ^2(C_{2,T}\sqrt{\delta }) \end{aligned} \end{aligned}$$
(5.7)

for some constant \(C_{2,T}>0.\) With regard to the term \(\Lambda _3(t)\), (C1) and (5.6) lead to

$$\begin{aligned} \Lambda _3(t)\lesssim \Vert \nabla ^{(2)}b^{(1)}\Vert _{T,\infty }^2\int _0^t\mathbb {E}\left| Y_s^{(2)}-Y_{s_\delta }^{(2)}\right| ^2{\mathrm{d}}s\lesssim \delta . \end{aligned}$$
(5.8)

Due to (C3), observe from Jensen’s inequality and (5.6) that

$$\begin{aligned} \begin{aligned} \Lambda _4(t)&\lesssim \int _0^t\mathbb {E}\left( \frac{|b_{s_\delta }^{(2)}(Y_{s_\delta }^{(1)},Y_s^{(2)})-b_{s_\delta }^{(2)} (Y_{s_\delta }^{(1)},Y_{s_\delta }^{(2)})|}{\phi (|Y_s^{(2)}-Y_{s_\delta }^{(2)}|)}\mathbf{1}_{\{Y_s^{(2)}\ne Y_{s_\delta }^{(2)}\}}\times \phi (|Y_s^{(2)}-Y_{s_\delta }^{(2)}|)\right) ^2{\mathrm{d}}s\\&\lesssim \int _0^t\mathbb {E}\phi (|Y_s^{(2)}-Y_{s_\delta }^{(2)}|)^2{\mathrm{d}}s\\&\lesssim \phi ^2(C_{3,T}\sqrt{\delta }) \end{aligned} \end{aligned}$$

for some constant \( C_{3,T}>0.\) Consequently, we arrive at

$$\begin{aligned} \begin{aligned} \mathbb {E}\left( \sup _{0\le s\le t}|X_s-Y_s|^2\right) \lesssim _T \phi ^2(C_{4,T}\sqrt{\delta }) +\int _0^t\mathbb {E}\sup _{0\le r\le s}|X_r-Y_r|^2{\mathrm{d}}s \end{aligned} \end{aligned}$$

for some constant \(C_{4,T}\ge 1\). Thus, the desired assertion follows from the Gronwall inequality. \(\square \)