
Consistency of Bayesian inference with Gaussian process priors in an elliptic inverse problem

Matteo Giordano and Richard Nickl

Published 20 August 2020 © 2020 The Author(s). Published by IOP Publishing Ltd

Citation: Matteo Giordano and Richard Nickl 2020 Inverse Problems 36 085001, DOI 10.1088/1361-6420/ab7d2a

Abstract

For $\mathcal{O}$ a bounded domain in ${\mathbb{R}}^{d}$ and a given smooth function $g:\mathcal{O}\to \mathbb{R}$, we consider the statistical nonlinear inverse problem of recovering the conductivity f > 0 in the divergence form equation

$$\nabla \cdot \left(f\nabla u\right)=g\quad \text{on}\ \mathcal{O},\qquad u=0\quad \text{on}\ \partial \mathcal{O},$$

from N discrete noisy point evaluations of the solution u = uf on $\mathcal{O}$. We study the statistical performance of Bayesian nonparametric procedures based on a flexible class of Gaussian (or hierarchical Gaussian) process priors, whose implementation is feasible by MCMC methods. We show that, as the number N of measurements increases, the resulting posterior distributions concentrate around the true parameter generating the data, and derive a convergence rate ${N}^{-\lambda }$, λ > 0, for the reconstruction error of the associated posterior means, in ${L}^{2}\left(\mathcal{O}\right)$-distance.


Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Statistical inverse problems arise naturally in many applications in physics, imaging, tomography, and generally in engineering and throughout the sciences. A prototypical example involves a domain $\mathcal{O}\subset {\mathbb{R}}^{d}$, some function $f:\mathcal{O}\to \mathbb{R}$ of interest, and indirect measurements G(f) of f, where G is a given solution (or 'forward') operator of some partial differential equation (PDE) governed by the unknown coefficient f. A natural statistical observational model postulates data

Equation (1)

$$Y_i = G(f)(X_i) + \sigma W_i, \qquad i = 1, \dots, N,$$

where the Xi's are design points at which the PDE solution G(f) is measured, and where the Wi's are standard Gaussian noise variables scaled by a noise level σ > 0. The aim is then to infer f from the data ${\left({Y}_{i},{X}_{i}\right)}_{i=1}^{N}$. The study of problems of this type has a long history in applied mathematics, see the monographs (Engl et al 1996, Kaltenbacher et al 2008), although explicit statistical noise models have been considered only more recently (Bissantz et al 2004, Bissantz et al 2007, Hohage and Pricop 2008, Kaipio and Somersalo 2004). Recent survey articles on the subject are (Arridge et al 2019, Benning and Burger 2018) where many more references can be found.

For many of the most natural PDEs—such as the divergence form elliptic equation (2) considered below—the resulting maps G are non-linear in f, and this poses various challenges: among other things, the negative log-likelihood function associated to the model (1), which equals the least squares criterion (see (10) below for details), is then possibly non-convex, and commonly used statistical algorithms (such as maximum likelihood estimators, Tikhonov regularisers or MAP estimates) defined as optimisers in f of likelihood-based objective functions cannot reliably be computed by standard convex optimisation techniques. While iterative optimisation methods (such as Landweber iteration) may overcome such challenges (Kaltenbacher et al 2008, Hanke et al 1995, Kaltenbacher et al 2009, Qi-nian 2000), an attractive alternative methodology arises from the Bayesian approach to inverse problems advocated in an influential paper by Stuart (Stuart 2010): one starts from a Gaussian process prior Π for the parameter f or, in fact, as is often necessary, for a suitable vector-space-valued re-parameterisation F of f. One then uses Bayes' theorem to infer the best posterior guess for f given data ${\left({Y}_{i},{X}_{i}\right)}_{i=1}^{N}$. Posterior distributions and their expected values can be approximately computed via Markov chain Monte Carlo (MCMC) methods (see, for example, Beskos et al 2017, Conrad et al 2016, Cotter et al 2013 and references therein) as soon as the forward map G(⋅) can be evaluated numerically, avoiding optimisation algorithms as well as the use of (potentially tedious, or non-existent) inversion formulas for G−1; see subsection 2.3.1 below for more discussion. The Bayesian approach has been particularly popular in application areas as it not only delivers an estimator for the unknown parameter f but also provides uncertainty quantification methodology for the recovery algorithm via the probability distribution of $f\vert {\left({Y}_{i},{X}_{i}\right)}_{i=1}^{N}$ (see, for example, Dashti and Stuart 2016). Conceptually related is the area of 'probabilistic numerics' (Briol et al 2019) in the noise-less case σ = 0, with key ideas dating back to work by Diaconis (1988).

As successful as this approach may have proved to be in algorithmic practice, for the case when the forward map G is non-linear we currently only have a limited understanding of the statistical validity of such Bayesian inversion methods. By validity we mean here statistical guarantees for convergence of natural Bayesian estimators such as the posterior mean $\overline{f}={E}^{{\Pi}}\left[f\vert {\left({Y}_{i},{X}_{i}\right)}_{i=1}^{N}\right]$ towards the ground truth f0 generating the data. Without such guarantees, the interpretation of posterior based inferences remains vague: the randomness of the prior may have propagated into the posterior in a way that does not 'wash out' even when very informative data is available (e.g., small noise variance and/or large sample size N), rendering Bayesian methods potentially ambiguous for the purposes of valid statistical inference and uncertainty quantification.

In the present article we attempt to advance our understanding of this problem area in the context of the following basic but representative example for a non-linear inverse problem: let g be a given smooth 'source' function, and let $f:\mathcal{O}\to \mathbb{R}$ be an unknown conductivity parameter determining solutions u = uf of the PDE

Equation (2)

$$\nabla \cdot \left(f\nabla u\right)=g\quad \text{on}\ \mathcal{O},\qquad u=0\quad \text{on}\ \partial \mathcal{O},$$

where we denote by ∇⋅ the divergence and by ∇ the gradient operator, respectively. Under mild regularity conditions on f, and assuming that $f{\geqslant}{K}_{\mathrm{min}}{ >}0$ on $\mathcal{O}$, standard elliptic theory implies that (2) has a unique classical C2-solution G(f) ≡ uf. Identification of f from an observed solution uf of this PDE has been considered in a large number of articles both in the applied mathematics and statistics communities—we mention here (Stuart 2010, Beskos et al 2017, Dashti and Stuart 2016, Briol et al 2019, Alessandrini 1986, Bonito et al 2017, Dashti and Stuart 2011, Falk 1983, Hoffmann and Sprekels 1985, Ito and Kunisch 1994, Knowles 2001, Kohn and Lowe 1988, Kravaris and Seinfeld 1985, Nickl et al 2020, Richter 1981, Schwab and Stuart 2012, Vollmer 2013) and the many references therein.

The main contributions of this article are as follows: we show that posterior means arising from a large class of Gaussian (or conditionally Gaussian) process priors for f provide statistically consistent recovery (with explicit polynomial convergence rates as the number N of measurements increases) of the unknown parameter f in (2) from data in (1). While we employ the theory of posterior contraction from Bayesian non-parametric statistics (Ghosal and van der Vaart 2017, van der Vaart and van Zanten 2008, van der Vaart and van Zanten 2009), the non-linear nature of the problem at hand leads to substantial additional challenges arising from the fact that (a) the Hellinger distance induced by the statistical experiment is not naturally compatible with relevant distances on the actual parameter f and that (b) the 'push-forward' prior induced on the information-theoretically relevant regression functions G(f) is non-explicit (in particular, non-Gaussian) due to the non-linearity of the map G. Our proofs apply recent ideas from Monard et al (2020) to the present elliptic situation. In the first step we show that the posterior distributions arising from the priors considered (optimally) solve the PDE-constrained regression problem of inferring G(f) from data (1). Such results can then be combined with a suitable 'stability estimate' for the inverse map G−1 to show that, for large sample size N, the posterior distributions concentrate around the true parameter generating the data at a convergence rate ${N}^{-\lambda }$ for some λ > 0. We ultimately deduce the same rate of consistency for the posterior mean from quantitative uniform integrability arguments.

The first results we obtain apply to a large class of 'rescaled' Gaussian process priors similar to those considered in Monard et al (2020), addressing the need for additional a priori regularisation of the posterior distribution in order to tame non-linear effects of the 'forward map'. This rescaling of the Gaussian process depends on the sample size N. From a non-asymptotic point of view this just reflects an adjustment of the covariance operator of the prior, but following Diaconis (1988) one may wonder whether a 'fully Bayesian' solution of this non-linear inverse problem, based on a prior that does not depend on N, is also possible. We show indeed that a hierarchical prior that randomises a finite truncation point in the Karhunen–Loève-type series expansion of the Gaussian base prior will also result in consistent recovery of the conductivity parameter f in equation (2) from data (1), at least if f is smooth enough.

Let us finally discuss some related literature on statistical guarantees for Bayesian inversion: to the best of our knowledge, the only previous paper concerned with (frequentist) consistency of Bayesian inversion in the elliptic PDE (2) is by Vollmer (2013). The proofs in Vollmer (2013) share a similar general idea in that they rely on a preliminary treatment of the associated regression problem for G(f), which is then combined with a suitable stability estimate for G−1. However, the convergence rates obtained in Vollmer (2013) are only implicitly given and sub-optimal, also (unlike ours) for 'prediction risk' in the PDE-constrained regression problem. Moreover, when specialised to the concrete non-linear elliptic problem (2) considered here, the results in section 4 in Vollmer (2013) only hold for priors with bounded Cβ-norms, such as 'uniform wavelet type priors', similar to the ones used in Nickl (2018), Nickl and Söhl (2017), and Nickl and Söhl (2019) for different non-linear inverse problems. In contrast, our results hold for the more practical Gaussian process priors which are commonly used in applications, and which permit the use of tailor-made MCMC methodology—such as the pCN algorithm discussed in subsection 2.3.1—for computation.

The results obtained in Nickl et al (2020) for the maximum a posteriori (MAP) estimates associated to the priors studied here are closely related to our findings in several ways. Ultimately the proof methods in Nickl et al (2020) are, however, based on variational methods and hence entirely different from the Bayesian ideas underlying our results. Moreover, the MAP estimates in Nickl et al (2020) are difficult to compute due to the lack of convexity of the forward map, whereas posterior means arising from Gaussian process priors admit explicit computational guarantees, see Hairer et al (2014) and also subsection 2.3.1 for more details.

It is further of interest to compare our results to those recently obtained in Abraham and Nickl (2019), where the statistical version of the Calderón problem is studied. There the 'Dirichlet-to-Neumann map' of solutions to the PDE (2) is observed, corrupted by appropriate Gaussian matrix noise. In this case, as only boundary measurements of uf at $\partial \mathcal{O}$ are available, the statistical convergence rates are only of order ${\mathrm{log}}^{-\gamma }N$ for some γ > 0 (as N → ∞), whereas our results show that when interior measurements of uf are available throughout $\mathcal{O}$, the recovery rates improve to ${N}^{-\lambda }$ for some λ > 0.

There is of course a large literature on consistency in Bayesian linear inverse problems with Gaussian priors; we only mention Agapiou et al (2013), Kekkonen et al (2016), Knapik et al (2011), Monard et al (2019), and Ray (2013) and references therein. The non-linear case considered here is fundamentally more challenging and cannot be treated by the techniques from these papers—however, some of the general theory we develop in the appendix provides novel proof methods also for the linear setting.

This paper is structured as follows. Section 2 contains all the main results for the inverse problem arising with the PDE model (2). The proofs, which also include some theory for general non-linear inverse problems that is of independent interest, are given in section 3 and appendix A. Finally, appendix B provides additional details on some facts used throughout the paper.

2. Main results

2.1. A statistical inverse problem with elliptic PDEs

2.1.1. Main notation

Throughout the paper, $\mathcal{O}\subset {\mathbb{R}}^{d},\;d\in \mathbb{N}$, is a given nonempty open and bounded set with smooth boundary $\partial \mathcal{O}$ and closure $\overline{\mathcal{O}}$.

The spaces of continuous functions defined on $\mathcal{O}$ and $\overline{\mathcal{O}}$ are respectively denoted $C\left(\mathcal{O}\right)$ and $C\left(\overline{\mathcal{O}}\right)$, and endowed with the supremum norm ‖⋅‖∞. For positive integers $\beta \in \mathbb{N}$, ${C}^{\beta }\left(\mathcal{O}\right)$ is the space of β-times differentiable functions with uniformly continuous derivatives; for non-integer β > 0, ${C}^{\beta }\left(\mathcal{O}\right)$ is defined as

$$C^\beta(\mathcal{O}) = \left\{f \in C^{\lfloor\beta\rfloor}(\mathcal{O}) : \sup_{x \neq y} \frac{|D^i f(x) - D^i f(y)|}{|x - y|^{\beta - \lfloor\beta\rfloor}} < \infty \ \text{for all} \ |i| = \lfloor\beta\rfloor\right\},$$

where ⌊β⌋ denotes the largest integer less than or equal to β, and for any multi-index i = (i1, ..., id), Di is the ith partial differential operator. ${C}^{\beta }\left(\mathcal{O}\right)$ is normed by

$$\|f\|_{C^\beta(\mathcal{O})} = \sum_{|i| \leqslant \lfloor\beta\rfloor} \|D^i f\|_\infty + \sum_{|i| = \lfloor\beta\rfloor} \sup_{x \neq y} \frac{|D^i f(x) - D^i f(y)|}{|x - y|^{\beta - \lfloor\beta\rfloor}},$$

where the second summand is removed for integer β. We denote by ${C}^{\infty }\left(\mathcal{O}\right)={\cap }_{\beta }{C}^{\beta }\left(\mathcal{O}\right)$ the set of smooth functions, and by ${C}_{c}^{\infty }\left(\mathcal{O}\right)$ the subspace of elements in ${C}^{\infty }\left(\mathcal{O}\right)$ with compact support contained in $\mathcal{O}$.

Denote by ${L}^{2}\left(\mathcal{O}\right)$ the Hilbert space of square integrable functions on $\mathcal{O}$, equipped with its usual inner product ${\langle \cdot ,\cdot \rangle }_{{L}^{2}\left(\mathcal{O}\right)}$. For integer α ⩾ 0, the order-α Sobolev space on $\mathcal{O}$ is the separable Hilbert space

$$H^\alpha(\mathcal{O}) = \left\{f \in L^2(\mathcal{O}) : D^i f \in L^2(\mathcal{O}) \ \text{for all} \ |i| \leqslant \alpha\right\}, \qquad \|f\|_{H^\alpha(\mathcal{O})}^2 = \sum_{|i| \leqslant \alpha} \|D^i f\|_{L^2(\mathcal{O})}^2.$$

For non-integer α ⩾ 0, ${H}^{\alpha }\left(\mathcal{O}\right)$ can be defined by interpolation, for example, Lions and Magenes (1972). For any α ⩾ 0, ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$ will denote the completion of ${C}_{c}^{\infty }\left(\mathcal{O}\right)$ with respect to the norm ${\Vert}\cdot {{\Vert}}_{{H}^{\alpha }\left(\mathcal{O}\right)}$. Finally, if K is a nonempty compact subset of $\mathcal{O}$, we denote by ${H}_{K}^{\alpha }\left(\mathcal{O}\right)$ the closed subspace of functions in ${H}^{\alpha }\left(\mathcal{O}\right)$ with support contained in K. Whenever there is no risk of confusion, we will omit the reference to the underlying domain $\mathcal{O}$.

Throughout, we use the symbols ≲ and ≳ for inequalities holding up to a universal constant. Also, for two real sequences (aN) and (bN), we say that ${a}_{N}\simeq {b}_{N}$ if both ${a}_{N}\lesssim {b}_{N}$ and ${b}_{N}\lesssim {a}_{N}$ for all N large enough. For a sequence of random variables ZN we write ZN = OPr(aN) if for all ɛ > 0 there exists ${M}_{\varepsilon }{< }\infty $ such that for all N large enough, Pr(|ZN| ⩾ MɛaN) < ɛ. Finally, we will denote by $\mathcal{L}\left(Z\right)$ the law of a random variable Z.

2.1.2. Parameter spaces and link functions

Let $g\in {C}^{\infty }\left(\mathcal{O}\right)$ be an arbitrary source function, which will be regarded as fixed throughout. For $f\in {C}^{\beta }\left(\mathcal{O}\right),\;\beta { >}1$, consider the boundary value problem

Equation (3)

$$\nabla \cdot \left(f\nabla u\right)=g\quad \text{on}\ \mathcal{O},\qquad u=0\quad \text{on}\ \partial \mathcal{O}.$$

If we assume that $f{\geqslant}{K}_{\mathrm{min}}{ >}0$ on $\mathcal{O}$, then standard elliptic theory (e.g. Gilbarg and Trudinger (1998)) implies that (3) has a classical solution $G\left(f\right)\equiv {u}_{f}\in C\left(\overline{\mathcal{O}}\right)\cap {C}^{1+\beta }\left(\mathcal{O}\right)$.

We consider the following parameter space for f: for integer α > 1 + d/2, Kmin ∈ (0, 1), and denoting by n = n(x) the outward pointing normal at $x\in \partial \mathcal{O}$, let

Equation (4)

$$\mathcal{F}_{\alpha, K_{\mathrm{min}}} = \left\{f \in H^\alpha(\mathcal{O}) : \inf_{x \in \mathcal{O}} f(x) > K_{\mathrm{min}}, \ f = 1 \ \text{on} \ \partial\mathcal{O}, \ \frac{\partial^j f}{\partial n^j} = 0 \ \text{on} \ \partial\mathcal{O} \ \text{for} \ 1 \leqslant j \leqslant \alpha - 1\right\}.$$

Our approach will be to place a prior probability measure on the unknown conductivity f and base our inference on the posterior distribution of f given noisy observations of G(f), via Bayes' theorem. It is of interest to use Gaussian process priors. Such probability measures are naturally supported in linear spaces (in our case ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$) and we now introduce a bijective re-parametrisation so that the prior for f is supported in the relevant parameter space ${\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$. We follow the approach of using regular link functions Φ as in Nickl et al (2020).

Condition 1. For given Kmin > 0, let ${\Phi}:\mathbb{R}\to \left({K}_{\mathrm{min}},\infty \right)$ be a smooth, strictly increasing bijective function such that Φ(0) = 1, ${{\Phi}}^{\prime }\left(t\right){ >}0,t\in \mathbb{R}$, and assume that all derivatives of Φ are bounded on $\mathbb{R}$.

For some of the results to follow it will prove convenient to slightly strengthen the previous condition.

Condition 2. Let Φ be as in condition 1, and assume furthermore that Φ′ is nondecreasing and that ${\mathrm{lim inf}}_{t\to -\infty }\,{\Phi }^{\prime }\left(t\right)\vert t{\vert }^{a}{ >}0$ for some a > 0.

For a = 2, an example of such a link function is given in example 24 below. Note however that the choice of Φ = exp is not permitted in either condition.
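To make these conditions concrete, the following sketch numerically checks the required properties for one hypothetical candidate of our own construction (it is only piecewise-smooth at the junction point 0, so a genuinely $C^\infty$ version would mollify it there; in particular, this is not the paper's example 24):

```python
import numpy as np

C = 0.5                      # slope constant (our choice)
K_MIN = 1 - C * np.pi / 2    # then Phi maps R bijectively onto (K_MIN, infinity)

def phi(t):
    """Candidate link: 1 + C*arctan(t) for t <= 0, linear 1 + C*t for t > 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 0, 1 + C * np.arctan(t), 1 + C * t)

def phi_prime(t):
    """Derivative: C/(1+t^2) for t <= 0, constant C for t > 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 0, C / (1 + t ** 2), C)

# Condition 1: Phi(0) = 1, strictly increasing, derivative bounded.
assert abs(phi(0.0) - 1.0) < 1e-12
grid = np.linspace(-50, 50, 100001)
assert np.all(np.diff(phi(grid)) > 0)        # strict monotonicity
assert phi_prime(grid).max() <= C            # Phi' bounded

# Condition 2 with a = 2: Phi' nondecreasing, liminf Phi'(t) * t^2 > 0.
assert np.all(np.diff(phi_prime(grid)) >= 0)
t_tail = -(10.0 ** np.arange(3, 9))
print(phi_prime(t_tail) * t_tail ** 2)       # approaches C > 0
```

The arctan-shaped left tail makes $\Phi^{\prime}(t)t^2$ converge to the constant C, which is exactly the polynomial (rather than exponential, as for Φ = exp) decay that condition 2 permits.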

Given any link function Φ satisfying condition 1, one can show (cf. Nickl et al (2020), section 3.1) that the set ${\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$ in (4) can be realised as the family of composition maps

$$\mathcal{F}_{\alpha, K_{\mathrm{min}}} = \left\{\Phi \circ F : F \in H_c^\alpha(\mathcal{O})\right\}.$$

We then regard the solution map associated to (3) as one defined on ${H}_{c}^{\alpha }$ via

Equation (5)

$$\mathcal{G} : H_c^\alpha(\mathcal{O}) \to L^2(\mathcal{O}), \qquad \mathcal{G}(F) := G(\Phi \circ F),$$

where G(Φ◦F) is the solution to (3) now with $f={\Phi}{\circ}F\in {\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$. In the results to follow, we will implicitly assume a link function Φ to be given and fixed, and understand the re-parametrised solution map $\mathcal{G}$ as being defined as in (5) for such choice of Φ.

2.1.3. Measurement model

Define the uniform distribution on $\mathcal{O}$ by $\mu =\mathrm{d}x/\text{vol}\left(\mathcal{O}\right)$, where dx is the Lebesgue measure and $\text{vol}\left(\mathcal{O}\right)={\int }_{\mathcal{O}}\;\mathrm{d}x$, and consider random design variables

Equation (6)

$$X_i \overset{\text{iid}}{\sim} \mu, \qquad i = 1, \dots, N.$$

For unknown $f\in {\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$, we model the statistical errors under which we observe the corresponding measurements ${\left\{G\left(f\right)\left({X}_{i}\right)\right\}}_{i=1}^{N}$ by i.i.d. Gaussian random variables ${W}_{i}\sim N\left(0,1\right)$, all independent of the Xi's. Using the re-parameterisation f = Φ◦F via a given link function from the previous subsection, the observation scheme is then

Equation (7)

$$Y_i = \mathcal{G}(F)(X_i) + \sigma W_i, \qquad i = 1, \dots, N,$$

where σ > 0 is the noise amplitude. We will often use the shorthand notation ${Y}^{\left(N\right)}={\left({Y}_{i}\right)}_{i=1}^{N}$, with analogous definitions for X(N) and W(N). The random vectors (Yi, Xi) on $\mathbb{R}{\times}\mathcal{O}$ are then i.i.d. with laws denoted by ${P}_{F}^{i}$. Writing dy for the Lebesgue measure on $\mathbb{R}$, it follows that ${P}_{F}^{i}$ has Radon–Nikodym density

Equation (8)

$$p_F(y, x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(y - \mathcal{G}(F)(x)\right)^2}{2\sigma^2}\right), \qquad (y, x) \in \mathbb{R} \times \mathcal{O},$$

with respect to the dominating measure $\mathrm{d}y{\times}\mathrm{d}\mu \left(x\right)$.

We will write ${P}_{F}^{N}={\otimes }_{i=1}^{N}{P}_{F}^{i}$ for the joint law of (Y(N), X(N)) on ${\mathbb{R}}^{N}{\times}{\mathcal{O}}^{N}$, with ${E}_{F}^{i}$, ${E}_{F}^{N}$ the expectation operators corresponding to the laws ${P}_{F}^{i}$, ${P}_{F}^{N}$ respectively. In the sequel we sometimes use the notation ${P}_{f}^{N}$ instead of ${P}_{F}^{N}$ when convenient.
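For intuition, here is a self-contained sketch generating synthetic data from the observation scheme (7) in the illustrative case d = 1 with $\mathcal{O}=(0,1)$ (the paper's $\mathcal{O}$ has smooth boundary; an interval merely serves as illustration). The grid size, the conductivity f0, the source g and the noise level σ are our own choices; the solver is a standard finite-difference discretisation of the divergence form equation $(fu^{\prime})^{\prime} = g$, $u(0)=u(1)=0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def solve_pde_1d(f, g, n=200):
    """Finite-difference solution of (f u')' = g on (0,1), u(0) = u(1) = 0.
    f, g are callables; returns the grid and the solution on it."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    fm = f(x[:-1] + h / 2)          # f at midpoints x_{j+1/2}
    # Row j: [f_{j+1/2}(u_{j+1}-u_j) - f_{j-1/2}(u_j-u_{j-1})]/h^2 = g(x_j)
    main = -(fm[:-1] + fm[1:]) / h ** 2
    off = fm[1:-1] / h ** 2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, g(x[1:-1]))
    return x, u

f0 = lambda x: 1.0 + 0.5 * np.exp(-((x - 0.5) ** 2) / 0.02)  # conductivity > K_min
g = lambda x: np.ones_like(x)                                 # source, g >= g_min > 0

x_grid, u = solve_pde_1d(f0, g)

# Observation scheme (7): Y_i = G(f0)(X_i) + sigma * W_i
N, sigma = 500, 0.01
X = rng.uniform(0.0, 1.0, size=N)            # X_i ~ iid uniform on O
Y = np.interp(X, x_grid, u) + sigma * rng.standard_normal(N)
```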

2.1.4. The Bayesian approach

In the Bayesian approach one models the parameter $F\in {H}_{c}^{\alpha }\left(\mathcal{O}\right)$ by a Borel probability measure Π supported in the Banach space $C\left(\mathcal{O}\right)$. Since the map (F, (y, x)) ↦ pF(y, x) can be shown to be jointly measurable, the posterior distribution Π(⋅|Y(N), X(N)) of F|(Y(N), X(N)) arising from data in model (7) equals, by Bayes' formula (p 7, Ghosal and van der Vaart 2017),

Equation (9)

$$\Pi\left(B \,\middle|\, Y^{(N)}, X^{(N)}\right) = \frac{\int_B \mathrm{e}^{\ell_N(F)}\,\mathrm{d}\Pi(F)}{\int_{C(\mathcal{O})} \mathrm{e}^{\ell_N(F)}\,\mathrm{d}\Pi(F)}, \qquad B \ \text{a Borel subset of} \ C(\mathcal{O}),$$

where

Equation (10)

$$\ell_N(F) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(Y_i - \mathcal{G}(F)(X_i)\right)^2$$

is (up to an additive constant) the joint log-likelihood function.
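In code, the log-likelihood (10) is a plain least-squares functional of a forward solve. A minimal sketch, reusing a solver such as the hypothetical solve_pde_1d above together with a link function phi, could read:

```python
import numpy as np

def log_likelihood(F_vals, x_grid, X, Y, sigma, solve, phi, g):
    """ell_N(F) = -(1/(2 sigma^2)) sum_i (Y_i - G(Phi o F)(X_i))^2,
    up to an additive constant; F_vals are the values of F on x_grid."""
    f_vals = phi(F_vals)                            # conductivity f = Phi(F) > K_min
    f = lambda x: np.interp(x, x_grid, f_vals)      # piecewise-linear interpolant
    xs, u = solve(f, g)                             # numerical forward map G(f)
    residuals = Y - np.interp(X, xs, u)
    return -0.5 * np.sum(residuals ** 2) / sigma ** 2
```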

2.2. Statistical convergence rates

In this section we will show that the posterior distribution arising from certain priors concentrates near any sufficiently regular ground truth F0 (or, equivalently, f0), and provide a bound on the rate of this contraction, assuming the observation (Y(N), X(N)) to be generated through model (7) with law ${P}_{{F}_{0}}^{N}$. We will regard σ > 0 as a fixed and known constant; in practice it may be replaced by the estimated sample variance of the Yi's.

The priors we will consider are built around a Gaussian process base prior Π', but to deal with the non-linearity of the inverse problem, some additional regularisation will be required. We first show how this can be done by an N-dependent 'rescaling' step as suggested in Monard et al (2020). We then further show that a randomised truncation of a Karhunen–Loève-type series expansion of the base prior also leads to a consistent, 'fully Bayesian' solution of this inverse problem.

2.2.1. Results with re-scaled Gaussian priors

We will freely use terminology from the basic theory of Gaussian processes and measures, see, for example, Giné and Nickl (2016), chapter 2 for details.

Condition 3. Let α > 1 + d/2, β ⩾ 1, and let $\mathcal{H}$ be a Hilbert space continuously imbedded into ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$. Let Π' be a centred Gaussian Borel probability measure on the Banach space $C\left(\mathcal{O}\right)$ that is supported on a separable measurable linear subspace of ${C}^{\beta }\left(\mathcal{O}\right)$, and assume that the reproducing-kernel Hilbert space (RKHS) of Π' equals $\mathcal{H}$.

As a basic example of a Gaussian base prior Π' satisfying condition 3, consider a Whittle–Matérn process $M=\left\{M\left(x\right),\;x\in \mathcal{O}\right\}$ indexed by $\mathcal{O}$ and of regularity α (cf. example 25 below for full details). We will assume that it is known that ${F}_{0}\in {H}^{\alpha }\left(\mathcal{O}\right)$ is supported inside a given compact subset K of the domain $\mathcal{O}$, and fix any smooth cut-off function $\chi \in {C}_{c}^{\infty }\left(\mathcal{O}\right)$ such that χ = 1 on K. Then, ${{\Pi}}^{\prime }=\mathcal{L}\left(\chi M\right)$ is supported on the separable linear subspace ${C}^{{\beta }^{\prime }}\left(\mathcal{O}\right)$ of ${C}^{\beta }\left(\mathcal{O}\right)$ for any $\beta { <}{\beta }^{\prime }{< }\alpha -d/2$, and its RKHS $\mathcal{H}=\left\{\chi F,F\in {H}^{\alpha }\left(\mathcal{O}\right)\right\}$ is continuously imbedded into ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$ (and contains ${H}_{K}^{\alpha }\left(\mathcal{O}\right)$). The condition ${F}_{0}\in \mathcal{H}$ that is employed in the following theorems then amounts to the standard assumption that ${F}_{0}\in {H}^{\alpha }\left(\mathcal{O}\right)$ be supported in a strict subset K of $\mathcal{O}$.
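As an illustration, the following sketch draws an approximate sample of χM on a grid in d = 1 by Cholesky factorisation of a Matérn covariance matrix. All parameter choices are ours, including the correspondence ν = α − d/2 between the Matérn smoothness ν and the Sobolev regularity α of the RKHS, which we state here as an assumption:

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(r, nu=1.5, ell=0.2):
    """Matern covariance k(r) with k(0) = 1."""
    r = np.maximum(r, 1e-12)          # avoid the removable singularity at r = 0
    z = np.sqrt(2 * nu) * r / ell
    return (2 ** (1 - nu) / gamma(nu)) * z ** nu * kv(nu, z)

def bump(x, a=0.2, b=0.8):
    """Smooth compactly supported cut-off (a stand-in for chi; a true chi
    would equal 1 on all of K rather than only near the centre)."""
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    s = 2 * (x[inside] - a) / (b - a) - 1          # map (a, b) -> (-1, 1)
    out[inside] = np.exp(1.0 - 1.0 / (1.0 - s ** 2))
    return out

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 400)
K = matern_cov(np.abs(x[:, None] - x[None, :]), nu=1.5)   # e.g. alpha = 2, d = 1
L = np.linalg.cholesky(K + 1e-8 * np.eye(x.size))         # jitter for stability
draw = bump(x) * (L @ rng.standard_normal(x.size))        # sample of chi * M
```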

To proceed, if Π' is as above and F' ∼ Π', we consider the 're-scaled' prior

Equation (11)

$${\Pi}_{N}=\mathcal{L}\left({F}_{N}\right),\qquad {F}_{N}={N}^{-d/\left(4\alpha +4+2d\right)}{F}^{\prime }.$$

Then ΠN again defines a centred Gaussian prior on $C\left(\mathcal{O}\right)$, and a basic calculation (e.g., exercise 2.6.5 in Giné and Nickl (2016)) shows that its RKHS ${\mathcal{H}}_{N}$ is still given by $\mathcal{H}$ but now with norm

Equation (12)

$$\|F\|_{\mathcal{H}_N} = N^{d/(4\alpha + 4 + 2d)} \|F\|_{\mathcal{H}}.$$
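For orientation, the scaling in (11) is calibrated exactly to the contraction rate δN of theorem 4 below: spelling out the exponents (our arithmetic),

$$\sqrt{N}{\delta }_{N}={N}^{\frac{1}{2}-\frac{\alpha +1}{2\alpha +2+d}}={N}^{\frac{(2\alpha +2+d)-(2\alpha +2)}{2(2\alpha +2+d)}}={N}^{\frac{d}{4\alpha +4+2d}},$$

so dividing F' by ${N}^{d/(4\alpha +4+2d)}$ inflates the RKHS norm in (12) by precisely the factor $\sqrt{N}{\delta }_{N}$.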

Our first result shows that the posterior contracts towards F0 in 'prediction'-risk at rate N−(α+1)/(2α+2+d) and that, moreover, the posterior draws possess a bound on their Cβ-norm with overwhelming frequentist probability.

Theorem 4. For fixed integer α > β + d/2, β ⩾ 1, consider the Gaussian prior ΠN in (11) with base prior F' ∼ Π' satisfying condition 3 for RKHS $\mathcal{H}$. Let ΠN(⋅|Y(N), X(N)) be the resulting posterior distribution arising from observations (Y(N), X(N)) in (7), set δN = N−(α+1)/(2α+2+d), and assume ${F}_{0}\in \mathcal{H}$.

Then for any D > 0 there exists L > 0 large enough (depending on σ, F0, D, α, β, as well as on $\mathcal{O},d,g$) such that, as N → ∞,

Equation (13)

$$\Pi_N\left(F : \|\mathcal{G}(F) - \mathcal{G}(F_0)\|_{L^2(\mathcal{O})} > L\delta_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right),$$

and for sufficiently large M > 0 (depending on σ, D, α, β)

Equation (14)

$$\Pi_N\left(F : \|F\|_{C^\beta} > M \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right).$$

Following ideas in Monard et al (2020), we can combine (13) with the regularisation property (14) and a suitable stability estimate for G−1 to show that the posterior contracts about f0 also in L2-risk. We shall employ the stability estimate proved in Nickl et al (2020), lemma 24, which requires the source function g in the base PDE (3) to be strictly positive, a natural condition ensuring injectivity of the map fG(f), see Richter (1981). Denote the push-forward posterior on the conductivities f by

Equation (15)

$$\tilde{\Pi}_N\left(B \,\middle|\, Y^{(N)}, X^{(N)}\right) = \Pi_N\left(F : \Phi \circ F \in B \,\middle|\, Y^{(N)}, X^{(N)}\right), \qquad B \ \text{a Borel subset of} \ L^2(\mathcal{O}).$$

Theorem 5. Let ΠN(⋅|Y(N), X(N)), δN and F0 be as in theorem 4 for integer β > 1. Let f0 = Φ◦F0 and assume in addition that ${\mathrm{inf}}_{x\in \mathcal{O}}g\left(x\right){\geqslant}{g}_{\mathrm{min}}{ >}0$. Then for any D > 0 there exists L > 0 large enough (depending on $\sigma ,{f}_{0},D,\alpha ,\beta ,\mathcal{O}$, gmin, d) such that, as N → ∞,

$$\tilde{\Pi}_N\left(f : \|f - f_0\|_{L^2(\mathcal{O})} > LN^{-\lambda} \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right), \qquad \lambda = \frac{(\alpha+1)(\beta-1)}{(2\alpha+2+d)(\beta+1)}.$$

We note that as the smoothness α of f0 increases, we can employ priors of higher regularity α, β. In particular, if ${F}_{0}\in {C}^{\infty }={\cap }_{\alpha { >}0}{H}^{\alpha }$, we can let the above rate ${N}^{-\lambda }$ be as close as desired to the 'parametric' rate ${N}^{-1/2}$.

We conclude this section showing that the posterior mean EΠ[F|Y(N), X(N)] of ΠN(⋅|Y(N), X(N)) converges to F0 at the rate ${N}^{-\lambda }$ from theorem 5. We formulate this result at the level of the vector space valued parameter F (instead of for conductivities f), as the most commonly used MCMC algorithms (such as pCN, see subsection 2.3.1) target the posterior distribution of F.

Theorem 6. Under the hypotheses of theorem 5, let ${\overline{F}}_{N}={E}^{{\Pi}}\left[F\vert {Y}^{\left(N\right)},{X}^{\left(N\right)}\right]$ be the (Bochner-)mean of ΠN(⋅|Y(N), X(N)). Then, as N → ∞,

Equation (16)

$$\|\overline{F}_N - F_0\|_{L^2(\mathcal{O})} = O_{P_{F_0}^N}\left(N^{-\lambda}\right).$$

The same result holds for the implied conductivities, that is, for ${\Vert}{\Phi}{\circ}{\overline{F}}_{N}-{f}_{0}{{\Vert}}_{{L}^{2}}$ replacing ${\Vert}{\overline{F}}_{N}-{F}_{0}{{\Vert}}_{{L}^{2}}$, since composition with Φ is Lipschitz.

2.2.2. Extension to high-dimensional Gaussian sieve priors

It is often convenient, for instance for computational reasons as discussed in subsection 2.3.1, to employ 'sieve'-priors that are concentrated on a finite-dimensional approximation of the parameter space supporting the prior. For example, a truncated Karhunen–Loève-type series expansion (or some other discretisation) of the Gaussian base prior Π' is frequently used (Dashti and Stuart 2011, Hairer et al 2014). The theorems of the previous subsection remain valid if the approximation spaces are appropriately chosen.

Let us illustrate this by considering a Gaussian series prior based on an orthonormal basis $\left\{{{\Psi}}_{\ell r},\;\ell {\geqslant}-1, r\in {\mathbb{Z}}^{d}\right\}$ of ${L}^{2}\left({\mathbb{R}}^{d}\right)$ composed of sufficiently regular, compactly supported Daubechies wavelets (see chapter 4 in Giné and Nickl (2016) for details). We assume that ${F}_{0}\in {H}_{K}^{\alpha }\left(\mathcal{O}\right)$ for some $K\subset \mathcal{O}$, and denote by ${\mathcal{R}}_{\ell }$ the set of indices r for which the support of Ψℓr intersects K. Fix any compact ${K}^{\prime }\subset \mathcal{O}$ such that KK', and a cut-off function $\chi \in {C}_{c}^{\infty }\left(\mathcal{O}\right)$ such that χ = 1 on K'. For any real α > 1 + d/2, consider the prior Π'J arising as the law of the Gaussian random sum

Equation (17)

$${F}^{\prime }=\chi \sum _{\ell =-1}^{J}\sum _{r\in {\mathcal{R}}_{\ell }}{2}^{-\ell \alpha }{F}_{\ell r}{{\Psi}}_{\ell r},\qquad {F}_{\ell r}\overset{\text{iid}}{\sim }N(0,1),$$

where J = JN is a (deterministic) truncation point to be chosen. Then Π'J defines a centred Gaussian prior that is supported on the finite-dimensional space

Equation (18)

$${\mathcal{H}}_{J}=\left\{\chi F : F\in \operatorname{span}\left\{{{\Psi}}_{\ell r} : -1{\leqslant}\ell {\leqslant}J,\ r\in {\mathcal{R}}_{\ell }\right\}\right\}.$$

Proposition 7. Consider a prior ΠN as in (11) where now F' ∼ Π'J and $J={J}_{N}\in \mathbb{N}$ is such that ${2}^{J}\simeq {N}^{1/(2\alpha +2+d)}$. Let ΠN(⋅|Y(N), X(N)) be the resulting posterior distribution arising from observations (Y(N), X(N)) in (7), and assume ${F}_{0}\in {H}_{K}^{\alpha }\left(\mathcal{O}\right)$. Then the conclusions of theorems 4–6 remain valid (under the respective hypotheses on α, β, g).

A similar result could be proved for more general Gaussian priors (not of wavelet type), but we refrain from giving these extensions here.

2.2.3. Randomly truncated Gaussian series priors

In this section we show that instead of rescaling Gaussian base priors Π', Π'J in an N-dependent way to attain extra regularisation, one may also randomise the dimensionality parameter J in (17) by a hyper-prior with suitable tail behaviour. While this is computationally somewhat more expensive (by necessitating a hierarchical sampling method, see subsection 2.3.1), it gives a possibly more principled approach to ('fully') Bayesian regularisation in our inverse problem. The theorem below will show that such a procedure is consistent in the frequentist sense, at least for smooth enough F0.

For the wavelet basis and cut-off function χ introduced before (17), we consider again a random (conditionally Gaussian) sum

Equation (19)

$$F=\chi \sum _{\ell =-1}^{J}\sum _{r\in {\mathcal{R}}_{\ell }}{2}^{-\ell \alpha }{F}_{\ell r}{{\Psi}}_{\ell r},\qquad {F}_{\ell r}\overset{\text{iid}}{\sim }N(0,1),$$

where now J is a random truncation level, independent of the random coefficients Fℓr, satisfying the following inequalities:

Equation (20)

When d = 1, a (log-)Poisson random variable satisfies these tail conditions, and for d > 1 such a random variable J can be easily constructed too—see example 28 below.
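To illustrate the construction (19), the following sketch draws (J, F) with a Poisson truncation level, substituting an orthonormal sine basis on (0, 1), grouped into dyadic frequency levels, for the Daubechies wavelet basis (a simplification of ours, and we have not verified the tail conditions (20) for this stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_hierarchical(x, alpha=2.0):
    """Draw F = sum over levels l <= J of 2^{-l*alpha} F_{lk} psi_k(x),
    with J random; level l holds frequencies k = 2^l, ..., 2^{l+1}-1 and
    psi_k(x) = sqrt(2) sin(pi k x) stands in for the wavelets Psi_lr."""
    J = rng.poisson(3)                                # random truncation level
    F = np.zeros_like(x)
    for l in range(J + 1):
        for k in range(2 ** l, 2 ** (l + 1)):
            coef = 2.0 ** (-l * alpha) * rng.standard_normal()
            F += coef * np.sqrt(2.0) * np.sin(np.pi * k * x)
    return J, F

x = np.linspace(0, 1, 500)
J, F = draw_hierarchical(x)
```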

Our first result in this section shows that the posterior arising from the truncated series prior in (19) achieves (up to a log-factor) the same contraction rate in L2-prediction risk as the one obtained in theorem 4. Moreover, as is expected in light of the results in van der Vaart and van Zanten (2009) and Ray (2013), the posterior adapts to the unknown regularity α0 of F0 when it exceeds the base smoothness level α.

Theorem 8. For any α > 1 + d/2, let Π be the random series prior in (19), and let Π(⋅|Y(N), X(N)) be the resulting posterior distribution arising from observations (Y(N), X(N)) in (7). Then, for each ${\alpha }_{0}{\geqslant}\alpha $ and any ${F}_{0}\in {H}_{K}^{{\alpha }_{0}}\left(\mathcal{O}\right)$, we have that for any D > 0 there exists L > 0 large enough (depending on $\sigma ,{F}_{0},D,\alpha ,\mathcal{O},d,g$) such that, as N → ∞,

$$\Pi\left(F : \|\mathcal{G}(F) - \mathcal{G}(F_0)\|_{L^2(\mathcal{O})} > L\xi_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\xi_N^2}\right),$$

where ${\xi }_{N}={N}^{-\left({\alpha }_{0}+1\right)/\left(2{\alpha }_{0}+2+d\right)}\mathrm{log}N$. Moreover, for ${\mathcal{H}}_{J}$ the finite-dimensional subspaces in (18) and ${J}_{N}\in \mathbb{N}$ such that ${2}^{{J}_{N}}\simeq {N}^{1/\left(2{\alpha }_{0}+2+d\right)}$, we also have that for sufficiently large M > 0 (depending on D, α)

Equation (21)

$$\Pi\left(F : F \notin \mathcal{H}_{J_N} \ \text{or} \ \|F\|_{L^2} > M\sqrt{N}\xi_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\xi_N^2}\right).$$

We can now exploit the previous result along with the finite-dimensional support of the posterior and again the stability estimate from Nickl et al (2020) to obtain the following consistency theorem for ${F}_{0}\in {H}^{{\alpha }_{0}}$ with α0 large enough (a precise bound ${\alpha }_{0}{\geqslant}{\alpha }^{{\ast}}$ is given in the proof of lemma 12).

Theorem 9. Let the link function Φ in the definition (5) of $\mathcal{G}$ satisfy condition 2. Let Π(⋅|Y(N), X(N)), ξN be as in theorem 8, assume in addition that $g{\geqslant}{g}_{\mathrm{min}}{ >}0$ on $\mathcal{O}$, and let $\tilde {{\Pi}}\left(\cdot \vert {Y}^{\left(N\right)},{X}^{\left(N\right)}\right)$ be the posterior distribution of f as in (15). Then for f0 = Φ◦F0 with ${F}_{0}\in {H}_{K}^{{\alpha }_{0}}\left(\mathcal{O}\right)$ for α0 large enough (depending on α, d, a) and for any D > 0 there exists L > 0 large enough (depending on $\sigma ,{f}_{0},D,\alpha ,\mathcal{O},{g}_{\mathrm{min}},d$) such that, as N → ∞,

$$\tilde{\Pi}\left(f : \|f - f_0\|_{L^2(\mathcal{O})} > LN^{-\rho} \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\xi_N^2}\right),$$

for some ρ > 0 depending on α and d.

Just as before, for f0C the above rate can be made as close as desired to N−1/2 by choosing α large enough. Moreover, the last contraction theorem also translates into a convergence result for the posterior mean of F.

Theorem 10. Under the hypotheses of theorem 9, let ${\overline{F}}_{N}={E}^{{\Pi}}\left[F\vert {Y}^{\left(N\right)},{X}^{\left(N\right)}\right]$ be the mean of Π(⋅|Y(N), X(N)). Then, as N → ∞,

Equation (22)

$$\|\overline{F}_N - F_0\|_{L^2(\mathcal{O})} = O_{P_{F_0}^N}\left(N^{-\rho}\right),$$

with ρ > 0 as in theorem 9.

We note that the proof of the last two theorems crucially takes advantage of the 'non-symmetric' and 'non-exponential' nature of the stability estimate from Nickl et al (2020), and may not hold in other non-linear inverse problems where such an estimate may not be available (e.g., as in Monard et al (2020), Abraham and Nickl (2020) or also in the Schrödinger equation setting studied in Nickl et al (2020), Nickl (2018)).

Let us conclude this section by noting that hierarchical priors such as the one studied here are usually devised to 'adapt' to the unknown smoothness α0 of F0, see van der Vaart and van Zanten (2009) and Ray (2013). Note that while our posterior distribution is adaptive to α0 in the 'prediction risk' setting of theorem 8, the rate ${N}^{-\rho }$ obtained in theorems 9 and 10 for the inverse problem does depend on the minimal smoothness α, and is therefore not adaptive. Nevertheless, this hierarchical prior gives an example of a fully Bayesian, consistent solution of our inverse problem.

2.3. Concluding discussion

2.3.1. Posterior computation

As mentioned in the introduction, in the context of the elliptic inverse problem considered in the present paper, posterior distributions arising from Gaussian process priors such as those above can be computed by MCMC algorithms, see Beskos et al (2017), Conrad et al (2016), Cotter et al (2013), and computational guarantees can be obtained as well: for Gaussian priors, Hairer et al (2014) establish non-asymptotic sampling bounds for the 'preconditioned Crank–Nicolson (pCN)' algorithm, which hold even in the absence of log-concavity of the likelihood function, and which imply bounds on the approximation error for the computation of the posterior mean. The algorithm can be implemented as long as it is possible to evaluate the forward map $F{\mapsto}\mathcal{G}\left(F\right)\left(x\right)$ at $x\in \mathcal{O}$, which in our context can be done by using standard numerical methods to solve the elliptic PDE (3). In practice, these algorithms often employ a finite-dimensional approximation of the parameter space (see subsection 2.2.2).
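For reference, here is a minimal pCN sketch in whitened coordinates (the coefficient vector of F under a truncated Gaussian prior): the proposal leaves the prior invariant and the accept/reject step involves only log-likelihood differences, so neither gradients of $\mathcal{G}$ nor any inversion formula are needed. The callable log_lik is assumed to wrap a numerical forward solve as in the earlier snippets:

```python
import numpy as np

def pcn(log_lik, dim, n_iter=5000, beta=0.1, rng=None):
    """Preconditioned Crank-Nicolson targeting the posterior under a
    standard Gaussian prior N(0, I) on the coefficient vector of F.
    Proposal: F_prop = sqrt(1 - beta^2) F + beta xi, xi ~ N(0, I),
    accepted with probability min(1, exp(log_lik(F_prop) - log_lik(F)))."""
    rng = rng or np.random.default_rng()
    F = np.zeros(dim)
    ll = log_lik(F)
    chain = []
    for _ in range(n_iter):
        prop = np.sqrt(1 - beta ** 2) * F + beta * rng.standard_normal(dim)
        ll_prop = log_lik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:    # pCN acceptance ratio
            F, ll = prop, ll_prop
        chain.append(F.copy())
    return np.array(chain)

# Posterior mean (the estimator of theorem 6), discarding a burn-in:
# F_bar = pcn(my_log_lik, dim=64)[1000:].mean(axis=0)
```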

In order to sample from the posterior distribution arising from the more complex hierarchical prior (19), MCMC methods based on fixed Gaussian priors (such as the pCN algorithm) can be employed within a suitable Gibbs-sampling scheme that exploits the conditionally Gaussian structure of the prior. The algorithm would then alternate, for given J, an MCMC step targeting the conditional posterior distribution of F|(Y(N), X(N), J), followed by, given the current sample of F, a second MCMC step targeting the conditional posterior of J|(Y(N), X(N), F). A related approach to hierarchical inversion is empirical Bayesian estimation. In the present setting this would entail first estimating the truncation level J from the data, via an estimator Ĵ = Ĵ(Y(N), X(N)) (e.g., the marginal maximum likelihood estimator), and then performing inference based on the fixed finite-dimensional prior ΠĴ (defined as in (19) with J replaced by Ĵ). See Knapik et al (2015), where this is studied in a diagonal linear inverse problem.
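A minimal Metropolis-within-Gibbs skeleton along these lines, alternating a pCN move on the coefficients at fixed J with a symmetric random-walk move on J (all tuning and structural choices are ours; log_lik and prior_logpmf are assumed supplied):

```python
import numpy as np

def gibbs_step(F, J, log_lik, prior_logpmf, beta=0.1, rng=None):
    """One sweep for the hierarchical prior: (i) pCN on the coefficients
    active at level J plus a prior refresh of the inactive ones, then
    (ii) a Metropolis move on J. F stores all coefficients up to a fixed
    maximal level; log_lik(F, J) reads only the first 2^{J+1} of them."""
    rng = rng or np.random.default_rng()
    dim = 2 ** (J + 1)
    # (i) pCN on the active coordinates (standard Gaussian prior N(0, I)):
    prop = F.copy()
    prop[:dim] = np.sqrt(1 - beta ** 2) * F[:dim] + beta * rng.standard_normal(dim)
    if np.log(rng.uniform()) < log_lik(prop, J) - log_lik(F, J):
        F = prop
    F[dim:] = rng.standard_normal(F.size - dim)   # refresh inactive coordinates
    # (ii) symmetric +/-1 proposal on J (out-of-range proposals rejected):
    J_prop = J + rng.choice([-1, 1])
    if 0 <= J_prop and 2 ** (J_prop + 1) <= F.size:
        log_ratio = (log_lik(F, J_prop) - log_lik(F, J)
                     + prior_logpmf(J_prop) - prior_logpmf(J))
        if np.log(rng.uniform()) < log_ratio:
            J = J_prop
    return F, J
```

Refreshing the inactive coordinates from the prior is what keeps the J-move a valid fixed-dimensional Metropolis step: given J, the likelihood reads only the active coefficients, so the inactive ones are prior-distributed under their full conditional.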

2.3.2. Open problems: towards optimal convergence rates

The convergence rates obtained in this article demonstrate the frequentist consistency of a Bayesian (Gaussian process) inversion method in the elliptic inverse problem (2) with data (1) in the large sample limit N → ∞. While the rates approach the optimal rate ${N}^{-1/2}$ for very smooth models (α → ∞), the question of optimality for fixed α remains an interesting avenue for future research. We note that for the 'PDE-constrained regression' problem of recovering $\mathcal{G}\left({F}_{0}\right)$ in 'prediction' loss, the rate ${\delta }_{N}={N}^{-(\alpha +1)/(2\alpha +2+d)}$ obtained in theorems 4 and 8 can be shown to be minimax optimal (as in Nickl et al (2020), theorem 10). But for the recovery rates for f obtained in theorems 6 and 10, no matching lower bounds are currently known. Related to this issue, in Nickl et al (2020) faster (but still possibly suboptimal) rates are obtained for the modes of our posterior distributions (MAP estimates, which are not obviously computable in polynomial time), and one may loosely speculate here about computational hardness barriers in our non-linear inverse problem. These issues pose formidable challenges for future research and are beyond the scope of the present paper.

3. Proofs

We assume without loss of generality that $\text{vol}\left(\mathcal{O}\right)=1$. In the proof, we will repeatedly exploit properties of the (re-parametrised) solution map $\mathcal{G}$ defined in (5), which was studied in detail in Nickl et al (2020). Specifically, in the proof of theorem 9 in Nickl et al (2020) it is shown that, for all α > 1 + d/2 and any ${F}_{1},{F}_{2}\in {H}_{c}^{\alpha }\left(\mathcal{O}\right)$,

Equation (23)

$$\|\mathcal{G}(F_1) - \mathcal{G}(F_2)\|_{L^2(\mathcal{O})} \lesssim \left(1 + \|F_1\|_{C^1}^4 \vee \|F_2\|_{C^1}^4\right) \|F_1 - F_2\|_{(H^1(\mathcal{O}))^*},$$

where we denote by X* the topological dual Banach space of a normed linear space X. Secondly, we have (lemma 20 in Nickl et al (2020)) for some constant c > 0 (only depending on $d,\;\mathcal{O}$ and Kmin),

Equation (24)

$$\sup_{F \in H_c^\alpha(\mathcal{O})} \|\mathcal{G}(F)\|_{L^\infty(\mathcal{O})} \leqslant c\|g\|_{L^\infty(\mathcal{O})}.$$

Therefore the inverse problem (7) falls in the general framework considered in appendix A below (with β = κ = 1, γ = 4 in (A2) and $S=c{\Vert}g{{\Vert}}_{\infty }$ in (A3)); in particular theorems 4 and 8 then follow as particular cases of the general contraction rate results derived in theorems 14 and 19, respectively. It thus remains to derive theorems 5 and 6 from theorem 4, and theorems 9 and 10 from theorem 8, respectively.

To do so we recall here another key result from Nickl et al (2020), namely their stability estimate lemma 24: for α > 2 + d/2, if G(f) denotes the solution of the PDE (3) with g satisfying ${\mathrm{inf}}_{x\in \mathcal{O}}g\left(x\right){\geqslant}{g}_{\mathrm{min}}{ >}0$, then for fixed ${f}_{0}\in {\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$ and all $f\in {\mathcal{F}}_{\alpha ,{K}_{\mathrm{min}}}$

Equation (25)

$$\|f - f_0\|_{L^2(\mathcal{O})} \lesssim \|G(f) - G(f_0)\|_{H^2(\mathcal{O})},$$

with multiplicative constant independent of f.

3.1. Proofs for section 2.2.1

Proof of theorem 5. The conclusions of theorem 4 can readily be translated for the push-forward posterior ${\tilde {{\Pi}}}_{N}\left(\cdot \vert {Y}^{\left(N\right)},{X}^{\left(N\right)}\right)$ from (15). In particular, (13) implies, for f0 = Φ◦F0, as N → ∞,

Equation (26)

$$\tilde{\Pi}_N\left(f : \|G(f) - G(f_0)\|_{L^2(\mathcal{O})} > L\delta_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right),$$

and using lemma 29 in Nickl et al (2020) and (14) we obtain for sufficiently large M' > 0

Equation (27)

$$\tilde{\Pi}_N\left(f : \|f\|_{C^\beta} > M' \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right).$$

From the previous bounds we now obtain the following result.□

Lemma 11. For ΠN(⋅|Y(N), X(N)), δN and F0 as in theorem 4, let ${\tilde {{\Pi}}}_{N}\left(\cdot \vert {Y}^{\left(N\right)},{X}^{\left(N\right)}\right)$ be the push-forward posterior distribution from (15). Then, for f0 = Φ◦F0 and any D > 0 there exists L > 0 large enough such that, as N → ∞,

Proof. Using the continuous imbedding of ${C}^{\beta }\subset {H}^{\beta },\beta \in \mathbb{N}$, and (27), we have for some M' > 0, as N → ∞,

Now if $f\in {H}^{\beta }$ with ${\Vert}f{{\Vert}}_{{H}^{\beta }}{\leqslant}{M}^{\prime }$, lemma 23 in Nickl et al (2020) implies G(f), G(f0) ∈ Hβ+1, with

and by the usual interpolation inequality for Sobolev spaces (Lions and Magenes 1972),

Thus, by what precedes and (26), for sufficiently large L > 0

as N → ∞.□

To prove theorem 5 we use (25), (27) and lemma 11 to the effect that for any D > 0 we can find L, M > 0 large enough such that, as N,

Proof of theorem 6. The proof largely follows ideas of Monard et al (2020) but requires a slightly more involved, iterative uniform integrability argument to also control the probability of events $\left\{F:{\Vert}F{{\Vert}}_{{C}^{\beta }}{ >}M\right\}$ on whose complements we can subsequently exploit regularity properties of the inverse link function Φ−1.

Using Jensen's inequality, it is enough to show, as N → ∞,

For M > 0 sufficiently large to be chosen, we decompose

Equation (28)

Using the Cauchy–Schwarz inequality we can upper bound the expectation in the second summand by

In view of (14), for all D > 0 we can choose M > 0 large enough to obtain

To bound the probability in the last line, let ${\mathcal{B}}_{N}$ be the sets defined in (A4) below, and note that lemmas 16 and 23 below jointly imply that ${{\Pi}}_{N}\left({\mathcal{B}}_{N}\right){\geqslant}a{\mathrm{e}}^{-AN{\delta }_{N}^{2}}$ for some a, A > 0. Also, let $\nu \left(\cdot \right)={{\Pi}}_{N}\left(\cdot \cap {\mathcal{B}}_{N}\right)/{{\Pi}}_{N}\left({\mathcal{B}}_{N}\right)$, and let ${\mathcal{C}}_{N}$ be the event from (A10), for which lemma 7.3.2 in Giné and Nickl (2016) implies that ${P}_{{F}_{0}}^{N}\left({\mathcal{C}}_{N}\right)\to 1$ as N → ∞. Then

which is upper bounded, using Markov's inequality and Fubini's theorem, by

Taking D > A + 2 (and M large enough in (28)), using the fact that ${E}_{{F}_{0}}^{N}\left({\prod }_{i=1}^{N}\;{p}_{F}/{p}_{{F}_{0}}\left({Y}_{i},{X}_{i}\right)\right)=1$, and that ${E}^{{{\Pi}}_{N}}{\Vert}F{{\Vert}}_{{L}^{2}}{< }\infty $ (by Fernique's theorem, e.g., Giné and Nickl (2016), exercise 2.1.5), we then conclude

Equation (29)

To handle the first term in (28), let f = Φ◦F and f0 = Φ◦F0. Then for all $x\in \mathcal{O}$, by the mean value and inverse function theorems,

for some η lying between f(x) and f0(x). If ${\Vert}F{{\Vert}}_{{C}^{\beta }}{\leqslant}M$ then, as Φ is strictly increasing, necessarily f(x) = Φ(F(x)) ∈ [Φ(−M), Φ(M)] for all $x\in \mathcal{O}$. Similarly, the range of f0 is contained in the compact interval [Φ(−M), Φ(M)] for $M{\geqslant}{\Vert}{F}_{0}{{\Vert}}_{\infty }$, so that

for a multiplicative constant not depending on $x\in \mathcal{O}$. It follows

and

Noting that for each L > 0 the last expectation is upper bounded by

we can repeat the above argument, with the event $\left\{F:{\Vert}F{{\Vert}}_{{C}^{\beta }}{ >}M\right\}$ replaced by the event $\left\{f:{\Vert}f-{f}_{0}{{\Vert}}_{{L}^{2}}{ >}L{N}^{-\lambda }\right\}$, to deduce from theorem 5 that for D > A + 2 there exists L > 0 large enough such that

which combined with (29) and the definition of δN concludes the proof.□

3.2. Sieve prior proofs

The proof only requires minor modifications of the proofs of section 2.2.1. We only discuss here the main points: one first applies the L2-prediction risk theorem 14 with a sieve prior. In the proof of the small ball lemma 16 one uses the following observations: the projection ${P}_{{\mathcal{H}}_{J}}\left({F}_{0}\right)\in {\mathcal{H}}_{J}$ of ${F}_{0}\in {H}_{K}^{\alpha }$ defined in (B4) satisfies by (B6)

hence choosing J such that ${2}^{J}\simeq {N}^{1/(2\alpha +2+d)}$, and noting also that ${\Vert}{P}_{{\mathcal{H}}_{J}}\left({F}_{0}\right){{\Vert}}_{{C}^{1}}{\leqslant}{\Vert}{F}_{0}{{\Vert}}_{{C}^{1}}{< }\infty $ for all J by standard properties of wavelet bases, it follows from (23) that

Therefore, by the triangle inequality,

The rest of the proof of lemma 16 then carries over (with ${P}_{{\mathcal{H}}_{J}}\left({F}_{0}\right)$ replacing F0), upon noting that (B3) and a Sobolev imbedding imply

for some constant c > 0 independent of J. Moreover, the last two properties are sufficient to prove an analogue of lemma 17 as well, so that theorem 14 indeed applies to the sieve prior. The proof from here onwards is identical to the ones of theorems 4–6 for the unsieved case, using also that what precedes implies that ${\mathrm{sup}}_{J}\,{E}^{{{\Pi}}_{J}^{\prime }}{\Vert}F{{\Vert}}_{{L}^{2}}^{2}{< }\infty $, which is relevant in the proof of convergence of the posterior mean.

3.3. Proofs for section 2.2.3

Inspection of the proofs for rescaled priors implies that theorems 9 and 10 can be deduced from theorem 8 if we can show that posterior draws lie in an α-Sobolev ball of fixed radius with sufficiently high frequentist probability. This is the content of the next result.

Lemma 12. Under the hypotheses of theorem 9, there exists α* > 0 (depending on α, d and a) such that for each ${F}_{0}\in {H}_{K}^{{\alpha }_{0}}\left(\mathcal{O}\right),{\alpha }_{0}{ >}{\alpha }^{{\ast}}$, and any D > 0 we can find M > 0 large enough such that, as N → ∞,

Proof. Theorem 8 implies that for all D > 0 and sufficiently large L, M > 0, if ${J}_{N}\in \mathbb{N}:{2}^{{J}_{N}}\simeq {N}^{1/\left(2{\alpha }_{0}+2+d\right)}$ and denoting by

then, as N → ∞,

Equation (30)

Next, note that if $F\in {\mathcal{H}}_{{J}_{N}}$, then by standard properties of wavelet bases (cf. (63)), ${\Vert}F{{\Vert}}_{{H}^{\alpha }}\lesssim {2}^{{J}_{N}\alpha }{\Vert}F{{\Vert}}_{{L}^{2}}$ for all N large enough. Thus, for ${P}_{{\mathcal{H}}_{{J}_{N}}}\left({F}_{0}\right)$ the projection of F0 onto ${\mathcal{H}}_{{J}_{N}}$ defined in (B4),

and a Sobolev imbedding further gives ${\Vert}F{{\Vert}}_{{L}^{\infty }}{\leqslant}{M}^{\prime }{2}^{{J}_{N}\alpha }\sqrt{N}{\xi }_{N}$, for some M' > 0. Now letting f = Φ◦F and f0 = Φ◦F0, by a similar argument to the proof of theorem 6 combined with the monotonicity of Φ', we see that for all N large enough

Then, using the assumption on the left tail of Φ′ in condition 2, and the stability estimate (25),

Finally, by the interpolation inequality for Sobolev spaces (Lions and Magenes 1972) and lemma 23 in Nickl et al (2020),

so that, in conclusion, for each $F\in {\mathcal{A}}_{N}$ and sufficiently large N,

The last term is bounded, using lemma 29 in Nickl et al (2020), by a multiple of

the last identity holding up to a log factor. Hence, if

then we conclude overall that ${\Vert}F{{\Vert}}_{{H}^{\alpha }}\lesssim 1+o\left(1\right)$ as N → ∞ for all $F\in {\mathcal{A}}_{N}$, proving the claim in view of (30).□

Replacing β by α in the conclusion of lemma 11, the proof of theorem 9 now proceeds as in the proof of theorem 5 without further modification. Likewise, theorem 10 can be shown following the same argument as in the proof of theorem 6, noting that for Π the random series prior in (19), it also holds that ${E}^{{\Pi}}{\Vert}{F{\Vert}}_{{L}^{2}}^{2}{< }\infty $.

Acknowledgments

The authors are grateful to three anonymous referees for critical remarks and suggestions, as well as to Sven Wang for helpful discussions. We further acknowledge support by the European Research Council under ERC Grant agreement No. 647812 (UQMSI).

Appendix A.: Results for general inverse problems

Let $\mathcal{O}\subset {\mathbb{R}}^{d},\;d\in \mathbb{N}$, be a nonempty open and bounded set with smooth boundary, and assume that $\mathcal{D}$ is a nonempty and bounded measurable subset of ${\mathbb{R}}^{p},\;p{\geqslant}1$. Let $\mathcal{F}\subseteq {L}^{2}\left(\mathcal{O}\right)$ be endowed with the trace Borel σ-field of ${L}^{2}\left(\mathcal{O}\right)$, and consider a Borel-measurable 'forward mapping'

For $F\in \mathcal{F}$, we are given noisy discrete measurements of $\mathcal{G}\left(F\right)$ over a grid of points drawn uniformly at random on $\mathcal{D}$,

Equation (A1)

$$Y_i = \mathcal{G}(F)(X_i) + \sigma W_i, \qquad X_i \overset{\text{iid}}{\sim} \mu, \qquad W_i \overset{\text{iid}}{\sim} N(0, 1), \qquad i = 1, \dots, N,$$

for some σ > 0. Above μ denotes the uniform (probability) distribution on $\mathcal{D}$ and the design variables ${\left({X}_{i}\right)}_{i=1}^{N}$ are independent of the noise vector ${\left({W}_{i}\right)}_{i=1}^{N}$. We assume without loss of generality that $\text{vol}\left(\mathcal{D}\right)=1$, so that μ = dx, the Lebesgue measure on $\mathcal{D}$.

We take the noise amplitude σ > 0 in (A1) to be fixed and known, and work under the assumption that the forward map $\mathcal{G}$ satisfies the following local Lipschitz condition: for given β, γ, κ ⩾ 0, and all ${F}_{1},{F}_{2}\in {C}^{\beta }\left(\mathcal{O}\right)\cap \mathcal{F}$,

Equation (A2)

$$\|\mathcal{G}(F_1) - \mathcal{G}(F_2)\|_{L^2(\mathcal{D})} \lesssim \left(1 + \|F_1\|_{C^\beta}^\gamma \vee \|F_2\|_{C^\beta}^\gamma\right) \|F_1 - F_2\|_{(H^\kappa(\mathcal{O}))^*},$$

where we recall that X* denotes the topological dual Banach space of a normed linear space X. Additionally, we will require $\mathcal{G}$ to be uniformly bounded on its domain:

Equation (A3)

$$\sup_{F \in \mathcal{F}} \|\mathcal{G}(F)\|_{L^\infty(\mathcal{D})} \leqslant S < \infty.$$

As observed in (23), the elliptic inverse problem considered in this paper falls in this general framework, which also encompasses other examples of nonlinear inverse problems such as those involving the Schrödinger equation considered in Nickl et al (2020) and Nickl (2018), for which the results in this section would apply as well. It also includes many linear inverse problems such as the classical Radon transform, see Nickl et al (2020).

A.1. General contraction rates in Hellinger distance

Using the same notation as in section 2.1.2, and given a sequence of Borel prior probability measures ΠN on $\mathcal{F}$, we write ΠN(⋅|Y(N), X(N)) for the posterior distribution of F|(Y(N), X(N)) (arising as after (9) and (10)). We also continue to use the notation pF for the densities from (8) now in the general observation model (A1) (and implicitly assume that the map (F, (y, x)) ↦ pF(y, x) is jointly measurable to ensure existence of the posterior distribution). Below we formulate a general contraction theorem in the Hellinger distance that forms the basis of the proofs of the main results. It closely follows the general theory in Ghosal and van der Vaart (2017) and its adaptation to the inverse problem setting in Monard et al (2020)—we include a proof for conciseness and convenience of the reader.

Define the Hellinger distance h(⋅, ⋅) on the set of probability density functions on $\mathbb{R}{\times}\mathcal{D}$ (with respect to the product measure dy × dx) by

$$h^2(p_1, p_2) = \int_{\mathbb{R}\times\mathcal{D}} \left(\sqrt{p_1(y,x)} - \sqrt{p_2(y,x)}\right)^2 \mathrm{d}y\,\mathrm{d}x.$$

For any set $\mathcal{A}$ of such densities, let $N\left(\eta ;\mathcal{A},h\right),\;\eta { >}0$, be the minimal number of Hellinger balls of radius η needed to cover $\mathcal{A}$.

Theorem 13. Let ΠN be a sequence of prior Borel probability measures on $\mathcal{F}$, and let ΠN(⋅|Y(N), X(N)) be the resulting posterior distribution arising from observations (Y(N), X(N)) in model (A1). Assume that for some fixed ${F}_{0}\in \mathcal{F}$, and a sequence δN > 0 such that δN → 0 and $\sqrt{N}{\delta }_{N}\to \infty $ as N → ∞, the sets

Equation (A4)

$$\mathcal{B}_N = \left\{F \in \mathcal{F} : E_{F_0}^i\left[\log \frac{p_{F_0}}{p_F}(Y_i, X_i)\right] \leqslant \delta_N^2, \ E_{F_0}^i\left[\left(\log \frac{p_{F_0}}{p_F}(Y_i, X_i)\right)^2\right] \leqslant \delta_N^2\right\}$$

satisfy for all N large enough

Equation (A5)

$$\Pi_N\left(\mathcal{B}_N\right) \geqslant \mathrm{e}^{-AN\delta_N^2} \qquad \text{for some } A > 0.$$

Further assume that there exists a sequence of Borel sets ${\mathcal{A}}_{N}\subset \mathcal{F}$ for which

Equation (A6)

$$\Pi_N\left(\mathcal{F} \setminus \mathcal{A}_N\right) \leqslant \mathrm{e}^{-BN\delta_N^2} \qquad \text{for some } B > 0,$$

for all N large enough, as well as

Equation (A7)

$$\log N\left(\delta_N; \left\{p_F : F \in \mathcal{A}_N\right\}, h\right) \leqslant CN\delta_N^2 \qquad \text{for some } C > 0.$$

Then, for sufficiently large L = L(B, C) > 4 such that ${L}^{2}{ >}12(B\vee C)$, and all 0 < D < B − A − 2, as N → ∞,

Equation (A8)

$$\Pi_N\left(F \in \mathcal{F} : h(p_F, p_{F_0}) \leqslant L\delta_N, \ F \in \mathcal{A}_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = 1 - O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right).$$

Proof. We start noting that by theorem 7.1.4 in Giné and Nickl (2016), for each L > 4 satisfying ${L}^{2}{ >}12(B\vee C)$ we can find tests (random indicator functions) ΨN = ΨN(Y(N), X(N)) such that, as N → ∞,

Equation (A9)

$$E_{F_0}^N \Psi_N \to 0, \qquad \sup_{F \in \mathcal{A}_N : h(p_F, p_{F_0}) > L\delta_N} E_F^N\left(1 - \Psi_N\right) \leqslant \mathrm{e}^{-L^2 N\delta_N^2/12}.$$

Next, denote the set whose posterior probability we want to lower bound by

and, using the first display in (A9), decompose the probability of interest as

Next, let $\nu \left(\cdot \right)={{\Pi}}_{N}\left(\cdot \cap {\mathcal{B}}_{N}\right)/{{\Pi}}_{N}\left({\mathcal{B}}_{N}\right)$ be the restricted normalised prior on ${\mathcal{B}}_{N}$, and define the event

Equation (A10)

$$\mathcal{C}_N = \left\{\int_{\mathcal{F}} \prod_{i=1}^{N} \frac{p_F}{p_{F_0}}(Y_i, X_i)\,\mathrm{d}\nu(F) \geqslant \mathrm{e}^{-2N\delta_N^2}\right\},$$

for which lemma 7.3.2 in Giné and Nickl (2016) implies that ${P}_{{F}_{0}}^{N}\left({\mathcal{C}}_{N}\right)\to 1$ as N → ∞. We then further decompose

and in view of condition (A5) and the above definition of ${\mathcal{C}}_{N}$, we see that

We conclude applying Markov's inequality and Fubini's theorem, jointly with the fact that for all $F\in \mathcal{F}$

$$E_{F_0}^N\left(\prod_{i=1}^{N} \frac{p_F}{p_{F_0}}(Y_i, X_i)\right) = 1,$$

to upper bound the last probability by

as N → ∞, since B > A + D + 2, having used the excess mass condition (A6) and the second display in (A9).□

A.2. Contraction rates for rescaled Gaussian priors

While the previous result assumed a general sequence of priors, we now derive explicit contraction rates in L2-prediction risk for the specific choices of priors considered in section 2.2. We start with the 're-scaled' priors of section 2.2.1.

Theorem 14. Let the forward map $\mathcal{G}$ satisfy (A2) and (A3) for given β, γ, κ ⩾ 0 and S > 0. For integer α > β + d/2, consider a Gaussian prior ΠN constructed as in (11) with scaling ${N}^{d/(4\alpha +4\kappa +2d)}$ and with base prior F' ∼ Π' satisfying condition 3 with RKHS $\mathcal{H}$. Let ΠN(⋅|Y(N), X(N)) be the resulting posterior arising from observations (Y(N), X(N)) in (A1), assume ${F}_{0}\in \mathcal{H}$ and set ${\delta }_{N}={N}^{-(\alpha +\kappa )/(2\alpha +2\kappa +d)}$.

Then for any D > 0 there exists L > 0 large enough (depending on σ, F0, D, α, and β, γ, κ, S, d) such that, as N → ∞,

Equation (A11)

$$\Pi_N\left(F : \|\mathcal{G}(F) - \mathcal{G}(F_0)\|_{L^2(\mathcal{D})} > L\delta_N \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right),$$

and for sufficiently large M > 0 (depending on σ, D, α, β, γ, κ, d)

Equation (A12)

$$\Pi_N\left(F : \|F\|_{C^\beta} > M \,\middle|\, Y^{(N)}, X^{(N)}\right) = O_{P_{F_0}^N}\left(\mathrm{e}^{-DN\delta_N^2}\right).$$

Remark 15. Inspection of the proof shows that if κ = 0 in (A2), then the RKHS $\mathcal{H}$ in condition 3 can be assumed to be continuously imbedded into ${H}^{\alpha }\left(\mathcal{O}\right)$ only, instead of ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$. The same remark in fact applies for κ < 1/2.

Proof. In view of the boundedness assumption (A3) on $\mathcal{G}$, we have by lemma 23 below that for some q > 0 (depending on σ, S)

Hence, for ${\mathcal{B}}_{N}$ the sets from (A4) we have $\left\{F:{\Vert}\mathcal{G}\left({F}_{0}\right)-\mathcal{G}\left(F\right){{\Vert}}_{{L}^{2}\left(\mathcal{D}\right)}{\leqslant}{\delta }_{N}/q\right\}\subseteq {\mathcal{B}}_{N},$ which in turn implies the small ball condition (A5) since by lemma 16 below (premultiplying if needed δN by a sufficiently large but fixed constant):

for some A > 0 and all N large enough. Next, for all D > 0 and any B > A + D + 2, we can choose sets ${\mathcal{A}}_{N}$ as in lemmas 17 and 18 and verify the excess mass condition (A6) as well as the complexity bound (A7). Note that ${\Vert}F{{\Vert}}_{{C}^{\beta }}{\leqslant}M$ for all $F\in {\mathcal{A}}_{N}$. We then conclude by theorem 13 that for some L' > 0 large enough

yielding the claim for some appropriate L > 0 using the first inequality in (A26).□

The following key lemma shows that the (non-Gaussian) prior induced on the regression functions $\mathcal{G}\left(F\right)$ assigns sufficient mass to an L2-neighbourhood of $\mathcal{G}\left({F}_{0}\right)$.

Lemma 16. Let ΠN, F0 and δN be as in theorem 14. Then, for all sufficiently large q > 0 there exists A > 0 (depending on q, F0, α, β, γ, κ, d) such that

Equation (A13)

$$\Pi_N\left(F : \|\mathcal{G}(F) - \mathcal{G}(F_0)\|_{L^2(\mathcal{D})} \leqslant \delta_N/q\right) \geqslant \mathrm{e}^{-AN\delta_N^2}$$

for all N large enough.

Proof. Using (A2) and noting that ${\Vert}{F}_{0}{{\Vert}}_{{C}^{\beta }}{< }\infty $ for ${F}_{0}\in \mathcal{H}$ by a Sobolev imbedding, for any fixed constant M > 1∨ ||F0||Cβ,

having defined

whose intersection is a symmetric set in the ambient space $C\left(\mathcal{O}\right)$. Then, since ${F}_{0}\in \mathcal{H}$, recalling that the RKHS ${\mathcal{H}}_{N}$ of ΠN coincides with $\mathcal{H}$ with RKHS norm ${\Vert}\cdot {{\Vert}}_{{\mathcal{H}}_{N}}$ given in (12), now with scaling ${N}^{d/\left(4\alpha +4\kappa +2d\right)}=\sqrt{N}{\delta }_{N}$, we can use corollary 2.6.18 in Giné and Nickl (2016) to lower bound the last probability by

We proceed finding a lower bound for the prior probability of C1, which, by construction of ΠN, satisfies

For any integer α > 0 and any κ ⩾ 0, letting ${B}_{c}^{\alpha }\left(r\right){:=}\left\{F\in {H}_{c}^{\alpha },\;{\Vert}F{{\Vert}}_{{H}^{\alpha }}{\leqslant}r\right\},\;r{ >}0$, we have the metric entropy estimate:

Equation (A14)

$$\log N\left(\eta; B_c^\alpha(r), \|\cdot\|_{(H^\kappa)^*}\right) \lesssim \left(\frac{r}{\eta}\right)^{d/(\alpha+\kappa)}, \qquad \eta > 0,$$

see the proof of lemma 19 in Nickl et al (2020) for the case κ ⩾ 1/2, and theorem 4.10.3 in Triebel (1978) for κ < 1/2 (in the latter case, we note in fact that the estimate holds true also for balls in the whole space Hα). Hence, since $\mathcal{H}$ is continuously imbedded into ${H}_{c}^{\alpha }$, letting ${B}_{\mathcal{H}}\left(1\right)$ be the unit ball of $\mathcal{H}$, we have ${B}_{\mathcal{H}}\left(1\right)\subseteq {B}_{c}^{\alpha }\left(r\right)$ for some r > 0, implying that for all η > 0

Then, for all N large enough, the small ball estimate in theorem 1.2 in Li and Linde (1999) yields

Equation (A15)

$$-\log \Pi'\left(F : \|F\|_{(H^\kappa)^*} \leqslant \eta\right) \lesssim \eta^{-2d/(2\alpha+2\kappa-d)}, \qquad \eta \to 0,$$

implying ${{\Pi}}_{N}\left({C}_{1}\right){\geqslant}{\mathrm{e}}^{-{q}^{{\prime\prime}}N{\delta }_{N}^{2}},$ for some q'' > 0. Note that q'' can be made as small as desired by taking q in (A13) large enough.
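To see how the exponent calibrates (our arithmetic, spelling out a step implicit in the proof): the rescaling turns a ball of radius of order δN for ΠN into one of radius ${\eta }_{N}=\sqrt{N}{\delta }_{N}\cdot {\delta }_{N}$ for Π', and

$${\eta }_{N}={N}^{\frac{1}{2}-\frac{2(\alpha +\kappa )}{2\alpha +2\kappa +d}}={N}^{-\frac{2\alpha +2\kappa -d}{2(2\alpha +2\kappa +d)}},\qquad {\eta }_{N}^{-\frac{2d}{2\alpha +2\kappa -d}}={N}^{\frac{d}{2\alpha +2\kappa +d}}=N{\delta }_{N}^{2},$$

so the small ball estimate (A15) indeed yields a bound of order ${\mathrm{e}}^{-{q}^{{\prime\prime}}N{\delta }_{N}^{2}}$.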

We conclude obtaining a suitable upper bound for ΠN(C2c). In particular, by construction of ΠN, recalling ${N}^{d/\left(2\alpha +2\kappa +d\right)}=N{\delta }_{N}^{2}$,

By condition 3, F' defines a centred Gaussian Borel random element in a separable measurable subspace $\mathcal{C}$ of Cβ, and by the Hahn–Banach theorem and the separability of $\mathcal{C}$, ${\Vert}{F}^{\prime }{{\Vert}}_{{C}^{\beta }}$ can then be represented as a countable supremum

of actions of bounded linear functionals $\mathcal{T}={\left({T}_{m}\right)}_{m\in \mathbb{N}}\subset {\left({C}^{\beta }\right)}^{{\ast}}$. It follows that the collection ${\left\{{T}_{m}\left({F}^{\prime }\right)\right\}}_{m\in \mathbb{N}}$ is a centred Gaussian process with almost surely finite supremum, so that by Fernique's theorem Giné and Nickl (2016), theorem 2.1.20:

We then apply the Borell–Sudakov–Tsirelson inequality (Giné and Nickl (2016), theorem 2.5.8) to obtain, for all N large enough,

Given our initial choice of M, the proof is then concluded taking q in lemma 16 sufficiently large so that ${q}^{{\prime\prime}}{< }{(M/\tau )}^{2}/8$.□

We now construct suitable approximating sets for which we check the excess mass condition (A6).

Lemma 17. Let ΠN and δN be as in theorem 14. Define for any M, Q > 0

Equation (A16)

Then for any B > 0 and for sufficiently large M, Q (both depending on B, α, β, γ, κ, d), for all N large enough,

Equation (A17)

Proof. We note that the last inequality at the end of the proof of the previous lemma implies that for $M\gtrsim \sqrt{B}$ and all N large enough, ${{\Pi}}_{N}\left(F:{\Vert}F{{\Vert}}_{{C}^{\beta }}{\leqslant}M\right){\geqslant}1-{\mathrm{e}}^{-BN{\delta }_{N}^{2}}.$ Thus, the claim will follow if we can derive a similar lower bound for

having used that ${N}^{d/\left(4\alpha +4\kappa +2d\right)}=\sqrt{N}{\delta }_{N}$. Using theorem 1.2 in Li and Linde (1999) as before (A15), we deduce that for some q > 0

so that taking any $Q > {(B/q)}^{-(2\alpha+2\kappa-d)/(2d)}$ implies

Equation (A18)

Next, denote

where Φ is the standard normal cumulative distribution function. Then by standard inequalities for ${{\Phi}}^{-1}$ we have ${M}_{N}\simeq \sqrt{BN}{\delta }_{N}$ as $N\to\infty$, so that taking $M\gtrsim \sqrt{B}$

By the isoperimetric inequality for Gaussian processes (theorem 2.6.12 in Giné and Nickl (2016)), the last probability is then lower bounded, using (A18), by

concluding the proof.□
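For completeness, we recall the form of the Gaussian isoperimetric inequality used in the last step: if Π is a centred Gaussian probability measure on a separable Banach space with RKHS unit ball $\mathcal{H}_1$, then for every Borel set A and every M > 0,

$\Pi\left(A + M\mathcal{H}_1\right) \geqslant \Phi\left(\Phi^{-1}\left(\Pi(A)\right) + M\right),$

which explains the appearance of $\Phi^{-1}$ in the definition of MN above.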

We conclude with the verification of the complexity bound (A7) for the sets ${\mathcal{A}}_{N}$.

Lemma 18. Let ${\mathcal{A}}_{N}$ be as in lemma 17 for some fixed M, Q > 0. Then,

for some constant C > 0 (depending on σ, M, Q, α, β, γ, κ, d, S) and all N large enough.

Proof. If $F\in {\mathcal{A}}_{N}$, then F = F1 + F2 with ${\Vert}{F}_{1}{{\Vert}}_{{\left({H}^{\kappa }\right)}^{{\ast}}}{\leqslant}Q{\delta }_{N}$ and ${\Vert}{F}_{2}{{\Vert}}_{{H}^{\alpha }}{\leqslant}{M}^{\prime },$ the latter inequality following from the continuous imbedding of $\mathcal{H}$ into ${H}_{c}^{\alpha }$. Thus, recalling the metric entropy estimate (A14), if

is a δN-net with respect to ${\Vert}\cdot {{\Vert}}_{{\left({H}^{\kappa }\right)}^{{\ast}}}$, we can find Hi such that ${\Vert}{F}_{2}-{H}_{i}{{\Vert}}_{{\left({H}^{\kappa }\right)}^{{\ast}}}{\leqslant}{\delta }_{N}$. Then, using the second inequality in (A26) below and the local Lipschitz estimate (A2),

Recalling that if $F\in {\mathcal{A}}_{N}$ then also ${\Vert}F{{\Vert}}_{{C}^{\beta }}{\leqslant}M$, and using the Sobolev imbedding of Hα into Cβ to bound ${\Vert}{H}_{i}{{\Vert}}_{{C}^{\beta }}$, we then obtain

It follows that {H1, ..., HP} also forms a q′δN-net for ${\mathcal{A}}_{N}$ in the Hellinger distance, for some q′ > 0, so that the complexity bound claimed in the lemma follows from the metric entropy estimate (A14).□

A.3. Contraction rates for hierarchical Gaussian series priors

We now derive contraction rates in L2-prediction risk in the inverse problem (A1), for the truncated Gaussian random series priors introduced in section 2.2.3. The proof again proceeds by an application of theorem 13.

Theorem 19. Let the forward map $\mathcal{G}$ satisfy (A2) and (A3) for given β, γ, κ ⩾ 0 and S > 0. For any α > β + d/2, let Π be the random series prior in (19), and let Π(⋅|Y(N), X(N)) be the resulting posterior distribution arising from observations (Y(N), X(N)) in (A1). Then, for each ${\alpha }_{0}{\geqslant}\alpha $, any ${F}_{0}\in {H}_{K}^{{\alpha }_{0}}\left(\mathcal{O}\right)$ and any D > 0 there exists L > 0 large enough (depending on σ, F0, D, α, β, γ, κ, S, d) such that, as $N\to\infty$,

Equation (A19)

where ${\xi }_{N}={N}^{-\left({\alpha }_{0}+\kappa \right)/\left(2{\alpha }_{0}+2\kappa +d\right)}\mathrm{log}N$. Moreover, for ${\mathcal{H}}_{J}$ the finite-dimensional subspaces from (18) and ${J}_{N}\in \mathbb{N}$ such that ${2}^{{J}_{N}}\simeq {N}^{1/\left(2{\alpha }_{0}+2\kappa +d\right)}$, we also have that for sufficiently large M > 0 (depending on D, α, β, d):

Equation (A20)
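Before turning to the proofs, we record the elementary exponent identities implied by the definitions of ξN and JN, which are used repeatedly in what follows:

$N\xi_N^2 = N^{d/(2\alpha_0+2\kappa+d)}{(\log N)}^{2} \simeq 2^{J_N d}{(\log N)}^{2} .$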

We begin by deriving a suitable small ball estimate in the L2-prediction risk.

Lemma 20. Let Π, F0 and ξN be as in theorem 19. Then, for all sufficiently large q > 0 there exists A > 0 (depending on q, F0, α, β, γ, κ, d) such that

Equation (A21)

for all N large enough.

Proof. For each $j\in \mathbb{N}$, denote by Πj the Gaussian probability measure on the finite dimensional subspace ${\mathcal{H}}_{j}$ in (18), defined as after (19) with the series truncated at j. For ${J}_{N}\in \mathbb{N}$ such that ${2}^{{J}_{N}}\simeq {N}^{1/\left(2{\alpha }_{0}+2\kappa +d\right)}$, note

Equation (A22)

so that, recalling the properties (20) of the random truncation level J, for some s > 0,

for all N large enough. It follows

Next, let

be the 'projection' of F0 onto ${\mathcal{H}}_{{J}_{N}}$. Since ${F}_{0}\in {H}_{K}^{{\alpha }_{0}}\subset {C}^{\beta }$ by a Sobolev imbedding, it follows using (A2) and standard approximation properties of wavelets (B6),

which implies by the triangle inequality that

Using again that Hα imbeds continuously into Cβ as well as (A2) and (B5), we can lower bound the last probability by

which, by corollary 2.6.18 in Giné and Nickl (2016) and in view of (B2) is further lower bounded by

Now since $f{\mapsto}\chi f,\chi \in {C}^{\infty }\left(\mathcal{O}\right)$, is continuous on ${H}^{\alpha }\left(\mathcal{O}\right)$,

for some t > 0, where the ${Z}_{m}$ are i.i.d. N(0, 1) random variables, and where we have used the wavelet characterisation of the ${H}^{\alpha }\left({\mathbb{R}}^{d}\right)$ norm. To conclude, note that the last probability is greater than

Finally, a standard calculation shows that Pr(|Z1| ⩽ t) ≳ t as t → 0, and hence the last product is lower bounded, for large N, by a quantity of order ${\mathrm{e}}^{-AN{\xi }_{N}^{2}}$ for a suitable A > 0, concluding the proof.□
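In detail, the standard calculation referred to above is the following: for the standard normal density φ and any 0 < t ⩽ 1, monotonicity of φ on [0, t] gives

$\Pr\left(\vert Z_1\vert \leqslant t\right) = \int_{-t}^{t} \varphi(x)\,\mathrm{d}x \geqslant 2t\varphi(1) = t\sqrt{2/\pi}\,\mathrm{e}^{-1/2} .$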

In the following lemma we construct suitable approximating sets, for which we check the excess mass condition (A6) and the complexity bound (A7) required in theorem 13.

Lemma 21. Let Π, ξN and JN be as in theorem 19, and let ${\mathcal{H}}_{{J}_{N}}$ be the finite dimensional subspace defined in (18) with J = JN. Define for each M > 0

Equation (A23)

Then, for any B > 0 there exists M > 0 large enough (depending on B, α, β, d) such that, for sufficiently large N

Equation (A24)

Moreover, for each fixed M > 0 and all N large enough

Equation (A25)

for some C > 0 (depending on σ, α, β, γ, κ, S, d).

Proof. Letting the ${Z}_{m}$ be i.i.d. N(0, 1) random variables, noting ${\Vert}{F{\Vert}}_{{H}^{\alpha }}^{2}{\leqslant}{2}^{2{J}_{N}\alpha }{\sum }_{\ell {\leqslant}{J}_{N},r\in {\mathcal{R}}_{\ell }}{F}_{\ell r}^{2}$ for all $F\in {\mathcal{H}}_{{J}_{N}}$ (cf. (B2)) and using (A22) and (20), we have for sufficiently large N

for any constant $0 < \overline{M} < M^2-1$, since $\mathrm{dim}\left({\mathcal{H}}_{{J}_{N}}\right)\lesssim {2}^{{J}_{N}d}\simeq {N}^{d/\left(2{\alpha }_{0}+2\kappa +d\right)}=o\left(N{\xi }_{N}^{2}\right)$. The bound (A24) then follows by applying theorem 3.1.9 in Giné and Nickl (2016) to upper bound the last probability, for any B > 0 and for sufficiently large M and $\overline{M}$, by

We proceed with the derivation of (A25). By choice of JN, if $F\in {\mathcal{A}}_{N}$ then ${\Vert}F{{\Vert}}_{{H}^{\alpha }}^{2}\lesssim {N}^{2\alpha /\left(2\alpha +2\kappa +d\right)}N{\xi }_{N}^{2}.$ Hence, by the second inequality in (A26), using (A2) and the Sobolev imbedding of Hα into Cβ, if ${F}_{1},{F}_{2}\in {\mathcal{A}}_{N}$ then

Therefore, using the standard metric entropy estimate for balls ${B}_{{\mathbb{R}}^{p}}\left(r\right),\;r{>}0$, in Euclidean spaces (proposition 4.3.34 in Giné and Nickl (2016)), we see that for N large enough the bound (A25) follows.□
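The entropy estimate just used is the standard volumetric bound: for any 0 < ε ⩽ r,

$N\left(B_{\mathbb{R}^p}(r), \Vert\cdot\Vert_{\mathbb{R}^p}, \varepsilon\right) \leqslant {\left(3r/\varepsilon\right)}^{p}, \quad \text{whence} \quad \log N \leqslant p\log\left(3r/\varepsilon\right) .$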

A.4. Information theoretic inequalities

In the following lemma, due to Birgé (2004), we exploit the boundedness assumption (A3) on $\mathcal{G}$ to show the equivalence between the Hellinger distance appearing in the conclusion of theorem 13 and the L2-distance on the 'regression functions' $\mathcal{G}\left(F\right)$.

Lemma 22. Let the forward map $\mathcal{G}$ satisfy (A3) for some S > 0. Then, for all ${F}_{1},{F}_{2}\in \mathcal{F}$

Equation (A26)

Proof. Note ${h}^{2}\left({p}_{{F}_{1}},{p}_{{F}_{2}}\right)=2-2\rho \left({p}_{{F}_{1}},{p}_{{F}_{2}}\right)$, where

is the Hellinger affinity. Using the expression of the likelihood in (8) (with $\mathcal{D}$ instead of $\mathcal{O}$), the right hand side is seen to be equal to

having used that the moment generating function of $Z\sim N(0,\sigma^2)$ satisfies $E\mathrm{e}^{tZ}={\mathrm{e}}^{{\sigma }^{2}{t}^{2}/2}$, $t\in \mathbb{R}$. Thus, the latter integral equals

To derive the second inequality in (A26), we use Jensen's inequality to lower bound the expectation in the last line by

Hence

whereby the claim follows using the basic inequality $1-{\mathrm{e}}^{-z/c}{\leqslant}z/c$, valid for all c, z > 0.

To deduce the first inequality we follow the proof of proposition 1 in Birgé (2004): note that, by concavity of the map $z{\mapsto}1-{\mathrm{e}}^{-z}$, for all 0 ⩽ z1 < z2

$1-{\mathrm{e}}^{-{z}_{1}}\geqslant \frac{{z}_{1}}{{z}_{2}}\left(1-{\mathrm{e}}^{-{z}_{2}}\right).$
Then taking ${z}_{1}={\left\{\mathcal{G}\left({F}_{1}\right)\left(X\right)-\mathcal{G}\left({F}_{2}\right)\left(X\right)\right\}}^{2}/\left(8{\sigma }^{2}\right)$ and z2 = S2/(2σ2),

which in turn yields the result.□
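One way to carry out the underlying Gaussian computation is the following: for any $a, b \in \mathbb{R}$ and ${\varphi}_{\sigma}$ the N(0, σ2) density,

$\int_{\mathbb{R}} \sqrt{\varphi_\sigma(y-a)\,\varphi_\sigma(y-b)}\,\mathrm{d}y = \mathrm{e}^{-(a-b)^2/(8\sigma^2)},$

so that the Hellinger affinity equals $\rho\left(p_{F_1}, p_{F_2}\right) = {E}^{X\sim \mu}\,\mathrm{e}^{-{\left\{\mathcal{G}(F_1)(X)-\mathcal{G}(F_2)(X)\right\}}^{2}/(8\sigma^2)}$, in agreement with the bounds just derived.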

The next lemma bounds the Kullback–Leibler divergences appearing in (A4) in terms of the L2-prediction risk.

Lemma 23. Let the observation Yi in (A1) be generated by some fixed ${F}_{0}\in \mathcal{F}$. Then, for each $F\in \mathcal{F}$,

and

Proof. If ${Y}_{1}=\mathcal{G}\left({F}_{0}\right)\left({X}_{1}\right)+\sigma {W}_{1},$ then

$\log \frac{p_{F_0}}{p_F}\left(Y_1, X_1\right) = \frac{1}{2\sigma^2}{\left[\mathcal{G}(F_0)(X_1) - \mathcal{G}(F)(X_1)\right]}^{2} + \frac{1}{\sigma}W_1\left[\mathcal{G}(F_0)(X_1) - \mathcal{G}(F)(X_1)\right].$
Hence, since EW1 = 0 and ${X}_{1}\sim \mu $,

$E_{F_0}\log \frac{p_{F_0}}{p_F}\left(Y_1, X_1\right) = \frac{1}{2\sigma^2}{\Vert}\mathcal{G}\left({F}_{0}\right)-\mathcal{G}\left(F\right){{\Vert}}_{{L}^{2}\left(\mu \right)}^{2}.$
On the other hand,

whence the second claim follows since $E{W}_{1}^{2}=1$.□
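For the reader's convenience, a sketch of the second-moment computation behind the last step, writing $D := \mathcal{G}(F_0) - \mathcal{G}(F)$: since W1 is independent of X1 with EW1 = 0 and $E{W}_{1}^{2}=1$, the cross term vanishes and

$E_{F_0}{\left[\log \frac{p_{F_0}}{p_F}(Y_1, X_1) - E_{F_0}\log \frac{p_{F_0}}{p_F}(Y_1, X_1)\right]}^{2} = \frac{1}{\sigma^2} E_\mu\left[D^2\right] + \frac{1}{4\sigma^4}\mathrm{Var}_\mu\left(D^2\right),$

and under (A3) the variance term is further bounded by $\sigma^{-4}S^2 E_\mu[D^2]$, so that both quantities in the lemma are controlled by constant multiples of ${\Vert}\mathcal{G}\left({F}_{0}\right)-\mathcal{G}\left(F\right){{\Vert}}_{{L}^{2}\left(\mu \right)}^{2}$.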

Appendix B.: Additional background material

In this final appendix we collect, for the convenience of the reader, some standard material used in the proofs.

Example 24. Take

and let $\psi :\mathbb{R}\to \left[0,\infty \right)$ be a smooth compactly supported function such that ${\int }_{\mathbb{R}}\psi \left(t\right)\mathrm{d}t=1$. Define for any Kmin ∈ (0, 1)

Equation (B1)

Then it is elementary to check that Φ is a regular link function that satisfies condition 2 (with a = 2).

Example 25. For any real α > d/2, the Whittle–Matérn process with index set $\mathcal{O}$ and regularity $\alpha - d/2 > 0$ (cf. example 11.8 in Ghosal and van der Vaart (2017)) is the stationary centred Gaussian process $M=\left\{M\left(x\right),\;x\in \mathcal{O}\right\}$ with covariance kernel

From the results in chapter 11 in (Ghosal and van der Vaart 2017) we see that the RKHS of $\left(M\left(x\right):x\in \mathcal{O}\right)$ equals the set of restrictions to $\mathcal{O}$ of elements in the Sobolev space ${H}^{\alpha }\left({\mathbb{R}}^{d}\right)$, which equals, with equivalent norms, the space ${H}^{\alpha }\left(\mathcal{O}\right)$ (since $\mathcal{O}$ has a smooth boundary). Moreover, lemma I.4 in Ghosal and van der Vaart (2017) shows that M has a version with paths belonging almost surely to Cβ' for all β' < αd/2. Let now $K\subset \mathcal{O}$ be a nonempty compact set, and let M be a Cβ'-smooth version of a Whittle–Matérn process on $\mathcal{O}$ with RKHS ${H}^{\alpha }\left(\mathcal{O}\right)$. Taking F' = χM implies (cf. exercise 2.6.5 in Giné and Nickl (2016)) that ${{\Pi}}^{\prime }=\mathcal{L}\left({F}^{\prime }\right)$ defines a centred Gaussian probability measure supported on Cβ', whose RKHS is given by

and the RKHS norm satisfies that for all $F\in {H}^{\alpha }\left(\mathcal{O}\right)$ there exists ${F}^{{\ast}}\in {H}^{\alpha }\left(\mathcal{O}\right)$ such that χF = χF* and

Thus if F' = χF is an arbitrary element of $\mathcal{H}$, then

which shows that $\mathcal{H}$ is continuously imbedded into ${H}_{c}^{\alpha }\left(\mathcal{O}\right)$.
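For concreteness, the covariance kernel of the Whittle–Matérn process above can be represented in spectral form, up to normalising constants, as

$K(x, y) = \int_{\mathbb{R}^d} \mathrm{e}^{-\mathrm{i}(x-y)\cdot\lambda}\,{\left(1+\Vert\lambda\Vert^2\right)}^{-\alpha}\,\mathrm{d}\lambda, \quad x, y \in \mathcal{O},$

and for a stationary Gaussian process with spectral density ${\left(1+\Vert\lambda\Vert^2\right)}^{-\alpha}$ the RKHS over ${\mathbb{R}}^{d}$ consists precisely of those f with $\int_{\mathbb{R}^d}{\left(1+\Vert\lambda\Vert^2\right)}^{\alpha}\vert\hat{f}(\lambda)\vert^2\,\mathrm{d}\lambda < \infty$, that is, of ${H}^{\alpha }\left({\mathbb{R}}^{d}\right)$, consistently with the preceding discussion.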

Remark 26. Let $\left\{{{\Psi}}_{\ell r},\;\ell {\geqslant}-1,r\in {\mathbb{Z}}^{d}\right\}$ be an orthonormal basis of ${L}^{2}\left({\mathbb{R}}^{d}\right)$ composed of S-regular and compactly supported Daubechies wavelets (see chapter 4 in Giné and Nickl (2016) for construction and properties). For each $0\leqslant \alpha \leqslant S$ and $F\in {H}^{\alpha }\left({\mathbb{R}}^{d}\right)$, we have

$\sum_{\ell \geqslant -1}\sum_{r\in \mathbb{Z}^d} 2^{2\ell \alpha}{\langle F, \Psi_{\ell r}\rangle}_{L^2\left(\mathbb{R}^d\right)}^{2} < \infty ,$
and the square root of the latter series defines an equivalent norm to ${\Vert}\cdot {{\Vert}}_{{H}^{\alpha }\left({\mathbb{R}}^{d}\right)}$. Note that S > 0 can be taken arbitrarily large.

For any α ⩾ 0 the Gaussian random series

defines a centred Gaussian probability measure supported on the finite-dimensional space ${\overline{\mathcal{H}}}_{j}$ spanned by the $\left\{{{\Psi}}_{\ell r}, \ell {\leqslant}j,r\in {\mathcal{R}}_{\ell }\right\}$, and its RKHS equals ${\overline{\mathcal{H}}}_{j}$ endowed with norm

(cf. example 2.6.15 in Giné and Nickl 2016). Basic wavelet theory implies $\mathrm{dim}\left({\overline{\mathcal{H}}}_{j}\right)\lesssim {2}^{jd}$.

If we now fix a compact ${K}^{\prime }\subset \mathcal{O}$ such that $K\subset {K}^{\prime }$, and consider a cut-off function $\chi \in {C}_{c}^{\infty }\left(\mathcal{O}\right)$ such that χ = 1 on K′, then multiplication by χ is a bounded linear operator $\chi :{H}^{s}\left({\mathbb{R}}^{d}\right)\to {H}_{c}^{s}\left(\mathcal{O}\right).$ It follows that the random function

defines, according to exercise 2.6.5 in Giné and Nickl (2016), a centred Gaussian probability measure ${{\Pi}}_{j}=\mathcal{L}\left({F}_{j}\right)$ supported on the finite dimensional subspace ${\mathcal{H}}_{j}$ from (18), with RKHS norm satisfying

Equation (B2)

Arguing as in the previous remark one shows further that for some constant c > 0,

Equation (B3)

Remark 27. Using the notation of the previous remark, for fixed ${F}_{0}\in {H}_{K}^{\alpha }\left(\mathcal{O}\right)$, consider the finite-dimensional approximations

Equation (B4)

Then in view of (B2), we readily check that for each j ⩾ 1

Equation (B5)

Also, for each κ ⩾ 0, and any $G\in {H}^{\kappa }\left(\mathcal{O}\right)$, we see that (implicitly extending to 0 on ${\mathbb{R}}^{d}{\backslash}\mathcal{O}$ functions that are compactly supported inside $\mathcal{O}$)

where ${\chi }^{\prime }\in {C}_{c}^{\infty }\left(\mathcal{O}\right)$, with χ′ = 1 on supp(χ). We also note that, in view of the localisation properties of Daubechies wavelets, for some ${J}_{\mathrm{min}}\in \mathbb{N}$ large enough, if $\ell \geqslant {J}_{\mathrm{min}}$ and the support of Ψℓr intersects K, then necessarily supp(Ψℓr) ⊆ K′, so that

Therefore, for $j\geqslant {J}_{\mathrm{min}}$, by Parseval's identity and the Cauchy–Schwarz inequality,

$\left\vert {\left\langle F_0 - F_0^{(j)}, G\right\rangle}_{L^2}\right\vert \leqslant \sum_{\ell > j}\sum_{r} \left\vert \langle F_0, \Psi_{\ell r}\rangle\right\vert \left\vert \langle \chi' G, \Psi_{\ell r}\rangle\right\vert \leqslant 2^{-j(\alpha+\kappa)}{\left(\sum_{\ell > j, r} 2^{2\ell\alpha}{\langle F_0, \Psi_{\ell r}\rangle}^{2}\right)}^{1/2}{\left(\sum_{\ell > j, r} 2^{2\ell\kappa}{\langle \chi' G, \Psi_{\ell r}\rangle}^{2}\right)}^{1/2} \lesssim 2^{-j(\alpha+\kappa)}{\Vert}{F}_{0}{{\Vert}}_{{H}^{\alpha }}{\Vert}G{{\Vert}}_{{H}^{\kappa }}.$
It follows by duality that for all j large enough

Equation (B6)

We conclude by remarking that

Equation (B7)

Indeed, let $j\geqslant {J}_{\mathrm{min}}$, and fix $F\in {\mathcal{H}}_{j}$; then

But as ${\mathcal{H}}_{{J}_{\mathrm{min}}}$ is a fixed finite dimensional subspace, we have ${\Vert}{P}_{{\mathcal{H}}_{{J}_{\mathrm{min}}}}\left(F\right){{\Vert}}_{{H}^{s}\left(\mathcal{O}\right)}\lesssim {\Vert}{P}_{{\mathcal{H}}_{{J}_{\mathrm{min}}}}\left(F\right){{\Vert}}_{{L}^{2}\left(\mathcal{O}\right)}{\leqslant}{\Vert}F{{\Vert}}_{{L}^{2}\left(\mathcal{O}\right)}$, with a multiplicative constant depending only on Jmin. On the other hand, we also have

yielding (B7).

Example 28. Consider the integer-valued random variable

where ϕ(x) = x log x, x ⩾ 1. Then for any j ⩾ 1

On the other hand, since ${\mathrm{e}}^{-{2}^{jd}\left(1-{2}^{-d}\right)\mathrm{log}{2}^{\left(j-1\right)d}}\to 0$ as $j\to \infty $,

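The exponent in the last limit arises from the elementary bound: for ϕ(x) = x log x and any j ⩾ 1,

$\phi\left(2^{jd}\right) - \phi\left(2^{(j-1)d}\right) \geqslant \left(2^{jd} - 2^{(j-1)d}\right)\log 2^{(j-1)d} = 2^{jd}\left(1-2^{-d}\right)\log 2^{(j-1)d} .$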