1 Introduction

In geophysical electromagnetic induction methods (Nabighian 1991; Berdichevsky and Dmitriev 2008; Chave and Jones 2012), natural or artificial source currents generate primary electromagnetic fields that propagate through the Earth. In conductive bodies and over interfaces with conductivity contrasts, induced currents and accumulated surface charges generate secondary magnetic and electric fields. Depending on the method, either the total field (primary plus secondary) or the secondary field alone is measured at the receivers. The secondary field carries information about the conductive bodies and the locations of interfaces with conductivity contrasts. Therefore, using inversion algorithms (Parker 1994; Tikhonov et al. 1995; Zhdanov 2002; Tarantola 2005; Aster et al. 2012; Menke 2012), we are able to extract the underground conductivity distribution from measured electromagnetic field data sets. Currently, geo-electromagnetic induction methods are widely used in studies of the shallow (\(<300\) m depth) subsurface (Tezkan 1999; Commer et al. 2006; Pedersen et al. 2006; Günther et al. 2006; Kalscheuer et al. 2007; Rodriguez and Sweetkind 2015; Merz et al. 2016; Abtahi et al. 2016; Zhang et al. 2016; Costall et al. 2018) and in monitoring and imaging of deep crustal and mantle structures (Chen et al. 1996; Nelson et al. 1996; Becken et al. 2011; Yan et al. 2016; Dong et al. 2016; Le Pape et al. 2017; Sarafian et al. 2018; Kühn et al. 2018).

However, many studies based on the inversion of electromagnetic data have ignored a key step, namely the evaluation of the uncertainty and resolution of the preferred inversion model. In inverse problems, an infinite number of equivalent models exists that explain the electromagnetic data to within a given misfit threshold. This is caused by the limited data coverage in time (or frequency) and space, the inevitable data noise and the strongly nonlinear relationship between data and model. The preferred inversion solution is only one member of this infinitely large set of candidate models. In inversion, we first need to establish a cost function which measures the data misfit between the measurements and responses calculated by an approximation of the underlying physics. The responses of this physical system are computed based on a model parameterisation of the underground conductivity structure. Then, deterministic iterative (i.e. linearised) or stochastic inversion algorithms (Parker 1994; Zhdanov 2002; Tarantola 2005; Aster et al. 2012; Menke 2012) are employed to find a model which minimises the cost function to a certain level. The topography of the cost function measuring the data misfit may have an infinite number of local minima (valleys) and local maxima (hills) (Fernández-Martínez et al. 2012; Fernández-Martínez 2015). The preferred inversion model may be located at the bottom of any such valley. A serious problem is that the valley containing the preferred inversion model may not include the geologically meaningful true model. To reduce the distance between the inversion model and the unknown true model, or to force the inversion model into the same valley as the true model, a constraint or a reference point, such as a Tikhonov regularisation point (Jackson 1979; Tikhonov et al. 1995), must be fixed in this valley, which is implemented by adding a model regularisation term to the cost function. Even then, the preferred inversion model and the true model are generally not located at the same point, because the bottom of this valley typically resembles an extended and distorted saddle. The distance of the preferred inversion model from the true model (resolution) and its possible variations (uncertainty) can be appraised using linearised or nonlinear model analysis.

Linearised model analysis tools can offer quantitative measures for both resolution and uncertainty of the preferred inversion model. A quantitative measure of model resolution can be computed in the form of the model resolution matrix. Quantitative measures of model uncertainty can be computed in terms of the model covariance matrix (Backus and Gilbert 1968), and equivalent model domains can be calculated using the partially nonlinear concept of pseudo-hyperellipsoids (Johansen 1977; Kalscheuer and Pedersen 2007; Kalscheuer et al. 2010). Formally, the preferred inversion model is a sum of the true model weighted with the resolution matrix, the reference model (if applicable) weighted with the complement of the resolution matrix and the noise in the measurements weighted by the generalised inverse (Friedel 2003; Günther 2004; Kalscheuer et al. 2010). Hence, the model resolution matrix can be considered a blurring filter through which we see the contribution of the true model in the inversion model. The model covariance matrix describes how uncertainties in the true model, the reference model and the data translate into uncertainties of the inversion model. It can be safely assumed that the uncertainty in the true model is zero. If no reference model is used (e.g., in a smoothness-constrained inversion) or if the reference model is not associated with uncertainty, the only uncertainty projecting into the uncertainty of the inversion model is that of the field data. Another quantitative measure of model uncertainty worth mentioning is given by funnel functions (Oldenburg 1983; Menke 2012). In this approach, the upper and lower bounds of the model parameters (or of local averages of model parameters) are estimated for both linear and nonlinear optimisation problems. Subsequently, model parameter uncertainties are computed as differences between these estimated upper and lower bounds.

Whereas linearised model analysis tools estimate model uncertainty and resolution using exactly the same data, data uncertainties and model constraints that were used to compute the inversion model, nonlinear model analysis tools not only account for the nonlinearity of the forward problem but also utilise the fact that the preferred inversion model reflects changes in data, data uncertainties and model constraints (initial model, reference model, smoothness constraint, structural and petrophysical constraints). Therefore, in these nonlinear methods, the equivalent model domain of the preferred inversion model is generated by varying the data set (Schnaidt and Heinson 2015), the data uncertainties (Li et al. 2009), the initial model (Bai and Meju 2003; Schmoldt et al. 2014), the reference model (such as methods computing depth-of-investigation indices, in short DOI indices, e.g., Oldenburg and Li 1999; Oldenborger et al. 2007), the regularisation (or smoothness) operators (Constable et al. 1987; de Groot-Hedlin and Constable 1990, 2004; Kalscheuer et al. 2007), and, if applicable, the structural or petrophysical model constraints (Gallardo and Meju 2004; Shamsipour et al. 2012; Gao et al. 2012; Kamm et al. 2015; Giraud et al. 2017, 2019). Only models with responses that agree with the field data to within an acceptable data misfit threshold are accepted as equivalent models. In its most simplistic form, nonlinear model analysis is based on the generation of equivalent models by manually applying changes to the preferred inversion model and evaluating the data fit or running inversions in which the resistivities of the modified cells are fixed (Becken et al. 2008; Thiel et al. 2009; Thiel and Heinson 2010; Juanatey et al. 2013; Dong et al. 2014; Lindau and Becken 2018). In addition, further nonlinear model analysis approaches generate equivalent models by globally or stochastically searching the model domain (Tarantola 2005). Compared to linearised model analysis tools, whose equivalent models lie more or less in the vicinity of the preferred inversion model, most approaches to nonlinear model analysis explore a larger part of the model domain. However, we should keep in mind that most currently available methods of nonlinear model analysis do not offer information on model resolution, and many offer only qualitative appraisals of model uncertainty. With regard to model uncertainty, this is because most nonlinear schemes do not enforce convergence to a particular target misfit. In linear inversion theory, however, model parameter uncertainties corresponding to a 68 % confidence level are associated with a misfit deviation of 1.0 from the misfit of the optimal model. Using the simplistic approach mentioned above, the stability of a large-scale target structure with a pronounced resistivity contrast to the background medium can be roughly evaluated, and in many practical applications this approach is fully sufficient. However, in applications where targets are associated with comparatively small resistivity contrasts, e.g., tracing a contaminant plume through an aquifer, the acceptable levels of model uncertainty are small. Hence, it is important to calculate quantitative estimates of model uncertainty in which misfit changes are controlled stringently. Furthermore, since EM inversion problems are nonlinear, extreme model parameters may not correspond to confidence levels of 68 %, even though the extreme models have misfit deviations of 1.0. Hence, a comprehensive stochastic analysis of the model space may be necessary.

With the above linearised and nonlinear model analysis tools, we are able to estimate model uncertainty and/or resolution. Considering the uncertainty estimates, we can establish confidence levels for the various structures observed in the preferred inversion model (Fig. 1). Those parts of the model that have low uncertainties and resolution matrix entries with little spread around the main diagonal may be used to assist the geological interpretation or the design of drilling campaigns. In addition, using the guideline that the preferred inversion model should have adequately good resolution and small ranges of uncertainty, we can apply the results of model analysis to optimise the acquisition layout (Fig. 1) or the inversion grid. Optimal survey configurations can help us to collect minimum sets of electromagnetic data, which lead to adequate constraints for the target structures, cost-effective surveys (Roux and Garcia 2014) and reduced computational loads of inversions (Yang and Oldenburg 2016).

In this review, we will first comprehensively introduce the basic theory and methodologies of linearised and nonlinear model analysis. Second, practical computational strategies are presented to accelerate the computation of linearised model uncertainty and resolution estimates. Third, applications using model analysis to appraise the preferred inversion model are summarised. Fourth, we discuss the developments in using model analysis as tools to design optimal survey configurations and inversion grids. Fifth, we conclude this paper with general recommendations both on inversion and model analysis and an outlook on future research directions.

Fig. 1

Illustration of the relationships among measurements, model constraints, the preferred inversion model and the results of model uncertainty and resolution analysis

2 Methods of Model Analysis

The conductivity distribution of the Earth is discretised into an inversion grid \({\mathbb {G}}\) and an associated model vector \({\mathbf {m}}\) (where \({\mathbf {m}}\) is generally a logarithmic function of conductivity). Assuming there are M elements or cells in the inversion grid, the size of the model vector is M. The underlying physical system is controlled by Maxwell’s equations (Nabighian 1991; Berdichevsky and Dmitriev 2008; Chave and Jones 2012), and it is denoted by the forward operator \({\mathbf {F}}\left[ {\mathbb {G}}, {\mathbf {m}}\right]\) (henceforth, in short \({\mathbf {F}}\left[ {\mathbf {m}}\right]\)). The measured electromagnetic responses such as scalar and tensor impedances (Lilley 2017) at receivers are arranged into an observation data vector \({\mathbf {d}}\) of size N, where N is the number of measurements. The electromagnetic responses can be acquired on the ground, from aircraft, in the ocean and even in boreholes. Uncertainties generally exist in the measured data set due to slight perturbations in the geometric sensor layout, intrinsic noise in the sensors, amplifiers and analogue-to-digital converters and ambient electromagnetic noise, which may behave like random variables with zero mean or lead to a systematic bias away from the true signal. In most cases, we assume that the uncertainty \(\delta _{i}\), \(i=1,\ldots ,N\) in each observation originates from random noise with a normally distributed probability of zero mean and nonzero standard deviation. Then, we design a data misfit term \(Q_{\mathrm{d}}\) by measuring the difference between the observations and the predicted responses for an as yet to be determined model vector \({\mathbf {m}}\), that is (Abubakar et al. 2009; Menke 2012; Aster et al. 2012):

$$\begin{aligned} Q_{\mathrm{d}} = \left( {\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}\right] \right) ^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}\left( {\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}\right] \right) , \end{aligned}$$
(1)

where the weighting matrix \({\mathbf {W}}_{d}\) is diagonal, \({W}_{d}^{ii}= 1/\delta _{i}\), \(i=1,\ldots ,N\), and makes the data misfit dimensionless. Superscript T denotes vector or matrix transposition. With this weighting, the data misfit follows a \(\chi ^{2}\)-distribution with an expectation value of N. Typically, \(Q_d\) is expressed as a root-mean-squared fit \(\text {RMS} = \sqrt{Q_d/N}\) of the forward responses to the field measurements.
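
As a concrete illustration, the following minimal Python sketch (assuming NumPy; the variable names and toy numbers are purely illustrative) evaluates the weighted data misfit of Eq. (1) and the corresponding RMS value.

```python
import numpy as np

def data_misfit(d, F_m, delta):
    """Weighted data misfit Q_d of Eq. (1) and RMS = sqrt(Q_d/N) for
    observations d, forward responses F_m and uncertainties delta."""
    r = (d - F_m) / delta          # W_d (d - F[m]) with W_d = diag(1/delta_i)
    Qd = float(r @ r)              # (d - F[m])^T W_d^T W_d (d - F[m])
    return Qd, np.sqrt(Qd / d.size)

# toy usage with three synthetic data points
d, F_m, delta = np.array([1.02, 0.98, 1.10]), np.ones(3), np.full(3, 0.05)
print(data_misfit(d, F_m, delta))
```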

In the hyperdimensional \(Q_{\mathrm{d}}-{\mathbf {m}}\) plot, an infinite number of valleys and hills exists (Snieder 1998). Unless there are profound misconceptions (e.g., regarding model dimensionality, anisotropic or frequency-dependent material parameters), the true model, which can be considered a simplified form of the real Earth, can be assumed to lie in one specific valley. The purpose of inversion is to come as close as possible to this unique true model using the information from the observed data \({\mathbf {d}}\). However, due to data uncertainty, the limited data coverage, and the inherent nonlinear characteristics and fundamental equivalences of the underlying physical system, the inversion problem cannot be solved ad hoc, and meaningful solution approaches depend on the particular properties of the given inversion problem (Menke 2012). To this end, we categorise inversion problems following Menke (2012):

  • In purely under-determined inversion problems, there is not enough information in the data to uniquely determine a single model parameter. Formally, cases with \(M>N\) may qualify for being under-determined. However, depending on the information content in the data, even cases with \(M<N\) may be under-determined. Under-determined inversion problems have model solutions with zero prediction error, i.e. \(Q_d = 0\).

  • In contrast to the previous category, over-determined inversion problems are characterised by the fact that all model parameters are uniquely determined, and \(M<N\) is a necessary, but not a sufficient formal requirement. Over-determined inversion problems have solutions with \(Q_d > 0\).

  • In even-determined inversion problems, all model parameters are uniquely determined by the data, \(M=N\), and \(Q_d = 0\).

  • Mixed-determined inversion problems are the most general case. Here, parts of the model domain are over-determined, whereas other parts are under-determined. 2D and 3D geophysical inversion models are typically mixed-determined, and we will predominantly consider problems that fall into this category.

Under- and mixed-determined inversion problems are by definition non-unique (Miensopust 2017). One remedy to this non-uniqueness problem is to introduce additional model constraints or model regularisation (Jackson 1979; Tikhonov et al. 1995; Zhdanov 2002; Menke 2012). The model constraints can enforce a smoothly varying conductivity distribution (Constable et al. 1987; de Groot-Hedlin and Constable 1990; Kalscheuer et al. 2010), they can include prior geological knowledge, restrict the conductivity range based on rock samples, or be based on structural constraints (Gallardo and Meju 2011; Yan et al. 2017b) and petrophysical constraints (Moorkamp 2017; Haber and Holtzman Gazit 2013) using information offered by other geophysical data sets (such as seismic, gravity, magnetic and logging data). Using a properly designed model constraint, the inversion algorithm can converge to a model that is close to the true model. Therefore, to obtain a geologically meaningful model close to the true one, it is generally necessary to enforce model constraints, which is accomplished by designing a model misfit term (typically in the form of a quadratic functional) which measures the distance from the inversion model to the reference model \({\mathbf {m}}_{r}\) (Abubakar et al. 2009; Menke 2012; Aster et al. 2012):

$$\begin{aligned} Q_{\mathrm{m}}=({\mathbf {m}}-{\mathbf {m}}_{r})^{T}{\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}({\mathbf {m}}-{\mathbf {m}}_{r}), \end{aligned}$$
(2)

where the weighting matrix \({\mathbf {W}}_{\mathrm{m}}\) either contains purely mathematical operators that enforce model smoothness or accounts for geological or geophysical constraints (Constable et al. 1987; Zhdanov 2002). These two misfit terms in the form of quadratic functionals can then be assembled into a single cost function Q:

$$\begin{aligned} Q = Q_{\mathrm{d}}+\alpha Q_{\mathrm{m}}, \end{aligned}$$
(3)

where the weighting factor or trade-off parameter \(\alpha >0\) balances the contribution of the data misfit term \(Q_{\mathrm{d}}\) and the model misfit term \(Q_{\mathrm{m}}\) in the cost function Q.
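
The assembly of the cost function can be summarised in a few lines of Python (a sketch assuming NumPy; the first-order difference operator used for \({\mathbf {W}}_{\mathrm{m}}\) is just one possible choice of smoothness constraint):

```python
import numpy as np

def first_difference_operator(M):
    """1D first-order smoothness operator W_m of size (M-1) x M."""
    return np.diff(np.eye(M), axis=0)

def cost_function(d, F_m, delta, m, m_r, Wm, alpha):
    """Total cost Q = Q_d + alpha * Q_m of Eqs. (1)-(3)."""
    rd = (d - F_m) / delta               # weighted data residuals
    rm = Wm @ (m - m_r)                  # weighted model deviations
    return float(rd @ rd + alpha * (rm @ rm))
```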

It is a standard choice in the geo-electromagnetic induction community to minimise the above \(l_2\)-norm variant of the cost function Q (Menke 2012; Aster et al. 2012; Farquharson and Oldenburg 2004; Krakauer et al. 2004; Schwarzbach and Haber 2013; Usui 2015; Mojica and Bassrei 2015; Jahandari and Farquharson 2017; Wang et al. 2018a). However, it may be desirable to use general \(l_p\) norms instead. To obtain more robustness against data outliers and more compact anomalies, for instance, the \(l_1\) norm may be used. General \(l_p\) norms can be conveniently implemented in a least-squares algorithm using re-weighting matrices that are adjusted in each iteration. This technique is generally known as iteratively re-weighted least-squares (IRLS) and has been occasionally used in electromagnetic geophysics (e.g., Farquharson and Oldenburg 1998; Oldenburg and Li 2005; Rosas-Carbajal et al. 2012). However, we are not aware of any work, where formal model uncertainty and resolution estimates were derived for an IRLS algorithm.
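
As a hypothetical illustration of the IRLS idea (not taken from any of the cited implementations), the residual weights of an Ekblom-type approximation to an \(l_p\) measure can be updated between least-squares iterations along the following lines (Python/NumPy sketch):

```python
import numpy as np

def irls_weights(residuals, p=1.0, eps=1e-6):
    """Re-weighting factors w_i such that sum(w_i * r_i**2) approximates
    sum(|r_i|**p); eps stabilises the weights for near-zero residuals."""
    return (residuals**2 + eps**2) ** (p / 2.0 - 1.0)

# the weights would be absorbed into an updated W_d before each iteration
```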

2.1 Linearised Model Analysis

Since the forward operator in electromagnetic problems is a nonlinear operator with respect to the model parameters, the data misfit function \(Q_{\mathrm{d}}\) generally varies with the model parameters at higher than quadratic order. This means that the hyperdimensional \(Q_{\mathrm{d}}-{\mathbf {m}}\) plot may have multiple side lobes with local minima or a (possibly large) number of equivalent global minima. To transform the cost function Q into a quadratic one for which a solution to the optimisation problem can easily be found, the nonlinear forward operator of the model \({\mathbf {m}}^{k+1}\) of the \(k+1\)-th iteration is expanded to first order around the model \({\mathbf {m}}^{k}\) of the kth iteration using a Taylor series:

$$\begin{aligned} {\mathbf {F}}\left[ {\mathbf {m}}^{k+1} \right] \approx {\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] + {\mathbf {J}} \left( {\mathbf {m}}^{k+1} - {\mathbf {m}}^{k}\right) , \end{aligned}$$
(4)

where \({\mathbf {J}}\) is the sensitivity matrix or the Jacobian matrix at the kth iteration. Substituting Eq. (4) into Eq. (3), we get an approximation to the cost function that is quadratic in model \({\mathbf {m}}^{k+1}\). Minimising this quadratic cost function, we get an iterative formula to calculate the \(k+1\)-th model (Siripunvaraporn and Egbert 2000; Kalscheuer et al. 2010; Menke 2012):

$$\begin{aligned} {\mathbf {m}}^{k+1} = {\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}\hat{{\mathbf {d}}}^{k}+{\mathbf {m}}_{r}, \end{aligned}$$
(5)

where \({\mathbf {J}}_{w}^{-g} = \left[ {\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}{\mathbf {J}}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right] ^{-1}{\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}\) is the generalised inverse matrix, and \(\hat{{\mathbf {d}}}^{k}={\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] +{\mathbf {J}}\left( {\mathbf {m}}^{k}-{\mathbf {m}}_{r}\right)\) was used.
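
For a small, explicitly formed problem, the model update of Eq. (5) can be written directly as a linear solve (Python/NumPy sketch; `J` is the Jacobian at \({\mathbf {m}}^{k}\), `Wd_diag` holds the reciprocal data uncertainties \(1/\delta_i\), and all names are illustrative):

```python
import numpy as np

def model_update(J, Wd_diag, Wm, alpha, d, F_mk, m_k, m_r):
    """One Gauss-Newton-type update m^{k+1} according to Eq. (5)."""
    Jd = Wd_diag[:, None] * J                 # W_d J
    A = Jd.T @ Jd + alpha * (Wm.T @ Wm)       # J^T W_d^T W_d J + alpha W_m^T W_m
    d_hat = d - F_mk + J @ (m_k - m_r)        # \hat{d}^k
    rhs = Jd.T @ (Wd_diag * d_hat)            # J^T W_d^T W_d \hat{d}^k
    return np.linalg.solve(A, rhs) + m_r      # J_w^{-g} W_d \hat{d}^k + m_r
```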

If model \({\mathbf {m}}^{k}\) is linearly close to the true model \({\mathbf {m}}_{\mathrm{true}}\), we get (Friedel 2003; Kalscheuer et al. 2010)

$$\begin{aligned} {\mathbf {d}}&= {\mathbf {F}}\left[ {\mathbf {m}}_{\mathrm{true}}\right] +{\mathbf {n}}\approx {\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] +{\mathbf {J}}\left( {\mathbf {m}}_{\mathrm{true}}-{\mathbf {m}}^{k}\right) +{\mathbf {n}} \end{aligned}$$

and, hence,

$$\begin{aligned} \hat{{\mathbf {d}}}^{k} \approx {\mathbf {J}}\left( {\mathbf {m}}_{\mathrm{true}}-{\mathbf {m}}_{r}\right) +{\mathbf {n}}, \end{aligned}$$
(6)

where \({\mathbf {n}}\) denotes the data noise. Substituting Eq. (6) into Eq. (5), we get

$$\begin{aligned} {\mathbf {m}}^{k+1}&= {\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}[{\mathbf {J}}\left( {\mathbf {m}}_{\mathrm{true}}-{\mathbf {m}}_{r}\right) +{\mathbf {n}}]+{\mathbf {m}}_{r} \nonumber \\&= \underbrace{{\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}{\mathbf {J}}}_{{\mathbf {R}}_{\mathrm{M}}}{\mathbf {m}}_{\mathrm{true}}+({\mathbf {I}}-\underbrace{{\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}{\mathbf {J}}}_{{\mathbf {R}}_{\mathrm{M}}}){\mathbf {m}}_{r}+{\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}{\mathbf {n}} \nonumber \\&= {\mathbf {R}}_{\mathrm{M}} {\mathbf {m}}_{\mathrm{true}} + ({\mathbf {I}}-{\mathbf {R}}_{\mathrm{M}}){\mathbf {m}}_{r} + {\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}{\mathbf {n}} , \end{aligned}$$
(7)

where \({\mathbf {R}}_{\mathrm{M}}={\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}{\mathbf {J}}\) is defined as the model resolution matrix. It indicates the distance from the inverted model to the unknown true model. If we assume that the inversion terminates at iteration \(k+1\), then \({\mathbf {m}}^{k+1}\) can be considered as the preferred inversion model. Equation (7) shows that this preferred inversion model consists of contributions by the true model, the reference model and the data noise. If the resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) is an identity matrix, the preferred inversion model is said to be perfectly resolved and does not depend on the reference model. It is equal to the sum of the true solution and the noise in the data weighted with the generalised inverse. In the opposite case with poor model resolution, the preferred inversion model consists mostly of the reference model. For a given data set, the resolution properties of the preferred model depend strongly on the trade-off parameter \(\alpha\) (which is contained in \({\mathbf {J}}_{w}^{-g} = \left[ {\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}{\mathbf {J}}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right] ^{-1}{\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}\) and Eq. 3). For smaller values of the trade-off parameter, the resolution matrix comes closer to the identity matrix, as long as the generalised inverse still exists, i.e. \({\mathbf {R}}_{\mathrm{M}} \rightarrow [{\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}{\mathbf {J}}]^{-1}{\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T} {\mathbf {W}}_{d}{\mathbf {J}}={\mathbf {I}}\) as \(\alpha \rightarrow 0\). At the same time, the generalised inverse \({\mathbf {J}}_{w}^{-g}\) will become very large in amplitude as \(\alpha \rightarrow 0\) such that the noise term is amplified. In the opposite case, when the trade-off parameter is set to a very large value, the nonzero entries of the resolution matrix will become more spread around the diagonal of the resolution matrix, meaning poorer resolution, but the noise contribution will be smaller. In the first case, we retrieve a well-resolved but unstable model. In the second case, we retrieve a stable, but poorly resolved model. Both of these extreme cases are far from optimal, and it is clear that a compromise needs to be made when it comes to the choice of trade-off parameters. Formally, this trade-off between model resolution and stability is explored by plotting the spread of the resolution matrix versus the size of the model covariance matrix (see below) for a large range of trade-off parameters (Hansen 1992). The only options that we have to obtain a more favourable trade-off curve are to add more measurements, which should shift the whole trade-off curve towards lower uncertainty and better resolution, or to employ additional geo-scientific information in the reference model \({\mathbf {m}}_{r}\) or the model regularisation operator \({\mathbf {W}}_{\mathrm{m}}\).
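
For problems that are small enough to form the generalised inverse explicitly, the dependence of \({\mathbf {R}}_{\mathrm{M}}\) on the trade-off parameter can be examined directly (Python/NumPy sketch; variable names are illustrative):

```python
import numpy as np

def generalised_inverse_and_resolution(J, Wd_diag, Wm, alpha):
    """J_w^{-g} and R_M = J_w^{-g} W_d J for a given trade-off parameter."""
    Jd = Wd_diag[:, None] * J
    A = Jd.T @ Jd + alpha * (Wm.T @ Wm)
    Jwg = np.linalg.solve(A, Jd.T)        # J_w^{-g} = A^{-1} J^T W_d^T
    return Jwg, Jwg @ Jd                  # R_M = J_w^{-g} W_d J

# sweeping alpha: R_M approaches the identity for small alpha (if J_d^T J_d is
# regular), while the norm of J_w^{-g}, and thus the noise amplification, grows
```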

The model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) can be rewritten as

$$\begin{aligned} {\mathbf {R}}_{\mathrm{M}} = \begin{bmatrix} {\mathbf {a}}^{1} \\ \vdots \\ {\mathbf {a}}^{i} \\ \vdots \\ {\mathbf {a}}^{M} \end{bmatrix} = \begin{bmatrix} {\mathbf {b}}^{1}&\ldots&{\mathbf {b}}^{i}&\ldots&{\mathbf {b}}^{M} \end{bmatrix}, \end{aligned}$$
(8)

where \({\mathbf {a}}^{i}\) and \({\mathbf {b}}^{i}\) denote the ith row and column resolution vectors, respectively, for \(i=1,\ldots ,M\). The row vector \({\mathbf {a}}^{i}\) acts as a linear projection of the true model onto the ith inversion model parameter (see Fig. 2), and is referred to as the averaging function or, upon normalisation by the areas or volumes of all cells, as the resolving kernel (see Fig. 3, effectively a resolution density). The concept of averaging functions was originally presented by Backus and Gilbert (1968). The column resolution vector \({\mathbf {b}}^{i}\) describes how a delta-like perturbation in the ith true model parameter spreads over the inversion model vector and is generally referred to as the point-spread function (Alumbaugh and Newman 2000). The ideal shape of both the averaging function and the point-spread function is a delta function whose only nonzero value (a unit value) lies on the diagonal entry of the model resolution matrix, i.e. at the target model parameter (see Fig. 3). There are two definitions of unimodularity of a given resolving kernel. In the first definition, unimodularity means that the resolving kernel has a single main lobe without any positively or negatively valued side lobes. In the second definition, unimodularity refers to the 2D or 3D integral of the resolving kernel over the areas or volumes of all model cells evaluating to one (e.g., Ory and Pratt 1995). In the latter case, there is no bias in the form of an average amplitude shift from the true model (Treitel and Lines 1982).
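
Given an explicitly computed \({\mathbf {R}}_{\mathrm{M}}\), the averaging function, the point-spread function and the resolving kernel of a single parameter are simply a row, a column and a volume-normalised row, respectively (Python/NumPy sketch; `cell_volumes` denotes assumed cell areas or volumes):

```python
import numpy as np

def averaging_and_psf(Rm, i, cell_volumes):
    """Averaging function a^i (row), point-spread function b^i (column) and
    resolving kernel of the ith model parameter."""
    a_i, b_i = Rm[i, :], Rm[:, i]
    kernel = a_i / cell_volumes                   # resolution density
    # unimodular in the second sense if the kernel integrates to one
    unimodular = np.isclose(np.sum(kernel * cell_volumes), 1.0)
    return a_i, b_i, kernel, unimodular
```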

Fig. 2

Conceptual model resolution matrix with the averaging function (\({\mathbf {a}}^{i}\)) and point-spread function (\({\mathbf {b}}^{i}\)), \({\mathbf {m}}_{\mathrm{true}}\) is the true (unknown) model, \(m^{k+1}_i\) is the ith component of model vector \({\mathbf {m}}^{k+1}\) obtained at the \(k+1\)-th inversion iteration, and \({m}_{true,i}\) is the ith component of the true model vector

Fig. 3

Conceptual resolving kernels for model parameters with nearly perfect (a) and poor (b) resolution. Positive and negative side lobes, an offset of the main lobe from the position of the investigated parameter on the diagonal (dashed line) and a large spread of the resolving kernel are characteristics of poor resolution

The model resolution matrix only indicates how close the preferred inversion model is to the true model. It is incapable of estimating the range of all candidate models or equivalent models which all fit the cost function Q equally well to within a certain threshold or, conversely, how uncertainty in the measurements translates to uncertainty in the model parameters. The (a posteriori) model covariance matrix gives us a quantitative measure of how uncertainty in the true model, the reference model and the data propagates into model uncertainty. Using Eq. (7), the (a posteriori) model covariance matrix \({\mathbf {C}}\) for the \(k+1\)th model (the preferred inversion model) is computed as (Menke 2012):

$$\begin{aligned} {\mathbf {C}}&= \left[ cov\,{\mathbf {m}}^{k+1}\right] \nonumber \\&= E\left[ \left( {\mathbf {m}}^{k+1}-E\left[ {\mathbf {m}}^{k+1}\right] \right) \left( {\mathbf {m}}^{k+1}-E\left[ {\mathbf {m}}^{k+1}\right] \right) ^{T}\right] , \end{aligned}$$
(9)

where \(E\left[ \cdot \right]\) is the statistical expectation operator. Hence,

$$\begin{aligned} {\mathbf {C}}&= {\mathbf {R}}_{\mathrm{M}} \left[ cov\,{\mathbf {m}}_{\mathrm{true}}\right] {\mathbf {R}}_{\mathrm{M}}^{T} + ({\mathbf {I}}-{\mathbf {R}}_{\mathrm{M}})\left[ cov\,{\mathbf {m}}_{r}\right] ({\mathbf {I}}-{\mathbf {R}}_{\mathrm{M}})^{T}+{\mathbf {J}}_{w}^{-g}{\mathbf {W}}_{d}\left[ cov\,{\mathbf {n}}\right] {\mathbf {W}}_{d}^{T}{{\mathbf {J}}_{w}^{-g}}^{T} \nonumber \\&= ({\mathbf {I}}-{\mathbf {R}}_{\mathrm{M}})\left[ cov\,{\mathbf {m}}_{r}\right] ({\mathbf {I}}-{\mathbf {R}}_{\mathrm{M}})^{T}+{\mathbf {J}}_{w}^{-g}{{\mathbf {J}}_{w}^{-g}}^{T}, \end{aligned}$$
(10)

where we assume that there is no variability in the true model (\(\left[ cov\,{\mathbf {m}}_{\mathrm{true}}\right] =0\)) and that the covariance of data noise \(\left[ cov\,{\mathbf {n}}\right]\) was correctly identified in the data weighting matrix such that \({\mathbf {W}}_{d}\left[ cov\,{\mathbf {n}}\right] {\mathbf {W}}_{d}^{T}={\mathbf {I}}\), i.e. \(\left[ cov\,{\mathbf {n}}\right] = \mathrm{diag}\left( \delta _{i}^{2}\right)\), \(i=1,\ldots ,N\), when only variances are accounted for. For actual field data sets, the true mean values and covariance structure of data noise are rarely known. In particular, mean values of data noise deviating from zero represent systematic noise and may lead to pronounced artefacts in inversion models, because the mean values of data noise are assumed to be zero in inversion. If the covariance of the reference model can be expressed as \(\left[ cov\,{\mathbf {m}}_{r}\right] = \left( \alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right) ^{-1}\), we have \({\mathbf {C}} = \left[ {\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}{\mathbf {J}}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right] ^{-1}\). If the reference model is considered a fixed vector (implying that \(\left[ cov\,{\mathbf {m}}_{r}\right] = {\mathbf {0}}\)) or zero as in some deterministic inversion schemes (such as the Occam inversion; Constable et al. 1987), we have:

$$\begin{aligned} {\mathbf {C}} = {\mathbf {J}}_{w}^{-g}({{\mathbf {J}}_{w}^{-g}})^{T}. \end{aligned}$$
(11)

Both the model resolution matrix and the model covariance matrix are of size \(M\times M\), where M is the number of model parameters, and symmetric. For 2D inversion problems, storage and direct computation of the generalised inverse matrix \({\mathbf {J}}_{w}^{-g} = \left[ {\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}{\mathbf {J}}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right] ^{-1}{\mathbf {J}}^{T}{\mathbf {W}}_{d}^{T}\) as well as the model resolution and covariance matrices is not a problem on modern computer systems. However, for 3D problems with large values of M it will be difficult to store and directly compute the generalised inverse matrix \({\mathbf {J}}_{w}^{-g}\), the model resolution matrix and the model covariance matrix.
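
For 2D problems or coarse 3D grids where these matrices fit into memory, the covariance of Eq. (11) and the corresponding linearised parameter standard deviations can be formed directly (Python/NumPy sketch; variable names are illustrative):

```python
import numpy as np

def model_covariance(J, Wd_diag, Wm, alpha):
    """Model covariance C = J_w^{-g} (J_w^{-g})^T of Eq. (11) and the
    linearised standard deviations of the model parameters."""
    Jd = Wd_diag[:, None] * J
    A = Jd.T @ Jd + alpha * (Wm.T @ Wm)
    Jwg = np.linalg.solve(A, Jd.T)
    C = Jwg @ Jwg.T
    return C, np.sqrt(np.diag(C))
```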

Using a Taylor series expansion of the forward model response \({\mathbf {F}}\left[ {\mathbf {m}}^{k+1}\right]\) around that of the model of the kth iteration and substitution of Eq. (5) for the preferred model \({\mathbf {m}}^{k+1}\) (iteration \(k+1\)), the predicted data \({\mathbf {d}}^{k+1,pre}\) (equalling the model responses \({\mathbf {F}}\left[ {\mathbf {m}}^{k+1}\right]\)) can be related to the measured data \({\mathbf {d}}\) (Friedel 2003; Kalscheuer et al. 2010; Menke 2012):

$$\begin{aligned} {{\mathbf {d}}^{k+1,pre}}&= {\mathbf {F}}\left[ {\mathbf {m}}^{k+1}\right] \nonumber \\ &\approx \,{\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] + {\mathbf {J}}\left( {\mathbf {m}}^{k+1}-{\mathbf {m}}^{k}\right) \nonumber \\&= {\mathbf {J}}{\mathbf {J}}^{-g}_{w}{\mathbf {W}}_d{\mathbf {d}} + \left( {\mathbf {I}}-{\mathbf {J}}{\mathbf {J}}^{-g}_{w}{\mathbf {W}}_d\right) \left( {\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] - {\mathbf {J}}\left( {\mathbf {m}}^{k}-{\mathbf {m}}_r\right) \right) \nonumber \\&= {\mathbf {R}}_D{\mathbf {d}} + \left( {\mathbf {I}}-{\mathbf {R}}_D\right) \left( {\mathbf {F}}\left[ {\mathbf {m}}^{k}\right] - {\mathbf {J}}\left( {\mathbf {m}}^{k}-{\mathbf {m}}_r\right) \right) . \end{aligned}$$
(12)

Here, the data resolution matrix \({\mathbf {R}}_D = {\mathbf {J}}{\mathbf {J}}^{-g}_{w}{\mathbf {W}}_d\) can be thought of as a set of blurring filters (rows) through which the field data \({\mathbf {d}}\) are reproduced in the predicted data \({\mathbf {d}}^{k+1,pre}\). In the more general case of a mixed-determined inversion problem, but also for an over-determined problem, the model responses \({\mathbf {F}}\left[ {\mathbf {m}}^{k+1}\right]\) consist of terms related to the field data and the responses of the model of the kth iteration and of the reference model. If the response of the model of the kth iteration was linearly close to that of the reference model, Eq. (12) would further simplify to \({\mathbf {d}}^{k+1,pre} \approx {\mathbf {R}}_D{\mathbf {d}} + \left( {\mathbf {I}}-{\mathbf {R}}_D\right) {\mathbf {F}}\left[ {\mathbf {m}}_r\right]\). In the case of an under-determined inversion problem, \({\mathbf {R}}_D = {\mathbf {I}}\) leading to perfect data fit \({\mathbf {d}}^{k+1,pre}={\mathbf {d}}\). The diagonal elements of \({\mathbf {R}}_D\) indicate how much relevance a given measurement has in its own forward response or prediction. Hence, the diagonal elements are typically referred to as data importances. Data importances play an important role in some approaches to optimal survey design (see below).
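
Data importances can be read off the diagonal of \({\mathbf {R}}_D\), which, for a small explicit problem, may be computed along the following lines (Python/NumPy sketch):

```python
import numpy as np

def data_importances(J, Wd_diag, Wm, alpha):
    """Data resolution matrix R_D = J J_w^{-g} W_d and its diagonal."""
    Jd = Wd_diag[:, None] * J
    A = Jd.T @ Jd + alpha * (Wm.T @ Wm)
    Jwg = np.linalg.solve(A, Jd.T)                # J_w^{-g}
    Rd = J @ (Jwg * Wd_diag[None, :])             # J J_w^{-g} W_d
    return Rd, np.diag(Rd)
```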

The singular value decomposition (SVD, Hansen 1990) technique is widely used to approximate the weighted sensitivity matrix \({\mathbf {J}}_{d}={\mathbf {W}}_{d}{\mathbf {J}}\). Using a threshold (trade-off) value for the minimum permissible singular value, all contributions with singular values smaller than the threshold singular value are dropped, or, alternatively, all singular values are damped. The concept of using the SVD technique to accelerate the computation of resolution and covariance matrices was initially proposed for 1D inversion problems by several pioneers (Gilbert 1971; Wiggins 1972; Jupp and Vozoff 1975; Lines and Treitel 1984). Various techniques exist to determine the truncation level (see below). One particular technique selects the truncation level or damping factor such that the so-called mean-square error (MSE) is minimised (Shomali et al. 2002; Pedersen 2004; Plattner and Simons 2017). The mean-square error is defined as the sum of the model variances (trace of the a posteriori model covariance matrix) and the squared model bias (in SVD terms, the squared null-space projection). Essentially, the null-space projection in the latter term can be considered the complement \({\mathbf {I}}-{\mathbf {R}}_p\) of the model resolution matrix \({\mathbf {R}}_p\) for truncation level p. Hence, the mean-square error describes the trade-off between the size (trace) of the model covariance matrix and the spread of the model resolution matrix. Whereas the first term (sum of model variances) increases with increasing truncation level p, the second term (spread of the resolution matrix) decreases with increasing p. Hence, the mean-square error attains its minimum at a specific p.
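
A truncated-SVD version of the resolution and variance estimates, and with that a mean-square-error criterion as a function of the truncation level p, can be sketched as follows (Python/NumPy; since the true model is unknown, the current model estimate is substituted when evaluating the bias term, and all names are illustrative):

```python
import numpy as np

def tsvd_resolution_and_variance(Jd, p):
    """R_p = V_p V_p^T and parameter variances diag(V_p S_p^-2 V_p^T)
    for the weighted sensitivity matrix Jd and truncation level p."""
    U, s, Vt = np.linalg.svd(Jd, full_matrices=False)
    Vp, sp = Vt[:p, :].T, s[:p]
    Rp = Vp @ Vp.T
    variances = np.sum((Vp / sp[None, :])**2, axis=1)
    return Rp, variances

def mean_square_error(Jd, m_est, p):
    """MSE(p) = sum of variances + squared null-space projection of m_est."""
    Rp, var = tsvd_resolution_and_variance(Jd, p)
    bias2 = float(np.sum(((np.eye(m_est.size) - Rp) @ m_est) ** 2))
    return np.sum(var) + bias2
```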

Using the standard SVD technique, it is impossible to consider the effect of the model constraint or model regularisation term. Therefore, to be capable of including the effects of model constraints in model resolution and covariance matrices, the generalised singular value decomposition (GSVD) technique (Christensen-Dalsgaard et al. 1993; Golub and Van Loan 2012) is presented here. Using the GSVD technique, the weighted sensitivity matrix \({\mathbf {J}}_{d}\) and the model regularisation matrix \({\mathbf {W}}_{\mathrm{m}}\) are simultaneously decomposed as follows:

$$\begin{aligned} {\mathbf {J}}_{d} = {\mathbf {U}}_{1}{\varvec{\varLambda }}_{1} {\mathbf {V}}^{T}, \end{aligned}$$
(13)
$$\begin{aligned} {\mathbf {W}}_{\mathrm{m}} = {\mathbf {U}}_{2}{\varvec{\varLambda }}_{2} {\mathbf {V}}^{T} . \end{aligned}$$
(14)

Here, the sizes of the orthogonal matrices \({\mathbf {U}}_{1}\) and \({\mathbf {U}}_{2}\) are \(N\times N\) and \(M\times M\), respectively, and the columns of \({\mathbf {U}}_{1}\) and \({\mathbf {U}}_{2}\) are the generalised left singular vectors (eigenvectors spanning data and model constraint spaces, respectively). Matrix \({\mathbf {V}}\) is orthogonal and of size \(M\times M\), and its columns are the generalised right singular vectors (eigenvectors spanning model space). The diagonal matrices \({\varvec{\varLambda }}_{1}\) and \({\varvec{\varLambda }}_{2}\) contain the N and M singular values \(\lambda _{1}^{i}\) (\(i=1,\ldots ,N\)) and \(\lambda _{2}^{j}\) (\(j=1,\ldots ,M\)) on their diagonals, respectively. Substituting Eqs. (13) and (14) into the generalised inverse \({\mathbf {J}}_{w}^{-g}\), we have:

$$\begin{aligned} {\mathbf {J}}_{w}^{-g}&= \left[ \left( {\mathbf {U}}_{1}{\varvec{\varLambda }}_{1} {\mathbf {V}}^{T}\right) ^{T} {\mathbf {U}}_{1}{\varvec{\varLambda }}_{1} {\mathbf {V}}^{T}+\alpha \left( {\mathbf {U}}_{2}{\varvec{\varLambda }}_{2} {\mathbf {V}}^{T}\right) ^T{\mathbf {U}}_{2}{\varvec{\varLambda }}_{2} {\mathbf {V}}^{T}\right] ^{-1} \left( {\mathbf {U}}_{1}{\varvec{\varLambda }}_{1} {\mathbf {V}}^{T}\right) ^{T} \nonumber \\&= \left( {{\mathbf {V}}^{T}}\right) ^{-1} \left( {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T} {\varvec{\varLambda }}_{2}\right) ^{-1} {\varvec{\varLambda }}_{1}^{T} {\mathbf {U}}_{1}^{T}. \end{aligned}$$
(15)

Now, the model resolution matrix in Eq. (7) becomes

$$\begin{aligned} {\mathbf {R}}_{\mathrm{M}}&= {\mathbf {J}}_{w}^{-g}{\mathbf {J}}_{d} = \left( {{\mathbf {V}}^{T}}\right) ^{-1} \left( {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}\right) ^{-1} {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1} {\mathbf {V}}^{T} . \end{aligned}$$
(16)

Using Eqs. (11) and (15), the model covariance matrix is

$$\begin{aligned} {\mathbf {C}}&= \left( {{\mathbf {V}}^{T}}\right) ^{-1} \left( {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}\right) ^{-1} {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1} \left( {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}\right) ^{-1}{\mathbf {V}}^{-1} . \end{aligned}$$
(17)

After the inversion procedure has converged, we can employ the above techniques to compute the model resolution and covariance matrices of the resulting preferred inversion model. The model covariance matrix defines an equivalent model domain for the preferred inversion model. However, we should keep in mind that the thus computed resolution and covariance matrices \({\mathbf {R}}_{\mathrm{M}}\) and \({\mathbf {C}}\) (Ogawa et al. 1999) only provide local and linearised estimates of the reliability of the preferred model as an approximation to the true model, because the assumption is made that the converged model is linearly close to the unknown true model. Therefore, the above procedure is also referred to as linearised model analysis. Based on this discussion of linearised model analysis, we describe the general effects of varying data quality, model discretisation and model regularisation on model uncertainty and resolution estimates. We note that the general trends and patterns described below will also be found in the nonlinear analyses described in later sections.

2.1.1 Effects of Data Quality on Model Uncertainty and Resolution

For subsequent inversion to be meaningful, it is of vital importance to estimate reliable statistical data uncertainties during processing and to identify and remove data with systematic noise. Depending on the measurement methodology, the retrieved numbers of samples may not allow representative data uncertainties to be estimated. In passive methods, such as magnetotellurics, long (days to weeks) time series are recorded to increase the number of samples, and advanced time-series processing routines are robust with regard to outliers (Egbert 1997; Ritter et al. 1998; Smirnov 2003; Garcia and Jones 2008; Chave 2017). Hence, one may argue that the estimated uncertainties of MT data are meaningful. In modern controlled-source electromagnetic methods, the options for data processing are similarly advanced as in magnetotellurics (e.g., Pankratov and Geraskin 2010; Streich et al. 2013). In some active methods, one may resort to reciprocal measurements (Parasnis 1988) to have a gross handle on data control, data editing and data uncertainty. In fact, this is a standard approach in geoelectrics, where modern multi-channel data loggers can optionally record measurements from reciprocal electrode configurations. For these methods, we note that the assignment of uncertainty floors and data editing are subjective choices strongly depending on the interpreter’s experience. With more simplistic equipment, assumptions about data quality and uncertainty are even more subjective.

Increasing data uncertainty floors means that the acceptable ranges of forward responses and, with that, the acceptable ranges of model parameters generally increase. However, model regularisation counteracts strong model variability. Thus, when the target RMS (typical value of one) is not changed after increasing uncertainty floors, the trade-off parameter is often increased such that the variability of the model parameters with position in model space may be lower than for a lower uncertainty floor. In turn, this leads to increased spreads of the resolving kernels, whereas the parameter uncertainties may show little change. As evident from Eqs. (10) and (11), a partial increase of model parameter uncertainties owing to an increased data uncertainty floor is, to some extent, counterbalanced by the increased trade-off parameter. Note that the exact behaviour depends on the particular choice of model regularisation and trade-off parameter and on whether the increase in the data uncertainty floor is balanced by a corresponding increase in the trade-off parameter.

Any systematic noise (e.g., from source effects of infrastructure, buried cables, or insufficient system calibration) remaining after data editing may generate artefacts in inversion models. Depending on how strongly the data deviate from their nominal values without systematic noise, the associated model parameters may be biased quite heavily in the wrong direction, generating artefactual model structures with high sensitivity and seemingly good model resolution.

2.1.2 Effects of Model Discretisation on Model Uncertainty and Resolution

The model resolution, model covariance and data resolution matrices depend strongly upon model discretisation. In the limiting cases, there may be too many model parameters for even a single parameter to be uniquely determined (i.e. the under-determined case), or there may be too few parameters in an over-determined inversion problem to explain the given data adequately. In the former case, model resolution is poor, but data resolution is perfect. In the latter case, model resolution is perfect, but poor data fit makes the model meaningless. Hence, the number of model parameters needs to be increased to the point where the data fit becomes acceptable and model resolution is still good. Increasing the number of model parameters may render a previously over-determined inversion problem mixed-determined, leading to increased model uncertainty and deteriorated resolution (e.g., Kalscheuer et al. 2015). Similar effects are observed when the number of model parameters is increased such that a mixed-determined problem turns into an under-determined problem. In the mixed- and under-determined 2D and 3D inversion problems that we are concerned with here, one may argue that regions with focused model resolution should be further refined if measurements with high sensitivities to these model regions have too high misfit (see below). Similarly, one may argue that the inversion grid should be coarsened in model regions with poor resolution to make the inversion problem more economic. If the model was discretised sufficiently to explain a given data set, one might argue that further refinement of the inversion model should not lead to significant changes in model structure, other than smoother transitions, provided the refinement is balanced by a corresponding adjustment in model regularisation. Clearly, such refinement will lead the nonzero entries in the model resolution matrix to be spread over a correspondingly larger number of cells. Despite such rather profound changes in the model resolution matrices, there will be little change in the resolving kernels beyond a certain level of discretisation. This is simply so, because resolving kernels can be thought of as resolution densities that approach their continuous limits with advancing refinement. Considering Eqs. (10) and (11), mesh refinement leads to decreased sensitivities and, with that, to an increase of the part of model parameter uncertainty related to the sensitivity matrix. However, mesh refinement is often accompanied by an increased trade-off parameter, possibly offsetting the increase in parameter uncertainty related to sensitivity. Note that the exact behaviour depends on whether Eq. (10) or Eq. (11) is used to compute uncertainties, on the particular choice of model regularisation and on whether the mesh refinement is balanced by a corresponding increase in the trade-off parameter. For instance, if Eq. (11) is used (i.e. uncertainty in the reference model is neglected), the effects of diminishing sensitivity will dominate, leading to increasing model uncertainty with progressing refinement. In contrast, if (1) Eq. (10) is used (i.e. uncertainty in the reference model is accounted for), (2) the model regularisation matrix is regular, and (3) the trade-off parameter is increased, the model regularisation term may balance the effects of diminishing sensitivity with progressing refinement.

2.1.3 Effects of Model Regularisation on Model Uncertainty and Resolution

Often, we hardly have any prior knowledge and, hence, the model regularisation is based on ordinary smoothness constraints, where the adjustable parameters are the weights on horizontal and vertical smoothness, the order of the smoothness constraints and the type of the smoothness constraints (direct differences or spatial derivatives). Higher horizontal and vertical weights will lead to reduced vertical and horizontal spreads, respectively, of the resolving kernel. Certainly, if we use prior information in the form of a reference model or a model weighting matrix \({\mathbf {W}}_m\) that is not correct, this will bias the inversion model towards a wrong solution. This means that both the model weighting matrix \({\mathbf {W}}_m\) and, if applicable, the reference model \({\mathbf {m}}_r\) should be judiciously selected. In particular, non-unimodular resolving kernels (here those that do not integrate to one) introduce systematic bias, as the non-unimodularity leads to an average amplitude shift of the model parameters (Treitel and Lines 1982; Ory and Pratt 1995). This non-unimodularity is a direct consequence of the choice of the regularisation operator. It was shown by Ory and Pratt (1995) that regular model constraints (i.e. a regular model weighting matrix \({\mathbf {W}}_m\) for which an inverse exists) introduce non-unimodularity, whereas singular first- or second-order smoothness constraints give unimodular resolving kernels. As a consequence, we advocate construction of a preferred inversion model using smoothness constraints with local modifications to account for known structural contrasts (e.g., Yan et al. 2017b), if applicable. Other types of regularisation may introduce unintended bias and, in the absence of compelling prior evidence, reference models should only be used for hypothesis testing. If both smoothness constraints and a reference model are used, the inversion will project structural boundaries in the reference model onto the inversion model, preserving the resistivity contrasts of the reference model. This becomes directly evident when considering the model regularisation in, for instance, a 1D inversion with first-order direct differences as smoothness constraints and a structural boundary in the reference model between cells j and \(j+1\). In this case, the corresponding term of the model misfit function attains its minimum, if

$$\begin{aligned} 0&= (m_{j+1}-m_{r,j+1}) - (m_{j}-m_{r,j}) \nonumber \\ \Leftrightarrow m_{j+1}- m_{j}&= m_{r,j+1} -m_{r,j}, \end{aligned}$$
(18)

meaning that the inversion tries to preserve the resistivity contrasts of the reference model in the inversion model. In model regions where the information in the reference model is less reliable, this effect may introduce artefacts (which in this case represent undesirable bias) into the inversion model. In such cases, it is advisable to remove or reduce the strength of the smoothness constraints across structural boundaries in the reference model, to modify the reference model or to use separate regularisation terms for model smoothness and proximity to the reference model (i.e. model smallness involving a diagonal regularisation matrix, e.g., Loke 2001; Oldenburg and Li 2005; Yan et al. 2017a). Finally, we note that biasing the inverse model using well-established prior constraints (e.g., from borehole logs) may be the only meaningful approach to model construction when data coverage is sparse and/or data quality is low.
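
The behaviour expressed by Eq. (18) is easily verified numerically. In the following Python/NumPy sketch (with purely illustrative model values), a candidate model that reproduces the contrast of the reference model has zero model misfit, whereas a model without the contrast does not:

```python
import numpy as np

def model_misfit(m, m_r):
    """First-order difference smoothness applied to (m - m_r), cf. Eq. (2)."""
    Wm = np.diff(np.eye(m.size), axis=0)
    r = Wm @ (m - m_r)
    return float(r @ r)

m_r = np.array([2.0, 2.0, 1.0, 1.0])     # reference model with a contrast
m_a = np.array([1.5, 1.5, 0.5, 0.5])     # same contrast, different level -> misfit 0
m_b = np.array([1.5, 1.5, 1.5, 1.5])     # contrast removed -> misfit > 0
print(model_misfit(m_a, m_r), model_misfit(m_b, m_r))
```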

In probabilistic inversion schemes, the product \(\alpha {\mathbf {W}}_m^T{\mathbf {W}}_m\) is nothing else but the inverse of the a priori model covariance matrix (cf. the assumption \(\left[ cov\,{\mathbf {m}}_{r}\right] = \left( \alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\right) ^{-1}\) above). Thus, if the a priori variances and covariances are small in size, the resulting inversion model will be more tightly constrained than for larger a priori variances and covariances.

2.2 Partially Nonlinear Model Analysis

Now, we investigate the range of model variations \(\triangle {\mathbf {m}}\) or models \({\mathbf {m}}\) (\({\mathbf {m}}= {\mathbf {m}}^{*} + \triangle {\mathbf {m}}\)) in the equivalent model domain around a preferred model \({\mathbf {m}}^{*}\) that corresponds to a maximum permissible variation \(\triangle Q\) of the cost function, i.e. \(Q \le Q_*+\triangle Q\) where \(Q_*\) is the value of the cost functional for \({\mathbf {m}}^{*}\). Re-examining the cost function, we have

$$\begin{aligned} Q = & \ ({\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\right] )^{T}{\mathbf {W}}_{d}^T{\mathbf {W}}_{d}({\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\right] ) \nonumber \\& \, +\alpha ({\mathbf {m}}^{*}+\triangle {\mathbf {m}}-{\mathbf {m}}_{r})^{T}{\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}({\mathbf {m}}^{*}+\triangle {\mathbf {m}}-{\mathbf {m}}_{r}). \end{aligned}$$
(19)

Using the linear approximation \({\mathbf {F}}\left[ {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\right] \approx {\mathbf {F}}\left[ {\mathbf {m}}^{*}\right] +{\mathbf {J}}\left[{\mathbf {m}}^{*}\right]\triangle {\mathbf {m}}\), we get:

$$\begin{aligned} Q\approx & \ Q({\mathbf {m}}^{*}) + 2\triangle {\mathbf {m}}^{T}\left[ {\mathbf {J}}\left[{\mathbf {m}}^{*}\right]^{T} {\mathbf {W}}_{d}^T{\mathbf {W}}_{d}({\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}^{*}\right] )+\alpha {\mathbf {W}}_{\mathrm{m}}^T{\mathbf {W}}_{\mathrm{m}}({\mathbf {m}}^{*}-{\mathbf {m}}_{r})\right] \nonumber \\& \, +\triangle {\mathbf {m}}^{T} \left[ {\mathbf {J}}\left[{\mathbf {m}}^{*}\right]^{T}{\mathbf {W}}_{d}^T{\mathbf {W}}_{d} {\mathbf {J}}\left[{\mathbf {m}}^{*}\right]+\alpha {\mathbf {W}}_{\mathrm{m}}^T{\mathbf {W}}_{\mathrm{m}}\right] \triangle {\mathbf {m}}. \end{aligned}$$
(20)

The condition of \(\nabla_{\mathbf{m}} Q\) vanishing at \({\mathbf {m}}^{*}\), i.e. at \(\triangle {\mathbf {m}}=0\), implies

$$\begin{aligned} {\mathbf {J}}\left[{\mathbf {m}}^{*}\right]^{T} {\mathbf {W}}_{d}^T{\mathbf {W}}_{d}({\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}^{*}\right] )+ \alpha {\mathbf {W}}_{\mathrm{m}}^T{\mathbf {W}}_{\mathrm{m}}({\mathbf {m}}^{*}-{\mathbf {m}}_{r})={\mathbf {0}}. \end{aligned}$$
(21)

Thus, we get

$$\begin{aligned} Q({\mathbf {m}})\approx & \,Q({\mathbf {m}}^{*}) + \triangle {\mathbf {m}}^{T}\left[ {\mathbf {J}}_{d}^T{\mathbf {J}}_{d}+\alpha {\mathbf {W}}_{\mathrm{m}}^T{\mathbf {W}}_{\mathrm{m}}\right] \triangle {\mathbf {m}}, \end{aligned}$$
(22)

where

$$\mathbf {J}_d = {\mathbf {W}}_{d} {\mathbf {J}}\left[{\mathbf {m}}^{*}\right].$$
(23)

This formula tells us that the cost function is a quadratic function in the model variation \(\triangle {\mathbf {m}}\) in the neighbourhood of the preferred model (Johansen 1977; Kitanidis 1996). Depending on the degree of nonlinearity in the forward problem, the cost function may have valley- and hill-like structures at larger distance (Fernández-Martínez et al. 2012).

Defining \(\triangle Q = Q({\mathbf {m}})-Q({\mathbf {m}}^{*})\), and using the GSVD results in Eqs. (13) and (14), we obtain:

$$\begin{aligned} \triangle Q \approx & \ \triangle {\mathbf {m}}^{T}\left[ {\mathbf {V}}{\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}{\mathbf {V}}^{T}+\alpha {\mathbf {V}}{\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}{\mathbf {V}}^{T}\right] \triangle {\mathbf {m}}\\ = & \ {\mathbf {p}}^{T} \left[ {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}\right] {\mathbf {p}} , \end{aligned}$$
(24)

where \({\mathbf {p}}={\mathbf {V}}^{T}\triangle {\mathbf {m}}\) is the generalised (or transformed) model parameter vector in the space of matrix \({\mathbf {V}}\). (If we set either \({\mathbf {W}}_{\mathrm{m}}={\mathbf {I}}\) or \(\alpha =0\), the above GSVD result is reduced to the SVD result.) If a threshold value for the minimum permissible singular value is used, the dimension of vector \({\mathbf {p}}\) will be much less than M. The relationship between \(\triangle Q\) and the vector \(\triangle {\mathbf {m}}\) in Eq. 24 defines a hyperellipsoid that can be rewritten in terms of the transformed model parameters \({\mathbf {p}}\) as

$$\begin{aligned} \sum _{i=1}^{M} \frac{{p_{i}}^{2}}{{s_{i}}^{2}} = 1, \end{aligned}$$
(25)

where \(p_{i}\) is the ith generalised model parameter, and \(s_{i}=\sqrt{ \triangle Q/\left( (\lambda _{1}^{i})^2+\alpha \cdot (\lambda _{2}^{i})^{2}\right) }\) is the length of the semi-axis along the ith column vector of \({\mathbf {V}}\) (or model eigenvector). The above formula defines a hyperellipsoidal domain of equivalence in which each model generates electromagnetic responses within a variation \(\triangle Q\) of the cost function value \(Q({\mathbf {m}}^{*})\) of the preferred model. If the inversion problem were linear, the choice \(\triangle Q=1\) would correspond to an equivalent model domain with a 68 % confidence level for the model parameters (Johansen 1977). However, for nonlinear inversion problems, the association of a given misfit variation with a confidence level is, at best, approximate (Kalscheuer and Pedersen 2007). Usage of the linear approximation to the forward operator \({\mathbf {F}}\) in Eq. (21) implies that the obtained equivalent model domain is still a result of linearisation. The linear approximation is evident from the equality of the semi-axes in the directions of the negative and positive model eigenvectors.
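
For the SVD special case \({\mathbf {W}}_{\mathrm{m}}={\mathbf {I}}\) (i.e. \(\lambda_2^i=1\)) mentioned above, the linearised semi-axis lengths of Eq. (25) follow directly from the singular values of the weighted sensitivity matrix (Python/NumPy sketch; the GSVD-based general case is analogous):

```python
import numpy as np

def linearised_semi_axes(Jd, alpha, dQ=1.0):
    """Semi-axis lengths s_i = sqrt(dQ / (lambda_1i**2 + alpha)) for W_m = I."""
    s = np.linalg.svd(Jd, compute_uv=False)      # singular values lambda_1^i
    return np.sqrt(dQ / (s**2 + alpha))
```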

Due to the nonlinearity, the actual semi-axes along the negative and positive directions of an eigenvector can be different. Thus, a first step towards accounting for nonlinearity is to take this inequality into account. Since the nonlinearity in directions other than those of the model eigenvectors is neglected, this procedure generalises the hyperellipsoid to a pseudo-hyperellipsoid (see Fig. 4 for an example with two model parameters). For a given misfit variation \(\triangle Q\), the lengths of the semi-axes are determined by the positions in model space where the pseudo-hyperellipsoid (black line) coincides with the true misfit surface (red line). The computation of the actual semi-axes can be formulated as finding the roots of the following equation (Johansen 1977; Kalscheuer and Pedersen 2007):

$$\begin{aligned} Q({\mathbf {m}}) - Q({\mathbf {m}}^{*}) - \triangle Q = 0 \end{aligned}$$
(26)

with \({\mathbf {m}}={\mathbf {m}}^{*}+s_{i}^{\pm }{\mathbf {v}}_{i}\), \(i=1,\ldots ,M\). Equation (26) is typically of higher than quadratic order in the nonlinear semi-axes \(s_{i}^{\pm }\), where \(s_{i}^{-}\) and \(s_{i}^{+}\) denote the semi-axes along the negative and positive directions of model eigenvector \({\mathbf {v}}_{i}\), respectively (Fig. 4). The stronger the nonlinearity, the larger is the difference between the lengths of \(s_{i}^{-}\) and \(s_{i}^{+}\). For electrical resistance tomography (ERT) and magnetotelluric (MT) 2D inversion problems, Kalscheuer and Pedersen (2007) and Kalscheuer et al. (2010) demonstrated that the lengths of the linearised semi-axes and the nonlinear semi-axes differ strongly along those eigenvectors associated with small singular values. The pseudo-hyperellipsoid spanned by the nonlinear semi-axes defines an equivalent model domain around the desired model \({\mathbf {m}}^{*}\) over which the condition that the cost function varies by no more than \(\triangle Q\) is fulfilled more closely than by the hyperellipsoid stemming from the linear approximation. A large axis length means that dramatic model changes along this direction lead to only small variations of the cost function. If the axis has an infinite length, an infinite number of candidate models exists, so that the model parameter is not constrained. In contrast, if the axis has a small length, small model changes lead to significant variations of the cost function, which implies that the parameter is well constrained (Fernández-Martínez et al. 2012). The concept of using the pseudo-hyperellipsoid to describe the nonlinear equivalent model domain was introduced by Johansen (1977) and Pedersen and Rasmussen (1989).
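
As an illustration of how the nonlinear semi-axes can be obtained in practice, the sketch below brackets and solves Eq. (26) along a single model eigenvector using standard root finding; the misfit function Q and the bracketing factor growth are hypothetical placeholders for the user's own forward modelling code and search settings.

```python
import numpy as np
from scipy.optimize import brentq

def nonlinear_semi_axes(Q, m_star, v_i, dQ, s_lin, growth=4.0):
    """Nonlinear semi-axes s_i^- and s_i^+ along eigenvector v_i, Eq. (26).

    Q      : callable returning the cost function for a model vector
             (placeholder for the user's forward modelling and misfit code).
    m_star : preferred inversion model (1D array).
    v_i    : ith model eigenvector (1D array, a column of V).
    dQ     : permitted cost-function deviation.
    s_lin  : linearised semi-axis length, used to bracket the root search.
    """
    Q0 = Q(m_star)
    f = lambda s: Q(m_star + s * v_i) - Q0 - dQ
    # brentq needs a sign change inside each bracket; `growth` widens the
    # search interval beyond the linearised estimate if necessary.
    s_plus = brentq(f, 1e-12, growth * s_lin)
    s_minus = brentq(f, -growth * s_lin, -1e-12)
    return s_minus, s_plus
```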

Fig. 4

Conceptual misfit hypersurface (red line) defined by misfit deviation \(\triangle Q\) from misfit \(Q\left( {\mathbf {m}}^{*}\right)\) of a preferred model \({\mathbf {m}}^{*}\) for a model \({\mathbf {m}}=\left( m_1,m_2\right) ^T\) with two parameters. The pseudo-hyperellipsoid (black line) is described by the semi-axes \(s_1^+\), \(s_1^-\), \(s_2^+\) and \(s_2^-\) along the model eigenvectors and yields a first improvement in the description of nonlinear effects on model uncertainty (e.g., deviations \(m_1^{{\rm min}}-m_{1}^{*}\) and \(m_1^{{\rm max}}-m_{1}^{*}\) of the extreme values from \(m_{1}^{*}\)). In contrast, the most-squares inversion determines the de facto models with minimum (e.g., \(m_{1 {\rm true}}^{{\rm min}}\)) and maximum deviations (e.g., \(m_{1 {\rm true}}^{{\rm max}}\))

For the jth model parameter, its minimum and maximum values (e.g., \(m^{{\rm min}}_{1}\) and \(m^{{\rm max}}_{1}\) for \(j=1\) in Fig. 4) on this pseudo-hyperellipsoidal approximation to the cost function can be estimated by locating the points where this pseudo-hyperellipsoid touches the hyperplane characterised by a normal vector \({\mathbf {w}}_{j}\) with \(w_{j,i}=\pm 1\) for \(i=j\) and \(w_{j,i}=0\) for \(i\ne j\). This requires the gradient of the cost function to be parallel to the normal vector \({\mathbf {w}}_{j}\), which implies the following relationship

$$\begin{aligned} \nabla (\triangle Q)&= e_{j}{\mathbf {w}}_{j}, \end{aligned}$$
(27)

where \(\triangle Q\) is given in Eq. (24), and \(e_{j}\) is an arbitrary scalar to be determined. Thus, Eq. (27) becomes (Johansen 1977)

$$\begin{aligned} {\mathbf {V}}\varvec{\varLambda }_{1,2}{\mathbf {V}}^{T}\triangle {\mathbf {m}} = e_{j} {\mathbf {w}}_{j}, \end{aligned}$$
(28)

where we set \(\varvec{\varLambda }_{1,2}= {\varvec{\varLambda }}_{1}^{T}{\varvec{\varLambda }}_{1}+\alpha {\varvec{\varLambda }}_{2}^{T}{\varvec{\varLambda }}_{2}\). Using Eqs. (23) and (26), we get

$$\begin{aligned} e_{j} = \sqrt{\frac{\triangle Q}{{\mathbf {w}}_{j}^{T} \left[ {\mathbf {V}}\varvec{\varLambda }_{1,2}{\mathbf {V}}^{T}\right] ^{-1}{\mathbf {w}}_{j} }}. \end{aligned}$$
(29)

Now, we get the extreme model variation for the jth parameter as the jth component of

$$\begin{aligned} \triangle {\mathbf {m}} = \sqrt{\frac{\triangle Q}{{\mathbf {w}}_{j}^{T}{\mathbf {b}}}} {\mathbf {b}} , \end{aligned}$$
(30)

where \({\mathbf {b}}= \left[ {\mathbf {V}}\varvec{\varLambda }_{1,2}{\mathbf {V}}^{T}\right] ^{-1}{\mathbf {w}}_{j}\), and the diagonal values \(\sqrt{\frac{\triangle Q}{{\varLambda }_{1,2}^{ii}}}\) are systematically replaced by the nonlinear semi-axes \(s_{i}^{\pm }\) (for details cf. Johansen 1977; Kalscheuer and Pedersen 2007). If \(w_{j,j}=-1\), the estimated parameter \(m_j^* + \triangle {m}_{j}\) is the extreme minimum value, whereas if \(w_{j,j}=1\), it is the extreme maximum value. Performing the above procedure for all model parameters, we obtain an extreme parameter set which forms an equivalent model domain over which the cost function varies by no more than \(\triangle Q\). If the inversion problem were linear, the choice \(\triangle {Q}=1\) would yield extreme parameter sets with model parameter changes \(\triangle {m}_j\) that correspond to one standard deviation (see above). Note that this method takes the nonlinearity of the forward problem into account only through the nonlinear semi-axes along the model eigenvectors. Therefore, the above model uncertainty analysis is referred to as a partially nonlinear model analysis. In comparison with uncertainties estimated from linearised analysis (Eqs. 10 and 17), partially nonlinear model uncertainty estimates may give an indication of the severity of nonlinear effects. However, owing to the limitation of studying nonlinearity only in the directions of the model eigenvectors, uncertainty estimates from partially nonlinear model analysis may differ considerably from those computed using methods that account for nonlinearity to a fuller extent, such as most-squares inversion (see below). For radio-magnetotelluric and geoelectric 2D inversion examples, Kalscheuer and Pedersen (2007) and Kalscheuer et al. (2010) found that, depending on the case, uncertainties estimated using most-squares inversion were 0.7 to 100 times the partially nonlinear model uncertainty estimates. Hence, even partially nonlinear model analysis may grossly underestimate model variability.
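
The linearised form of this extreme parameter computation (Eqs. 29 and 30) is compact enough to sketch directly. The snippet below assumes that the eigenvector matrix V and the generalised singular values lam1 and lam2 are available from a GSVD; it only illustrates the linearised variant, i.e. without the replacement of the diagonal entries by nonlinear semi-axes.

```python
import numpy as np

def extreme_variation(V, lam1, lam2, alpha, j, dQ=1.0, sign=+1):
    """Extreme change of the jth model parameter following Eqs. (29) and (30).

    V          : matrix of model eigenvectors (M x p).
    lam1, lam2 : generalised singular values (length p).
    alpha      : regularisation trade-off factor.
    sign       : +1 for the extreme maximum, -1 for the extreme minimum.
    """
    M = V.shape[0]
    Lam12 = np.diag(np.asarray(lam1)**2 + alpha * np.asarray(lam2)**2)
    A = V @ Lam12 @ V.T                       # V Lambda_{1,2} V^T
    w = np.zeros(M)
    w[j] = sign
    b = np.linalg.pinv(A) @ w                 # b = [V Lambda_{1,2} V^T]^{-1} w_j
    dm = np.sqrt(dQ / (w @ b)) * b            # Eq. (30)
    return dm[j]                              # extreme variation of m_j
```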

2.3 Nonlinear Model Analysis

We divide the available nonlinear model analyses into deterministic and stochastic approaches and discuss these in the following. In particular for the deterministic methods, we try to sort the methods into different categories. However, there is conceptual overlap between the methods and the categorisation is, to some extent, arbitrary.

2.3.1 Deterministic Methods

The above linearised or partially nonlinear model analysis tools only yield equivalent models in the vicinity of the preferred inversion model \({\mathbf {m}}^{*}\). To get more reliable estimates of the confidence levels of the model parameters, larger parts of the model domain need to be explored, which requires nonlinear model analysis tools. Recalling the inversion procedure, the inversion solution depends on the data, the data weighting matrix (data uncertainties), the forward responses and their accuracy, the data misfit, the regularisation weighting factor, the initial model, the reference model, the model weighting matrix (model constraints), the model misfit and the value of the cost function. Therefore, considering different combinations of these factors, we can take different approaches to analyse the effects of nonlinearity. Typically, these approaches aim to find \(\triangle {\mathbf {m}}\) around the preferred inversion model \({\mathbf {m}}^{*}\)

$$\begin{aligned} \triangle {\mathbf {m}} = \triangle {\mathbf {m}} ({\mathbf {m}}^{*}, \triangle {\mathbf {d}}, \triangle {\mathbf {W}}_{d}, \triangle {\mathbf {F}}, \triangle Q_{\mathrm{d}}, \triangle \alpha , \triangle {\mathbf {m}}^{0}, \triangle {\mathbf {m}}_{r}, \triangle {\mathbf {W}}_{\mathrm{m}}, \triangle Q_{\mathrm{m}}, \triangle Q) \end{aligned}$$
(31)

subject to variations of the data \(\triangle {\mathbf {d}}\), variations of data uncertainties \(\triangle {\mathbf {W}}_{d}\), different numerical errors in computed forward responses \(\triangle {\mathbf {F}}\), variations of data misfits \(\triangle Q_{\mathrm{d}}\), variations of trade-off factors \(\triangle \alpha\), changes of initial models \(\triangle {\mathbf {m}}^{0}\), changes of reference models \(\triangle {\mathbf {m}}_{r}\), variations of model weighting matrices (e.g., smoothness constraints) \(\triangle {\mathbf {W}}_{\mathrm{m}}\), variations of model misfits \(\triangle Q_{\mathrm{m}}\), and variations of values of cost functions \(\triangle Q\).

Generally, we only vary one parameter in Eq. (31) at a time and fix the other parameters. The various methods listed below explore different parts of the model space. Hence, it is meaningful to use different approaches to explore the equivalent domain more comprehensively. In particular, we need strategies which have the capability of jumping over local maxima in the cost function or traversing through possible valleys of the cost functional. No single method is guaranteed to have these properties. Again, this emphasises the need to consider different methods. As described in Sect. 2.2, deviations of a given model parameter from its value in the preferred model \({\mathbf {m}}^*\) by one standard deviation correspond to variations \(\triangle Q\) in the cost function or \(\triangle Q_{\mathrm{d}}\) in the data misfit (depending on the case) by a value of one. Thus, in exploring model uncertainty, models obtained by the various methods listed below should be compared and evaluated based on the actual variations \(\triangle Q\) or \(\triangle Q_{\mathrm{d}}\) from the values of the cost function \(Q\left( {\mathbf {m}}^*\right)\) or the data misfit \(Q_d\left( {\mathbf {m}}^*\right)\) for the preferred model \({\mathbf {m}}^*\). Except for the Edgehog method and most-squares inversion, strict threshold values are often not applied to \(\triangle Q\) or \(\triangle Q_{\mathrm{d}}\). In some cases, e.g., when the data uncertainties are changed, a stringent comparison may not even be possible. There is also a trivial reason why model variations \(\triangle {\mathbf {m}}\) derived by a comparison of different methods may not be consistent. Most researchers are used to considering RMS misfits (where \(\text {RMS}=\sqrt{Q_{\mathrm{d}}/N}\)) and do not translate deviations in RMS misfit back to misfit deviations \(\triangle Q_{\mathrm{d}}\). Ultimately, it is also a decision made by the user as to what misfit deviation seems acceptable and meaningful for a given problem. However, it is important to note that different “equivalent” models should generally be compared based on their misfits and that model variations \(\triangle {\mathbf {m}}\) for values of \(\triangle Q\) or \(\triangle Q_{\mathrm{d}}\) differing from one would, even in close-to-linear cases, not correspond to model uncertainties associated with 68 % confidence levels.

In the categories outlined above, these are the most commonly used methods:

  • Variations \(\triangle {\mathbf {m}}^*\):

    This approach builds on manual variation of the preferred inversion model. It is also loosely referred to as a nonlinear sensitivity test, because the forward responses of the modified model are computed to study the effect of model variation on data fit. Several model regions of particular geophysical or geological interest are selected. A set of new synthetic models is built by varying the resistivities or geometries of these selected areas. Using forward modelling, those synthetic models which generate acceptable data misfit (RMS) are treated as equivalent candidate models. Hence, models \({\mathbf {m}} = {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\) that fulfil the condition

    $$\begin{aligned} Q_{\mathrm{d}} ({\mathbf {m}}^{*}+\triangle {\mathbf {m}}) < Q_{\mathrm{d}}^{*}, \end{aligned}$$
    (32)

    are added to the equivalent model domain, where \(Q_{\mathrm{d}}^{*}\) is a threshold value for the data misfit deemed acceptable. Compared to other uncertainty analysis tools, this approach may be the most economical one, as only a few forward modelling computations are needed to evaluate the model variations (Nolasco et al. 1998; Hübert et al. 2009; Juanatey et al. 2013; Hübert et al. 2013). More advanced versions of this approach account for the fact that model parameters outside a given region of interest need to be adjusted to generate forward responses with sufficiently low misfit. Thus, the model parameters of the region of interest are kept fixed in an inversion where the remaining model parameters are allowed to change (Park and Mackie 2000; Hübert et al. 2009; Juanatey et al. 2013; Hübert et al. 2013). We can also generate equivalent models by inverting responses computed from these new synthetic models and contaminated with noise representative of the field conditions (Kühn et al. 2014; Haghighi et al. 2018; Attias et al. 2018).

  • Variations \(\triangle {\mathbf {d}}\):

    Bootstrap re-sampling builds on randomly sampling a data set (population) K times and replacing L individual data points in each random sampling step. This makes it possible to calculate statistical properties in the form of averages and covariance matrices of these K data sets. In model analysis, this can be taken advantage of by computing K inversion models from these K data sets. For MT model analysis based on bootstrapping, Schnaidt and Heinson (2015) suggested to replace the complete impedance tensor by drawing from a Gaussian distribution in which the distribution mean and standard deviation are the actual measurement and its uncertainty, respectively. For appraisal of the model ensemble, two measures were introduced—one for the variability of cell wise model resistivities and one for the variability of model structures using the cell wise cross-gradient of two models. Schnaidt and Heinson (2015) emphasised that the bootstrapping method can be implemented without access to the source code of the inversion algorithm. The conceptual simplicity of the bootstrapping method is appealing. However, this comes at the cost of an increased computational load, as K full inversions need to be run. Generalised cross-validation (GCV, e.g., Aster et al. 2012) generates a suite of N different models by removing a certain number of data points at a time from the original data set and inverting the remaining data. These GCV-based modifications to the data set are used in order to identify a suitable Lagrange multiplier (see below). However, we are not aware of geo-electromagnetic examples where this sampling of the model space was used to appraise the preferred inversion model. Global seismological tomography problems often use travel time residuals with respect to the responses of a layered Earth model as input data to the inversion. The 2D or 3D model parameters are velocity perturbations to the layered background model. One of the approaches to model uncertainty analysis in global seismological tomography is to randomly redistribute the travel time residuals. Since the permuted residuals are assumed to correspond to pure noise, the resulting inverse solutions are considered representative of model uncertainty (Spakman and Nolet 1988; Bijwaard et al. 1998; Tryggvason et al. 2002). Typically, inversion models of several such permuted residual vectors are computed and the maximum velocity perturbations of the individual 2D or 3D cells are considered upper limits of model uncertainty. Note there is no direct analogue to this approach in electromagnetic methods, because we do not consider residuals.

  • Variations \(\triangle {\mathbf {W}}_{d}\):

    Changes to the data uncertainties in matrix \({\mathbf {W}}_{d}\) or application of error (uncertainty) floors to the data uncertainties generate a set of alternative models \({\mathbf {m}}\) (\({\mathbf {m}}= {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\)). Since models should be evaluated for equivalence based on their data fit and the uncertainties are changed, we prefer to refer to these models as alternative models rather than as equivalent models. We can first use an initial uncertainty level to obtain the preferred inversion model \({\mathbf {m}}^{*}\). Then, a set of other inversion models can be obtained by gradually adjusting uncertainty levels. As the uncertainty values are changed gradually, it can be expected that these newly generated model variations are around the preferred model \({\mathbf {m}}^{*}\). When the maximum level of data uncertainty is tested, the maximum level of model variation is obtained. Changing uncertainty levels is a natural idea to test the model variability (Li et al. 2009), as actual noise levels are generally unknown in real field data.

  • Variations \(\triangle {\mathbf {F}}\):

    Over the last ten years, remarkable advances were made to decrease errors in the preferred inversion model caused by inaccurate approximations to the underlying physics. Improved forward modelling algorithms were developed to accurately include topographic effects using unstructured tetrahedral and hexahedral grids (Sambridge and Guðmundsson 1998; Ren and Tang 2010; Ansari and Farquharson 2014; Jahandari and Farquharson 2015; Yin et al. 2016; Cai et al. 2017; Yang et al. 2017; Li et al. 2017), to utilise accurate boundary conditions (Franke-Börner 2013), to employ accurate direct and iterative solvers for the resulting system of linear equations (Streich 2009; Kordy et al. 2016) and to adaptively refine the forward modelling mesh across pronounced contrasts in material parameters generating sharp changes in the electromagnetic field (Franke et al. 2007; Li and Pek 2008; Nam et al. 2010; Key and Ovall 2011; Schwarzbach et al. 2011; Ren et al. 2013; Ren and Tang 2014; Grayver and Kolev 2015; Ren et al. 2018). Therefore, we can ignore the effect of numerical errors of the approximation to the underlying physics, that is, \(\triangle {\mathbf {F}}\rightarrow 0\).

  • Variations \(\triangle \alpha\):

    To obtain the optimal weighting factor \(\alpha\), we should follow the physically meaningful idea that a reasonable inversion procedure is to gradually add small or local-scale features into the large-scale or background model during the inversion iterations. We can start with a large value of the weighting factor so that the initial inversions reconstruct predominantly the large-scale structures of the model. During later iterations, the value of \(\alpha\) can be gradually decreased so that small-scale changes in the data misfit term become more important. Thus, small-scale structures or anomalies are added to the inversion model. This approach of gradually reducing \(\alpha\) is often referred to as a cooling procedure. Alternatively, we can automatically estimate an optimal weighting factor by using the so-called discrepancy principle (Constable et al. 1987; Siripunvaraporn and Egbert 2000; Kalscheuer et al. 2010), the L-curve technique (Hansen 1992) or the generalised cross-validation (GCV) method (Farquharson and Oldenburg 2004; MacCarthy et al. 2011). The discrepancy principle determines the optimal weighting factor by performing a line search and selecting the weighting factor that yields a model with a data misfit close to the target data misfit, i.e. a model that is commensurate with the level of noise assumed in the data. Using the L-curve criterion, the L-shaped \(Q_{\mathrm{d}}-Q_{\mathrm{m}}\) curve is built by running inversions with different weighting factors. The point where the maximum curvature occurs and the data misfit is acceptable is chosen as the optimal weighting factor. The GCV method applies the leave-one-out (or leave-a-certain-number-out) methodology to the data set in order to establish a new function in which \(\alpha\) is the only variable. Then, an optimal value of \(\alpha\) is obtained by minimising this newly created function. Both the L-curve and the GCV techniques can effectively determine optimal values of the weighting factor \(\alpha\) (Farquharson and Oldenburg 2004; Krakauer et al. 2004; Mojica and Bassrei 2015). However, the GCV technique has a tendency of biasing the final value of \(\alpha\) towards its initial value. Therefore, it is better to use cooling strategies together with the GCV technique in initial iterations such that the constraints from the model regularisation term are adequately considered, which is essential to stabilise the inversion and to guarantee that the preferred inversion model approaches the geologically meaningful true model (Farquharson and Oldenburg 2004; Krakauer et al. 2004; Schwarzbach and Haber 2013; Usui 2015; Mojica and Bassrei 2015; Jahandari and Farquharson 2017; Wang et al. 2018a). The above three methods of changing \(\alpha\) are routinely used in inversions to produce the preferred inversion model. With different weighting factors \(\alpha\) (including the optimal one), we can explore a limited part of the model space.

  • Variations \(\triangle {\mathbf {m}}^{0}\):

    Owing to the nonlinear nature of electromagnetic inversion problems, different initial models \({\mathbf {m}}^{0}\) lead to differences in the inversion result \({\mathbf {m}}^*\). Depending on the nature of the forward problem and the model regularisation, the differences may be more or less pronounced. By including a Marquardt–Levenberg damping term that regularises to the model of the previous iteration, for instance, the dependence of the preferred inversion model on the initial model may be quite strong. Thus, varying the initial model \({\mathbf {m}}^{0}\) is a fairly direct approach of constructing a limited domain of equivalent models (Bai and Meju 2003; Schmoldt et al. 2014).

  • Variations \(\triangle {\mathbf {m}}_{r}\) and \(\triangle {\mathbf {W}}_{\mathrm{m}}\):

    In the following, we concisely describe three nonlinear model analysis tools which are designed for this task. Approaches based on variation of model constraints try to explore larger parts of the model domain by using different reference models (Oldenburg and Li 1999; Oldenborger et al. 2007), different regularisation (smoothness) constraints (Constable et al. 1987; Kalscheuer et al. 2007) and structural or petrophysical constraints relating to models from other geo-scientific methods (Gallardo and Meju 2004; Meju 2009; Gallardo and Meju 2011; Gao et al. 2012; Giraud et al. 2017). It needs to be noted that variations in the model constraints change the misfit function \(Q_m\) and, with that, Q in Eq. (3), such that direct comparisons of misfit values \(Q_m\) computed using different model regularisations are not meaningful. However, in the absence of compelling evidence from other geo-scientific methods that is included in \(Q_m\), we can argue that the typically employed smoothness constraints are a mathematical method to overcome the non-uniqueness of the inversion problem and that we may not be particularly interested in the actual values of \(Q_m\). Most inversion algorithms follow this line of thought implicitly by evaluating convergence purely based on RMS fit. If the inversion is run with different reference models \({\mathbf {m}}_r\) or model weighting matrices \({\mathbf {W}}_m\), the inversion models will locate in different regions of the model domain. Unless the RMS misfits are unacceptable, these inversion models should, at least in part, be close to the true model. This assumption is used to derive a depth-of-investigation (DOI) index for each model cell and was originally presented by Oldenburg and Li (1999). To analyse the model using DOI indices, we require two inversion models to compute a model quality measure vector \({\mathbf {D}}=\{D_{j},j=1,\ldots ,M\}\)

    $$\begin{aligned} D_{j} = \frac{m_{j}^{1}-m_{j}^{2}}{m_{r,j}^{1}-m_{r,j}^{2}}, \end{aligned}$$
    (33)

    where \({\mathbf {m}}^{1}\) and \({\mathbf {m}}^{2}\) are two inversion models obtained using different reference models or constraints, in this case using different reference models \({\mathbf {m}}_{r}^{1}\) and \({\mathbf {m}}_{r}^{2}\). When \(D_{j}\) is close to zero, this suggests that the similarity in the inverted model parameters is caused by the data and, thus, that the jth model parameter is well resolved. The success of the DOI-based model analysis strongly depends on the experience of the users in trying different reference models. DOI indices can only be meaningfully computed for inversion models that have nearly identical RMS misfits. This is so, because systematic differences in data fit correspond to localised deviations between the inversion models. These localised deviations will be falsely reflected in high DOI values, even though the model regions are well determined by the data. In such a situation, the chosen model constraints are inappropriate in at least one of the inversions. A minimal numerical sketch of the DOI computation is given after this list.

  • Variations \(\triangle Q_{\mathrm{d}}\) and \(\triangle Q_{\mathrm{m}}\):

    The Edgehog method (Jackson 1973) can be used to compute a larger equivalent model domain corresponding to a deviation \(\triangle Q\). It tries to generate the equivalent model domain \({\mathbf {m}}\) (\({\mathbf {m}}= {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\)) by using two inequality conditions \(\triangle Q_{\mathrm{d}} \le \triangle Q_{\mathrm{d}}^{*}\) and \(\triangle Q_{\mathrm{m}} \le \triangle Q_{\mathrm{m}}^{*}\), where \(\triangle Q_{\mathrm{d}}+ \alpha \triangle Q_{\mathrm{m}} =\triangle Q\), and \(\triangle Q_{\mathrm{d}}^{*}\) and \(\triangle Q_{\mathrm{m}}^{*}\) are two threshold values for maximum deviations of data misfit and model misfit, respectively. To find \(\triangle {\mathbf {m}}\), we need to solve the following two sub-problems:

    $$\begin{aligned} \triangle Q_{\mathrm{d}} = \triangle Q_{\mathrm{d}}^{*}\quad \text {and} \quad \triangle Q_{\mathrm{m}} \le \triangle Q_{\mathrm{m}}^{*}, \end{aligned}$$
    (34)

    and

    $$\begin{aligned} \triangle Q_{\mathrm{m}} = \triangle Q_{\mathrm{m}}^{*}\quad \text {and} \quad \triangle Q_{\mathrm{d}} \le \triangle Q_{\mathrm{d}}^{*}. \end{aligned}$$
    (35)

    The equality condition above defines a closed hypersurface and the inequality condition defines a solid volume, that is, a hyperellipsoid in linear inversion problems. In the Edgehog method, the intersections of these hypersurfaces and closed volumes define the equivalent model domain. Compared to the most-squares method (see below), where the goal is to estimate model uncertainty by allowing for maximum variations of the cost function which is a sum of the data misfit and model misfit terms, the Edgehog method estimates the model uncertainty by individually considering the variations of either data misfit or model misfit. The Edgehog method was initially used in seismic problems (Lines and Treitel 1985). However, it is rarely used in geo-electromagnetic induction problems.

  • Variations \(\triangle Q\):

    The most-squares method (Jackson 1976, 1979; Lines and Treitel 1985; Meju and Hutton 1992; Meju 1994, 2009; Kalscheuer and Pedersen 2007; Kalscheuer et al. 2010; Mackie et al. 2018) is generally used to compute a more representative equivalent model domain \({\mathbf {m}}\) (\({\mathbf {m}}= {\mathbf {m}}^{*}+\triangle {\mathbf {m}}\)) than that obtained by the partially nonlinear pseudo-hyperellipsoid technique. This equivalent model domain is constructed by identifying the extreme points of a scalar functional \({\mathbf {m}}^{T}{\mathbf {w}}\) subject to the requirement that the maximum allowed variation of the objective function Q is \(\triangle Q\). This idea is formulated as:

    $$\begin{aligned}&\text {minimise or maximise} \quad {\mathbf {m}}^{T}{\mathbf {w}} \nonumber \\&\text {subject to} \quad Q = Q({\mathbf {m}}^{*}) + \triangle Q, \end{aligned}$$
    (36)

    where \({\mathbf {w}}\) is a constant unit vector along the direction of a parameter or a parameter combination of interest, and \(Q({\mathbf {m}}^{*})\) is the value of the cost function at the preferred inversion model \({\mathbf {m}}^{*}\). Using a Lagrange multiplier \(\gamma\), Eq. (36) is transformed into an unconstrained optimisation problem. Its goal is to find the extreme model increments (\(\triangle {\mathbf {m}}\)) around the given minimum-misfit model \({\mathbf {m}}^{*}\), such that the cost functional

    $$\begin{aligned} Q_{ms}&= {\mathbf {m}}^{T}{\mathbf {w}}+ \gamma [Q-Q({\mathbf {m}}^{*})-\triangle Q] \end{aligned}$$
    (37)

    is minimised. As \(Q_{ms}\) is a non-quadratic functional with respect to \(\triangle {\mathbf {m}}\), Eq. (37) has to be solved iteratively. To guarantee the stability of the approach and to allow the algorithm to adjust to deviations of the iso-surfaces of Q from hyperellipsoidal form, it seems imperative to keep the Lagrange multiplier \(\alpha\) in Q (Eq. 3) fixed at the value that generated the minimum-misfit model \({\mathbf {m}}^{*}\), to perform a relatively large number of iterations (20-30 per direction) with progressively increasing target misfit and to safeguard the inversion algorithm with additional Marquardt–Levenberg damping. The latter avoids abrupt jumps in model space, but requires an optimal damping factor to be searched in every iteration. It is not unusual that the analysis of one model parameter takes about twenty times the computational effort to construct \({\mathbf {m}}^{*}\). Hence, in 2D inversions only the uncertainty levels of a few to several representative cells are investigated. The reliability of a most-squares inversion is highly affected by the quality of the chosen preferred model \({\mathbf {m}}^{*}\). If this model does not have minimum misfit, deviations from the minimum-misfit model will directly reflect in modifications to the most-squares inversion results and mislead the interpretation by giving the false impression of an increased level of nonlinearity in the forward problem. Note that the nonlinear semi-axes in a pseudo-hyperellipsoidal description of the misfit hypersurface (cf. Section 2.2) are also adversely affected by a displacement of the preferred model from the minimum-misfit model. The most-squares method was developed by Jackson (1976) and Jackson (1979) for linear problems. Subsequently, Meju and Hutton (1992) extended its capability to dealing with nonlinear problems. After a further extension to include prior information (Meju 1994), Meju (2009) presented a generalised most-squares method with the additional capability of dealing with multiple uncertainties and constraints (e.g., cross-gradient constraints) to compute extreme parameter sets.
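
Following up on the forward reference in the item on variations \(\triangle {\mathbf {m}}_{r}\) and \(\triangle {\mathbf {W}}_{\mathrm{m}}\), the following minimal sketch evaluates the DOI index of Eq. (33) cell by cell; the input arrays are hypothetical vectors of (logarithmic) resistivities from two inversion runs and their reference models.

```python
import numpy as np

def doi_index(m1, m2, mr1, mr2):
    """Depth-of-investigation index D_j of Eq. (33).

    m1, m2   : inversion models obtained with reference models mr1 and mr2.
    mr1, mr2 : reference model values (scalars or cell-wise arrays).
    D_j close to zero indicates cells constrained by the data rather than
    by the reference models.
    """
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    return (m1 - m2) / (np.asarray(mr1, dtype=float) - np.asarray(mr2, dtype=float))

# Example with hypothetical log-resistivity models and half-space references:
# D = doi_index(m1, m2, mr1=np.log10(100.0), mr2=np.log10(1000.0))
```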

2.3.2 Stochastic Methods

Global search algorithms or stochastic inversion algorithms naturally perform a comprehensive model analysis as the entire model domain can be explored based mostly on forward computations. The key idea of these fully nonlinear stochastic optimisation algorithms is to randomly generate a large set of candidate models which represent the entire model space in terms of their cost function distribution. Using a particular rule to accept or reject these candidate models, we finally obtain an ensemble of models which forms the stochastic equivalent model domain.

Stochastic inversion algorithms are based on Bayes' theorem, which states (Tarantola 2005)

$$\begin{aligned} p\left( {\mathbf {m}}|{\mathbf {d}}\right) =\frac{p\left( {\mathbf {d}}|{\mathbf {m}}\right) p\left( {\mathbf {m}}\right) }{p({\mathbf {d}})}, \end{aligned}$$
(38)

where \(p({\mathbf {m}}|{\mathbf {d}})\) is the a posteriori probability density function, which indicates the probability of the model \({\mathbf {m}}\) given the measured data set \({\mathbf {d}}\), \(p({\mathbf {d}}|{\mathbf {m}})\) is the conditional probability density function of the data \({\mathbf {d}}\) for a given model \({\mathbf {m}}\), \(p({\mathbf {m}})\) is the prior probability density function of the model parameters, and \(p({\mathbf {d}})\) is the evidence density function. As the measured data set can be considered a fixed vector, \(p({\mathbf {d}})\) is a constant that can be ignored. The conditional probability density function \(p({\mathbf {d}}|{\mathbf {m}})\) establishes a relationship between the measured data set and the model parameters and can be written as:

$$\begin{aligned} p({\mathbf {d}}|{\mathbf {m}})&= C_{d} \exp \left( -\left( {\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}\right] \right) ^{T}{\mathbf {W}}_{d}^{T}{\mathbf {W}}_{d}\left( {\mathbf {d}}-{\mathbf {F}}\left[ {\mathbf {m}}\right] \right) \right) \nonumber \\&= C_{d} \exp \left( -Q_{\mathrm{d}}({\mathbf {d}},{\mathbf {m}})\right) , \end{aligned}$$
(39)

and the prior probability density \(p({\mathbf {m}})\) can be written as:

$$\begin{aligned} p({\mathbf {m}})&= C_{m}\exp \left( -\left( {\mathbf {m}}-{\mathbf {m}}_{r}\right) ^{T}{\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\left( {\mathbf {m}}-{\mathbf {m}}_{r}\right) \right) = C_{m}\exp \left( -Q_{\mathrm{m}}({\mathbf {m}})\right) . \end{aligned}$$
(40)

Substituting Eqs. (40) and (39) into Eq. (38), we have:

$$\begin{aligned} p({\mathbf {m}}|{\mathbf {d}})&= C_{Q}\exp \left( -(Q_{\mathrm{d}}({\mathbf {d}},{\mathbf {m}})+Q_{\mathrm{m}}({\mathbf {m}}))\right) = C_{Q}\exp (-Q), \end{aligned}$$
(41)

where \(C_{d}\), \(C_{m}\) and \(C_{Q}\) are three normalising constants. Comparing Eq. (3) to Eq. (41), we conclude that deterministic linearised inversion algorithms try to locate the desired model by minimising the quadratic function formed by Taylor expansion, whereas stochastic inverse algorithms try to build an ensemble of models sampling the above posterior probability density function \(p({\mathbf {m}}|{\mathbf {d}})\) using sampling methods such as the random walk or the Metropolis–Hastings algorithm. When stochastic inverse algorithms terminate, the resulting model ensembles are supposed to be representative of the posterior probability density function and to have high and low numbers of ensemble members in high- and low-likelihood regions, respectively. Hence, for each model parameter \(m_{j}\), its marginal (normalised) density \(p_{j}(m_{j})\) is available. Based on this, the ensemble mean value of the jth model parameter (\(\bar{m_{j}}\)) and its variance (\(C_{j}\)) are estimated as:

$$\begin{aligned} \bar{m_{j}}&= \int m_{j} p_{j} (m_{j}) dm_{j}, \end{aligned}$$
(42)
$$\begin{aligned} C_{j}&= \int (m_{j}-\bar{m_{j}} )^2 p_{j} (m_{j}) d m_{j}. \end{aligned}$$
(43)

Generally speaking, stochastic inversion algorithms naturally offer model variances or the equivalent model domain, but they do not offer a mechanism to estimate the model resolution matrix. Also, on the order of hundreds of thousands of ensemble models are required to obtain a reliable representation of the posterior distribution, leading to a high computational load in solving forward problems. For more detail on the progress of model uncertainty estimation using Bayesian algorithms in geo-electromagnetic problems, please refer to a recent review paper (Pankratov and Kuvshinov 2016).

2.4 Computationally Efficient Approaches

The construction of the equivalent model domain using linearised and nonlinear model analysis tools needs powerful computational resources. The speed of nonlinear model analysis mainly depends on the efficiency of the forward modelling solver and, where applicable, the speed of the inversion routine. The acceleration of nonlinear model analysis tools will depend on developing fast linear solvers (Egbert and Kelbert 2012; Chung et al. 2014; Newman 2014; Koldan et al. 2014; Kelbert et al. 2014; Puzyrev et al. 2016; Shantsev et al. 2017), efficient usage of supercomputers (Key and Ovall 2011; Egbert and Kelbert 2012; Puzyrev et al. 2013; Weiss 2013; Ren et al. 2014; Kelbert et al. 2014; Grayver and Kolev 2015; Kruglyakov and Bloshanskaya 2017; Castillo-Reyes et al. 2018), improvements of the available nonlinear model analysis methods and development of new nonlinear model analysis tools. In the remainder of this section, our emphasis is to introduce several acceleration strategies or algorithms to compute the model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) and model covariance matrix \({\mathbf {C}}\). Their performance is the key factor for practical linearised and nonlinear model analyses.

2.4.1 Parallel Direct Solvers

As evident from Eqs. (7), (10) and (11), if the sensitivity matrix is explicitly computed, then the direct computation approach is a natural way of computing the resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) and covariance matrix \({\mathbf {C}}\). In the model analysis of the linear seismic tomography problem, direct matrix computation is popular, because the involved sensitivity matrices are highly sparse. For instance, Boschi (2003) and Soldati and Boschi (2005) successfully used the parallel Cholesky factorisation technique to compute the model resolution matrix in global body wave tomography. Bogiatzis et al. (2016) used the sparse QR factorisation to compute the full resolution matrix of the seismic tomography problem. In contrast, the sensitivity matrix is dense in geo-electromagnetic inversion problems. However, using state-of-the-art parallel matrix solvers (Verbosio et al. 2017; Amestoy et al. 2001), the direct matrix computation approach is preferred for 1D and 2D geo-electromagnetic data sets, as it can naturally account for both data misfit and model regularisation terms.

2.4.2 Partial or Truncated SVD Analysis

Instead of computing all eigenvectors in the SVD method, the truncated SVD (TSVD) technique only uses the first p eigenvectors with p much smaller than the number of model parameters M, \(p<M\). The TSVD technique is typically used to decompose the weighted sensitivity matrix and, thus, to study the contributions to model resolution and uncertainty that are purely related to the data. Ignoring the contributions that come from the model constraints in \(Q_m\) is justifiable in situations where no meaningful prior information exists and model constraints, e.g., smoothness constraints, are added to make the inversion problem unique (Kalscheuer and Pedersen 2007; Miller and Routh 2007). At most, the TSVD analysis allows for a damping model regularisation term (i.e. the Marquardt–Levenberg damping factor). The TSVD of the weighted sensitivity matrix is (Xu 1998; Golub and Van Loan 2012; Li and Li 2017):

$$\begin{aligned} {\mathbf {J}}_{d} \approx {\mathbf {U}}_p\varvec{\varLambda }_p{\mathbf {V}}_p^{T}, \end{aligned}$$
(44)

where the columns of orthogonal matrix \({\mathbf {U}}_p\) of size \(N \times p\) are the first p eigenvectors of \({\mathbf {J}}_{d}{\mathbf {J}}_{d}^{T}\), the columns of matrix \({\mathbf {V}}_p\) of size \(M \times p\) are the first p eigenvectors of \({\mathbf {J}}_{d}^{T} {\mathbf {J}}_{d}\), and diagonal matrix \(\varvec{\varLambda }_p\) contains the first p non-negative singular values in descending order. Considering the corresponding expressions of the GSVD method in Sect. 2.1, setting \({\mathbf {W}}_{\mathrm{m}}={\mathbf {I}}\) leads to \({\mathbf {V}}={\mathbf {U}}_{2}^T\) and \({\varvec{\varLambda }}_{2}={\mathbf {I}}\) and, with that, to the above truncated singular value decomposition with additional Marquardt–Levenberg damping. Using Eq. (44), the model resolution matrix can be computed as \({\mathbf {R}}_{\mathrm{M}} = {\mathbf {V}}_p (\varvec{\varLambda }_p^{T}\varvec{\varLambda }_p+\alpha {\mathbf {I}})^{-1} \varvec{\varLambda }_p^{T}\varvec{\varLambda }_p {\mathbf {V}}_p^{T}\), and the model covariance matrix is evaluated as \({\mathbf {C}}= {\mathbf {V}}_p (\varvec{\varLambda }_p^{T}\varvec{\varLambda }_p+\alpha {\mathbf {I}})^{-2} \varvec{\varLambda }_p^{T}\varvec{\varLambda }_p{\mathbf {V}}_p^{T}\). Rather than computing a full SVD and then only picking the contributions of the first p singular values, it is meaningful to compute a partial SVD and compute and add contributions of singular values with higher index p as needed. Gilbert (1971) first proposed that using eigenvector decomposition can provide flexibility and numerical stability for estimating the model resolution. Using the standard SVD algorithm, Wiggins (1972) analysed the model resolution and model uncertainty for surface-wave and free-oscillation observations, and Pedersen (1977) accelerated the generalised linear inverse approach. Jupp and Vozoff (1975) and Lines and Treitel (1984) applied the TSVD inversion and analysis to geoelectric and MT data and to reflection seismic and gravity data, respectively.
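
As a concrete illustration of these expressions, the following sketch computes \({\mathbf {R}}_{\mathrm{M}}\) and \({\mathbf {C}}\) from a truncated SVD of the weighted Jacobian with optional Marquardt–Levenberg damping; the Jacobian Jd, the truncation level p and the damping factor alpha are hypothetical inputs.

```python
import numpy as np

def tsvd_resolution_covariance(Jd, p, alpha=0.0):
    """Model resolution and covariance matrices from a truncated SVD of the
    weighted Jacobian (Eq. 44) with optional Marquardt-Levenberg damping.
    p should not exceed the numerical rank of Jd."""
    U, s, Vt = np.linalg.svd(Jd, full_matrices=False)
    sp, Vp = s[:p], Vt[:p, :].T                           # keep first p triplets
    filt = sp**2 / (sp**2 + alpha)                        # (L^T L + alpha I)^{-1} L^T L
    R_M = Vp @ np.diag(filt) @ Vp.T                       # model resolution matrix
    C = Vp @ np.diag(sp**2 / (sp**2 + alpha)**2) @ Vp.T   # model covariance matrix
    return R_M, C
```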

An analogy to the SVD analysis is the fast LSQR algorithm for sparse linear equations and sparse least squares (Paige and Saunders 1982), which can be thought of as an approach to iteratively reduce matrix \({\mathbf {J}}_{d}\) into a bi-diagonal matrix and two orthogonal matrices. The advantage of the LSQR method is that it only needs the multiplication of matrices and vectors during the iteratively performed decomposition. Therefore, it has the capability of speeding up the model analysis for large-scale 2D and 3D problems. Currently, the LSQR technique is widely adopted in the linear seismic tomography problem (Zhang and McMechan 1995; Yao et al. 1999; Tryggvason et al. 2002; Vasco et al. 2003; Osypov et al. 2013; Chen and Xie 2017). Until now, the LSQR technique has not been widely used in 2D and 3D geo-electromagnetic induction problems. But, we note that the potential advantages of the LSQR method deserve further investigation in dealing with geo-electromagnetic induction problems.

2.4.3 Krylov Subspace Methods

Conjugate gradient (CG), nonlinear conjugate gradient (NLCG) and limited-memory quasi Newton (LMQN) methods (Nocedal and Wright 2006) are popular choices for iteratively minimising the cost function in large-scale 2D and 3D electromagnetic inversion problems (Mackie and Madden 1993; Rodi and Mackie 2001; Avdeev and Avdeeva 2009; Egbert and Kelbert 2012). Therefore, development of 2D and 3D model analysis algorithms that utilise and integrate with these iterative solvers is highly desirable. To our knowledge, no attempts in this direction have been made in the electromagnetic induction community to date. However, there are several papers on seismic inversion (e.g., Zhang and McMechan 1995; Minkoff 1996; Yao et al. 1999) which may serve as a source of inspiration for future development.

In one of these examples, Minkoff (1996) applied the CG method to iteratively solve the normal equations. However, neither the sensitivity matrix nor the generalised inverse is explicitly computed and stored in the CG method. Thus, an approximate model resolution matrix was generated during the inversion by computing eigenvalues and eigenvectors of the generalised inverse with the Lanczos method (Minkoff 1996). Following a similar line of thought, Yao et al. (1999) computed approximations to the model covariance and resolution matrices while an augmented system of linear equations was iteratively solved using the LSQR method.

2.4.4 Extraction of Rows, Columns and Diagonals

For problems with large numbers of observations and model parameters, a practical way of conducting the model analysis is to extract only the rows, columns or diagonal entries corresponding to the model parameters of interest from the resolution and covariance matrices. Since \({\mathbf {R}}_{\mathrm{M}} ={\mathbf {J}}_{w}^{-g}{\mathbf {J}}_{d}\), the jth row of \({\mathbf {R}}_{\mathrm{M}}\) equals the jth row of the generalised inverse \({\mathbf {J}}_{w}^{-g}\) times the weighted Jacobian matrix \({\mathbf {J}}_{d}\), where \({\mathbf {J}}_{w}^{-g} = {\mathbf {A}}^{-1} {\mathbf {J}}_{d}^{T}\) with \({\mathbf {A}}={\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\); note that matrix \({\mathbf {A}}\) is symmetric. Now, in computing the jth row of the generalised inverse matrix \({\mathbf {J}}_{w}^{-g}\), the time-consuming part is to compute the jth row of the inverse of matrix \({\mathbf {A}}\). Noting the identity \({\mathbf {A}}{\mathbf {A}}^{-1}={\mathbf {I}}\), where \({\mathbf {I}}\) is the identity matrix, and denoting by \({\mathbf {c}}^{j}\) the jth column of \({\mathbf {A}}^{-1}\) (which is also its jth row, as \({\mathbf {A}}^{-1}\) is symmetric) and by \({\mathbf {I}}^{j}\) the jth column of the identity matrix, we have (Menke 2015):

$$\begin{aligned} {\mathbf {A}} {\mathbf {c}}^{j} = {\mathbf {I}}^{j} . \end{aligned}$$
(45)

Since matrix \({\mathbf {A}}\) is assumed to be symmetric positive definite, we can employ well-established iterative solvers such as conjugate gradient solvers (Saad 2003) or the LSQR solver (Paige and Saunders 1982) to quickly obtain the jth column \({\mathbf {c}}^{j}\) of matrix \({\mathbf {A}}^{-1}\) and, finally, the desired jth row of the resolution matrix, \({\mathbf {a}}^{j}\). The matrix–vector products with \({\mathbf {A}}\) required by iterative solvers can be accelerated if the sensitivity matrix is arranged to have a suitable structure. For instance, in the popular ModEM inversion code (Egbert and Kelbert 2012; Kelbert et al. 2014), the authors observed that once the rows of the sensitivity matrix could be grouped into independent sets according to transmitter affiliation, computation of the sensitivity matrix could be easily parallelised over the transmitters and significantly accelerated. Similarly, the above methodology can be directly applied to compute rows (or columns) of the symmetric model covariance matrix \({\mathbf {C}}\) given in Eq. (11).

To compute the jth column (\({\mathbf {b}}^{j}\)) of the model resolution matrix (the point-spread function), we have \({\mathbf {b}}^{j}={\mathbf {R}}_{\mathrm{M}} {\mathbf {I}}^{j}\). Considering the discussion preceding Eq. (45), \({\mathbf {b}}^{j}\) satisfies the following linear system of equations:

$$\begin{aligned} {\mathbf {A}} {\mathbf {b}}^{j} = {\mathbf {J}}_{d}^{T} {\mathbf {J}}_{d} {\mathbf {I}}^{j} . \end{aligned}$$
(46)

Again, we can use conjugate gradient solvers (Saad 2003) or the LSQR decomposition solver (Paige and Saunders 1982) to quickly solve the above linear system of equations.
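
To make the row and column extraction of Eqs. (45) and (46) concrete, the following sketch solves both systems with matrix-free conjugate gradients so that \({\mathbf {A}}\) is never formed or inverted explicitly; the dense Jacobian Jd, the model weighting matrix Wm and the trade-off factor alpha are hypothetical inputs.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def resolution_row_and_psf(Jd, Wm, alpha, j):
    """jth row and jth column (point-spread function) of R_M via Eqs. (45)
    and (46), using conjugate gradients instead of forming A^{-1}."""
    N, M = Jd.shape
    # Matrix-vector product with A = Jd^T Jd + alpha Wm^T Wm (never formed).
    matvec = lambda x: Jd.T @ (Jd @ x) + alpha * (Wm.T @ (Wm @ x))
    A = LinearOperator((M, M), matvec=matvec, dtype=float)
    e_j = np.zeros(M)
    e_j[j] = 1.0
    c_j, _ = cg(A, e_j)                      # jth column of A^{-1}, Eq. (45)
    row_j = Jd.T @ (Jd @ c_j)                # jth row of R_M = A^{-1} Jd^T Jd
    psf_j, _ = cg(A, Jd.T @ (Jd @ e_j))      # jth column of R_M, Eq. (46)
    return row_j, psf_j
```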

When all model parameters are perfectly resolved, i.e. when the problem is even- or over-determined, we have \({\mathbf {R}}_{\mathrm{M}}={\mathbf {I}}\). Hence, diagonal entries that differ from one indicate deteriorated model resolution. Furthermore, the diagonal elements of the model covariance matrix are the variances of the model parameters. Therefore, the diagonal elements of the model resolution and covariance matrices are used as proxies in evaluating model uniqueness and uncertainty. We can either use the above naive direct matrix computation approach or the above row and column extraction approaches to compute the diagonal entries. Johansen (1977) suggests that the success of using diagonal elements stems from the fact that the nonlinearity of the original inversion problem with conductivities or resistivities as model parameters is reduced to a moderate level when the model parameter is transformed to the logarithm of conductivity or resistivity. For simple layered Earth models, Auken and Christiansen (2004) directly extracted the diagonal elements of the model covariance matrix to assist the interpretation of laterally constrained inversion models computed from direct-current resistivity data. Similarly, Juhojuntti and Kamm (2015) used the directly extracted diagonal elements of the covariance matrix to evaluate single and joint inversion models of seismic refraction and resistivity data. For the computed three-layered models with undulating interfaces, the joint inversion model was demonstrated to have the lowest uncertainty.

For large-scale 2D and 3D problems, we may prefer the following stochastic algorithm to quickly extract the diagonal elements of the resolution and covariance matrices. Using a set of s random vectors \({\mathbf {v}}^{j}\), \(j=1,\ldots ,s\), of length M, the sth estimate of the diagonal \({\mathbf {D}}^{s}\) of a general square matrix \({\mathbf {B}}\) can be calculated as:

$$\begin{aligned} {\mathbf {D}}^{s} = \left[ \left( \sum _{j=1}^{s}{\mathbf {v}}^{j} \odot {\mathbf {B}}{\mathbf {v}}^{j}\right) \oslash \left( \sum _{j=1}^{s}{\mathbf {v}}^{j} \odot {\mathbf {v}}^{j}\right) \right] , \end{aligned}$$
(47)

where \(\odot\) represents element-wise multiplication of vectors and \(\oslash\) denotes element-wise division of vectors. When the components of the random vectors \({\mathbf {v}}^{j}\) are drawn with equal probability from the values 1 and -1, the convergence of the estimated diagonal \({\mathbf {D}}^{s}\) to the true diagonal of matrix \({\mathbf {B}}\) is mathematically guaranteed (Hutchinson 1990). In a practical implementation of this stochastics-based algorithm to compute the diagonal, we can try different sets of random vectors to check its convergence; for details, please refer to Bekas et al. (2007). In Eq. (47), we only need to compute the multiplication of matrix \({\mathbf {B}}\) with a random vector, that is, \({\mathbf {B}}{\mathbf {v}}^{j}\). Using the relationship \({\mathbf {R}}_{\mathrm{M}}={\mathbf {A}}^{-1}{\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}\) (with \({\mathbf {A}}={\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}\)), this multiplication is transformed into the solution of the following linear system of equations by setting \({\mathbf {B}}={\mathbf {R}}_{\mathrm{M}}\):

$$\begin{aligned} {\mathbf {A}} \left[ {\mathbf {R}}_{\mathrm{M}}{\mathbf {v}}^{j}\right] = {\mathbf {J}}_{d}^{T} {\mathbf {J}}_{d} {\mathbf {v}}^{j} , \end{aligned}$$
(48)

which will produce the diagonal of resolution matrix \({\mathbf {R}}_{\mathrm{M}}\). By setting \({\mathbf {B}}={\mathbf {C}}={\mathbf {A}}^{-1}\), we have:

$$\begin{aligned} {\mathbf {A}} \left[ {\mathbf {C}}{\mathbf {v}}^{j}\right] = {\mathbf {v}}^{j} , \end{aligned}$$
(49)

which will produce the diagonal of covariance matrix \({\mathbf {C}}\). Similar results can be obtained for the case of \({\mathbf {C}}={\mathbf {A}}^{-1}{\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}{\mathbf {A}}^{-1}\) as defined in Eq. (11), where the reference model is considered a fixed vector. Since matrix \({\mathbf {A}}\) is symmetric positive definite for appropriate model regularisation terms, the above linear system of equations can be easily solved by fast iterative or direct solvers (Saad 2003). This stochastics-based estimation algorithm for matrix diagonals was adopted by MacCarthy et al. (2011), who quickly computed the diagonal of the model resolution matrix for a Tikhonov-regularised teleseismic inversion problem with approximately 260,000 model parameters. Similarly, Trampert et al. (2013) successfully carried out model resolution analyses for large-scale seismic tomography problems with 8000 model parameters. Please refer to An (2012) and Fichtner and van Leeuwen (2015) (and references therein) for other stochastics-based fast matrix extraction techniques. The success of the above algorithm for computing matrix diagonals arises from the capability of random probing techniques to extract properties of general and large matrices. Currently, these fast computation algorithms are only widely used in seismic tomography problems, but the geo-electromagnetic induction community could expect benefits from using these methods in model analysis (Mitchell and Oldenburg 2016).
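
A minimal sketch of this stochastic diagonal estimator, applied to the model resolution matrix via Eqs. (47) and (48), is given below; the Jacobian Jd, model weighting matrix Wm, trade-off factor alpha and the number of probing vectors are hypothetical inputs, and matrix-free conjugate gradients are used for the solves.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def stochastic_diag_resolution(Jd, Wm, alpha, n_probes=200, seed=None):
    """Stochastic estimate of diag(R_M) following Eqs. (47) and (48), using
    Rademacher (+/-1) probing vectors (Hutchinson 1990)."""
    rng = np.random.default_rng(seed)
    N, M = Jd.shape
    matvec = lambda x: Jd.T @ (Jd @ x) + alpha * (Wm.T @ (Wm @ x))
    A = LinearOperator((M, M), matvec=matvec, dtype=float)
    num = np.zeros(M)
    den = np.zeros(M)
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=M)       # Rademacher probing vector
        Rv, _ = cg(A, Jd.T @ (Jd @ v))            # R_M v obtained via Eq. (48)
        num += v * Rv
        den += v * v
    return num / den                              # element-wise, Eq. (47)
```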

2.4.5 Iterative Update Formula

Sometimes, we want to compute the updated resolution and covariance matrices for a newly acquired data set starting from existing resolution and covariance matrices computed from the data of an earlier experiment. This assumes that the modelling mesh does not have to be changed when adding new data. For simplicity, we assume that only one new measurement is added. Then, the new sensitivity matrix with \(N+1\) measurements takes the form

$$\begin{aligned} {\mathbf {J}}^{N+1}&= \begin{bmatrix} {\mathbf {J}}^{N} \\ {\mathbf {g}} \end{bmatrix} , \end{aligned}$$
(50)

where \({\mathbf {J}}^{N}\) is the existing sensitivity matrix for an experiment with N measurements and M model parameters, and \({\mathbf {g}}\) is the new row sensitivity vector of size \(1\times M\) containing the sensitivities of the new measurement to the model parameters. We rewrite the model resolution matrix as \({\mathbf {R}}_{\mathrm{M}}={\mathbf {E}}{\mathbf {F}}\), where \({\mathbf {E}} =[{\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}+\alpha {\mathbf {W}}_{\mathrm{m}}^{T}{\mathbf {W}}_{\mathrm{m}}]^{-1}\) and \({\mathbf {F}}= {\mathbf {J}}_{d}^{T}{\mathbf {J}}_{d}\), and denote the resolution matrix of the base problem (with N measurements) by \({\mathbf {R}}_{\mathrm{M}}^{N}={\mathbf {E}}^{N}{\mathbf {F}}^{N}\). Using the Sherman–Morrison–Woodbury formula (Sherman and Morrison 1950; Hager 1989), \({\mathbf {F}}^{N+1}\) and \({\mathbf {E}}^{N+1}\) for the updated set of measurements can be computed as:

$$\begin{aligned} {\mathbf {F}}^{N+1}&= {\mathbf {F}}^{N} + {\mathbf {g}}^{T}{\mathbf {g}} , \end{aligned}$$
(51)
$$\begin{aligned} {\mathbf {E}}^{N+1}&= {\mathbf {E}}^{N} - \frac{1}{1+{\mathbf {g}}{\mathbf {z}}^{T}}{\mathbf {z}}^{T}{\mathbf {z}}, \end{aligned}$$
(52)

where \({\mathbf {z}} = {\mathbf {g}}{\mathbf {E}}^{N}\). The resolution and covariance matrices can be easily updated as \({\mathbf {R}}_{\mathrm{M}}^{N+1}= {\mathbf {E}}^{N+1}{\mathbf {F}}^{N+1}\) and \({\mathbf {C}}^{N+1}= {\mathbf {E}}^{N+1}\), respectively. The above formula also works in the case that, instead of one measurement, a set of measurements is added (where \({\mathbf {g}}\) would contain the sensitivity rows of these new measurements and the scalar \(1+{\mathbf {g}}{\mathbf {z}}^{T}\) becomes a small matrix whose inverse is cheap to compute), and it avoids the computationally heavy task of performing the matrix inverse operation in \({\mathbf {R}}_{\mathrm{M}}\). Only multiplications of matrices with vectors and of vectors with vectors are needed. Therefore, the computational cost of the update formula is significantly less than that of the original one where the matrix inverse is involved. In practical applications, we can start with a base problem with a moderate set of measurements. Then, by iteratively adding new measurements, the final resolution and covariance matrices for large-scale 3D problems can be efficiently computed using the above update formula.
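
A minimal sketch of the single-measurement update of Eqs. (51) and (52) is given below; E_N and F_N are assumed to be the matrices \({\mathbf {E}}^{N}\) and \({\mathbf {F}}^{N}\) of the base problem, and g is the sensitivity row of the new measurement.

```python
import numpy as np

def update_resolution_covariance(E_N, F_N, g):
    """Rank-one update after adding one measurement with sensitivity row g,
    following Eqs. (51) and (52)."""
    g = np.asarray(g, dtype=float).ravel()
    F_new = F_N + np.outer(g, g)                       # Eq. (51)
    z = g @ E_N                                        # z = g E^N (row vector)
    E_new = E_N - np.outer(z, z) / (1.0 + g @ z)       # Eq. (52)
    R_new = E_new @ F_new                              # updated resolution matrix
    C_new = E_new                                      # updated covariance matrix
    return R_new, C_new, E_new, F_new
```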

2.4.6 Model Reduction Techniques

The huge computational cost makes 3D model analysis nearly impossible. To reduce the computational cost, we may be able to reduce the number of model parameters by a transformation. Assume \({\mathbf {m}}^{*}\) is the optimal model obtained by an inversion algorithm. Taking the 2D case as an example, the model vector can be naturally arranged as a model matrix of size \(r \times c\) (with \(r c = M\)), where r and c are the numbers of cells along the horizontal and vertical directions, respectively. The model matrix can be decomposed using the TSVD technique:

$$\begin{aligned} {\mathbf {m}}^{*} = {\mathbf {U}}_p\varvec{\varLambda }_p{\mathbf {V}}_p^{T} = \sum _{i=1}^{p} \lambda _{i} {\mathbf {u}}_{i}{{\mathbf {v}}_{i}}^{T}, \end{aligned}$$
(53)

where the diagonal matrix \(\varvec{\varLambda }_p\) contains p nonzero singular values \(\lambda _{i}\), and \({\mathbf {u}}_{i}\) and \({\mathbf {v}}_{i}\) are the ith columns (eigenvectors) of \({\mathbf {U}}_p\) and \({\mathbf {V}}_p\), respectively. The benefit of the above decomposition is that, after applying a threshold value to the singular values, the number of retained singular values is much smaller than the dimensions of the original model matrix, \(p<r\) and \(p<c\), so that \(p<M\). Therefore, the model dimensionality is significantly reduced. Here, the p matrices \({\mathbf {u}}_{i}{\mathbf {v}}_{i}^{T}\) with \(i=1,\ldots ,p\) act as the basis to expand the model matrix in a linear space, and the singular values \(\lambda _{i}\) are the expansion coefficients. Imagining the topography of the cost function, where the desired optimal model \({\mathbf {m}}^{*}\) is located in a valley in which the cost function is minimised, its nearby models should belong to the same linear space as model \({\mathbf {m}}^{*}\). The bounds of these equivalent nearby models can be estimated by using prior constraints, such as an upper limit (\({\mathbf {m}}_{up}\)) and a lower limit (\({\mathbf {m}}_{down}\)). Then, all equivalent models in this valley can be expressed as (Tompkins et al. 2011; Fernández-Martínez 2015):

$$\begin{aligned} {\mathbf {m}}_{down} \le \sum _{i=1}^{p} a_{i} {\mathbf {u}}_{i}{{\mathbf {v}}_{i} }^{T}\le {\mathbf {m}}_{up}. \end{aligned}$$
(54)

With this expansion, only the p coefficients \(a_{i}\) need to be determined.
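
A brief numpy sketch of this reduction and of the subsequent sampling of expansion coefficients is given below. The model dimensions, the truncation level p, the bounds and the size of the coefficient perturbations are illustrative assumptions, and the forward-modelling misfit screening discussed in the next paragraph is omitted.

```python
import numpy as np

def reduced_basis(m_star_2d, p):
    """TSVD of the optimal 2D model matrix (Eq. 53): keep the first p
    singular values and singular vectors."""
    U, s, Vt = np.linalg.svd(m_star_2d, full_matrices=False)
    return U[:, :p], s[:p], Vt[:p, :]

def sample_candidates(U, s, Vt, m_low, m_up, n_samples=1000, rel_spread=0.05, rng=None):
    """Draw coefficients a_i around the singular values and keep candidate
    models honouring the bounds of Eq. (54); a forward-modelling check of the
    data misfit (not shown) would screen the kept candidates afterwards."""
    if rng is None:
        rng = np.random.default_rng(0)
    kept = []
    for _ in range(n_samples):
        a = s * (1.0 + rel_spread * rng.standard_normal(s.size))
        m = (U * a) @ Vt                       # sum_i a_i u_i v_i^T
        if np.all(m >= m_low) and np.all(m <= m_up):
            kept.append(m)
    return kept

# hypothetical 2D log-resistivity model with r x c cells, truncated at p terms
r, c, p = 40, 80, 12
m_star = np.cumsum(np.random.randn(r, c), axis=0) * 0.05 + 2.0
U, s, Vt = reduced_basis(m_star, p)
candidates = sample_candidates(U, s, Vt, m_low=m_star - 0.5, m_up=m_star + 0.5)
```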

Equation (54) actually defines a hyperpolyhedron which contains not only the equivalent models but also geologically non-meaningful models. Therefore, this hyperpolyhedron may contain the pseudo-hyperellipsoidal domain (see Sect. 2.2). As models in the pseudo-hyperellipsoidal domain should satisfy the requirement of the misfit being less than a given threshold, we need to compute the forward responses to screen out models in the hyperpolyhedron domain which violate the misfit criterion. Although Eq. (53) is for a 2D case, higher-dimensional cases such as the 3D case can be formulated using a similar approach (De Lathauwer et al. 2000), as we can simply treat 3D model parameters as a set of 2D slices along one fixed direction. The above model dimensionality reduction technique offers a way of cheaply constructing the equivalent model domain. It generates a sampling space in which each model has a high probability of satisfying the misfit requirement. Therefore, it can naturally work well with stochastic inversion algorithms to reduce the sampling cost. Its benefits have been demonstrated in stochastics-based linear seismic tomography problems (Fernández-Martínez 2015). Other model reduction techniques using either Fourier bases (Lin et al. 2012; Fernández-Martínez et al. 2017) or wavelet bases (Kumar and Foufoula-Georgiou 1997; Nittinger and Becken 2016; Liu et al. 2018b) have been adopted to reduce the size of the inversion problem and, thus, to accelerate the inversion computation. However, to our knowledge, none of these techniques have been adopted in uncertainty and resolution analysis of inversion models computed from electromagnetic data. For time-lapse monitoring of salt water tracers, an adequate model reduction technique parameterises the tracer plume using a reduced Legendre moment decomposition (Laloy et al. 2012; Rosas-Carbajal et al. 2015). This approach enables stochastic time-lapse 3D inversion for the plume parameters and also provides model uncertainties. The forward solver of this approach relies upon the plume parameters being translated to resistivity perturbations on a finite-difference or finite-element grid. Similar problem-oriented parameterisations could be adopted in inversion problems where the aim is to determine the geometric shapes of anomalies using a limited number of parameters (Abubakar et al. 2009).

In the above, we have listed several techniques to accelerate model analysis. For 1D and 2D cases, it is desirable to use the direct matrix computation approach, as detailed information about model resolution and covariance is retained in the analysis results. For such problems, the TSVD approach works well if the effect of the model regularisation is deemed of lower importance than that of the data constraints, and the potential of the GSVD technique should be explored if the model regularisation term is to be included in the analysis. For 3D cases, extraction of a few rows and columns of the resolution and covariance matrices, or consideration of their diagonal elements as proxies, seems to be a practical approach. The iterative update formula is suitable for 3D problems when more and more new measurements are acquired across a given target and modification of the model parameterisation is not necessary. Model reduction approaches offer a cheap way of constructing the equivalent model domain in the valley of the cost function. They have great potential in stochastic inversion algorithms to reduce the sampling space, so that the otherwise prohibitive computational costs of stochastic algorithms can be dramatically reduced for 2D and 3D problems.

3 Examples of Linearised and Partially Nonlinear Model Analyses

3.1 Sensitivity Analyses

First, we consider the usage of the sensitivity matrix, because it establishes a first-order approximation of how a perturbation of the model parameters affects the measurements, that is, \(\triangle {\mathbf {d}} = {\mathbf {J}}\triangle {\mathbf {m}}\) with \({\mathbf {J}}=\left( \frac{\partial F_i}{\partial m_j}\right)\), \(i=1,\ldots ,N\) and \(j=1,\ldots ,M\) (see above). For each model parameter, we thus have a vector of sensitivities measuring its influence on all N measurements. It is natural to introduce a scalar measure for each model parameter, which sums the error-weighted absolute values of all entries in this vector. For the jth model parameter, this cumulative sensitivity measure \(s_{j}\) (Park and Van 1991; Schwalenberg et al. 2002; Robert et al. 2012) is

$$\begin{aligned} s_{j} = \frac{1}{\triangle _{j}}\sum _{i=1}^{N} \left| \frac{1}{\delta _{i}}\frac{\partial F_{i}({\mathbf {m}})}{\partial m_{j}}\right| , \end{aligned}$$
(55)

where \(\triangle _{j}\) is the size of the jth model cell and \(\delta _{i}\) is the standard deviation of the ith measured data. A low value of \(s_{j}\) indicates that the measurements are insensitive to the jth model parameter. Large variations in this model parameter would generate only small variations in the measured data set. Consequently, this model parameter is poorly resolved in the inversion result. In contrast, model parameters with high values of the above sensitivity measure can generally be thought of as being well resolved, with high confidence levels. However, it is important to note that inversion artefacts, i.e. model structures caused by systematic noise or insufficient sensor calibration, may also have very high cumulative sensitivity measures. In a typical plot of this sensitivity measure, cells around receivers and transmitters have high values. Cumulative sensitivities decrease with depth and approach relatively low values at depths larger than about 1.5 skin depths of the lowest signal frequency, due to the skin effect. Normalised by its maximum value, the sensitivity measure \(s_{j}\) is in the range of [0, 1]. It is difficult to choose a threshold value for this measure below which the associated model parameters are deemed to have low confidence. Schwalenberg et al. (2002) used a threshold value of \(10^{-4}\) to rule out unreliable deep conductivity structures beneath the Central Andes in 2D magnetotelluric data. The minimum depth of the Altiplano conductor was estimated as about 50 km, and a localised conductor below the Precordillera was confirmed by high sensitivities. Schmoldt et al. (2014) also used \(10^{-4}\) as the threshold value to identify the well-resolved areas of a 2D resistivity model obtained from magnetotelluric soundings in the Tajo Basin, Spain (Fig. 5). A similar idea was applied to analyse electrical structures imaged by magnetotelluric data (Ledo et al. 2004; Adepelumi et al. 2005; Padilha et al. 2013; Yan et al. 2017a), controlled-source electromagnetic (CSEM) data (Christiansen and Auken 2012; Komori et al. 2013; Grayver et al. 2014; Troiano et al. 2014; Lindau and Becken 2018; Goswami et al. 2016; Attias et al. 2018), ground penetrating radar data (Meles et al. 2012) and direct-current resistivity data (Kemna et al. 2002; Hermans et al. 2012).
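
The cumulative sensitivity of Eq. (55) is straightforward to evaluate once the sensitivity matrix is available. The following sketch assumes, purely for illustration, a sensitivity matrix whose entries decay with cell index, uniform cell sizes and uniform data errors; the \(10^{-4}\) threshold follows the studies cited above.

```python
import numpy as np

def cumulative_sensitivity(J, cell_sizes, data_std):
    """Cumulative sensitivity s_j of Eq. (55): for each model cell, sum the
    error-weighted absolute sensitivities over all N measurements, divide by
    the cell size and normalise to the range [0, 1].  J has shape (N, M)."""
    s = np.abs(J / data_std[:, None]).sum(axis=0) / cell_sizes
    return s / s.max()

# toy example: sensitivities decaying away from the sensors (assumed values)
J = np.random.randn(500, 1200) * np.exp(-np.linspace(0.0, 8.0, 1200))
s = cumulative_sensitivity(J, cell_sizes=np.ones(1200), data_std=0.05 * np.ones(500))
poorly_resolved = s < 1e-4   # cells flagged as unreliable
```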

Fig. 5
figure 5

Cumulative sensitivity (black iso-lines) computed using Eq. (55) overlain on 2D model of crustal structures in the Tajo Basin (Schmoldt et al. 2014), which was inverted from magnetotelluric data collected at sites pic0001 to pic020. Labels of iso-lines are given on a logarithmic (base 10) scale. Areas with values less than the threshold value of \(10^{-4}\) are assumed to be poorly resolved. The dashed white line denotes the crust–mantle boundary

Now, we focus on appraising inversion results in terms of formal model uncertainty and resolution analysis. Since most published inversion results do not include model uncertainty or resolution analysis, we attempt to cover all published studies of model analysis for geo-electromagnetic induction and electrical resistivity tomography problems in the following sections. These studies are grouped into two subsections: the first is on models inverted only from electromagnetic data, and the second is on models inverted from multiple geophysical data sets.

3.2 Uncertainty and Resolution Analyses of Models Inverted from Electromagnetic Data

For 2D inversion models computed from electrical resistivity tomography (ERT) data, Friedel (2003) successfully used the SVD technique to compute resolution and covariance matrices (Fig. 6). He also presented two simple measures of model resolution, namely the radius of resolution and the distortion flag. The diagonal elements of the model resolution matrix are used to compute radii of resolution under the assumption that the averaging functions have small spreads and unique main lobes. The distortion flag is true, if the maximum value of the averaging function is off-diagonal; otherwise, it is false. Numerical experiments show that these two measures are consistent with each other. Slater and Binley (2003) used the diagonal elements of the model resolution matrix to analyse the inverted conductivity model in a 2D cross-well configuration with 770 measurements for monitoring a permeable reactive barrier. According to the model resolution analysis, the conductivity structure has decreasing resolution away from the boreholes (and the current sources). Using the extraction technique for the columns of the model resolution matrix in Eq. (46), Oldenborger and Routh (2009) carried out the analysis of the point-spread function for 3D ERT problems. In this study, three measures of the point-spread function were introduced, which are the spread attribute, the localisation error and the departure. The spread attribute quantifies how the inversion algorithm disperses structure. The localisation error quantifies the inability of the inversion algorithm to accurately map impulse variations in the true model. The departure term measures the combined effects of both the spread attribute and the localisation error. As expected, well-resolved model regions also show good localisation.

Fig. 6
figure 6

Resolving kernels \(R_{M,jk}/max_{k}(R_{M,jk})\) normalised by their respective maximum values of a 2D ERT inversion model (Friedel 2003) where four cells (cell indices \(j=125, j=588, j=348, j=722\)) were investigated (a). The radius of resolution \(r_{res}\) is derived from the diagonal elements \(R_{M,jj}\) of the resolution matrix (b). Cells with white outlines indicate well-resolved model parameters with distortion flags of value false

For geo-electromagnetic induction problems, Pedersen and Rasmussen (1989) applied the TSVD technique to compute the actual nonlinear semi-axes of the pseudo-hyperellipsoid for a 1D magnetotelluric inversion problem. Alumbaugh and Newman (2000) used the extraction technique for the column resolution vector to compute the point-spread functions of selected cells in 2D and 3D models computed from cross-well CSEM data sets. The capability of dealing with large-scale 3D problems is provided by the column extraction technique shown in Eq. (46). The burden of computing the point-spread function over selected model parameters was eased by iteratively solving a set of large-scale linear systems of equations using the conjugate gradient method. In this study, the authors also introduced a measure to evaluate the width of the point-spread function (PSF) for each model parameter. This measure can be thought of as being continuous in model regions that are homogeneous or have gradual changes in model parameters. Given sufficiently dense sampling around resistivity contrasts, the entire map of this measure can be approximated by interpolating this width function using its values computed for selected model parameters. Muñoz and Rath (2006) proposed an interesting method to explore the equivalence and resolution of 2D and 3D models using a hybrid two-step scheme. Starting from a preferred inversion model, modified models are randomly generated by replacing the resistivities of cells in a model domain of interest. Subsequently, the differences between the modified models and the preferred inversion model are projected onto the null-space of the latter to identify the equivalent models. This null-space is constructed by TSVD of the sensitivity matrix of the preferred inversion model. This approach goes back to earlier work in seismic tomography by Deal and Nolet (1996) and Rowbotham and Pratt (1997). For selected cells of 2D radio-magnetotelluric inversion models, Kalscheuer and Pedersen (2007) computed model uncertainties applying the TSVD technique in both pseudo-hyperellipsoidal descriptions of the data misfit surface and most-squares inversions. Subsequently, the rows of the model resolution matrices pertaining to the analysed cells were computed for the truncation level determined in the uncertainty analysis. For both a synthetic model and an RMT field data set, the estimated uncertainties of conductive model cells agree well between the two types of analyses. However, for resistive cells underneath conductive structures, the uncertainties computed from the nonlinear semi-axes are significantly smaller than those of the more accurate most-squares inversion. Additionally, in terms of the estimated resolution centres and resolution lengths (Braile and Keller 1975; Pedersen 1977), the latter cells have large resolution lengths and their resolution centres strongly deviate from the cells themselves. This study confirmed the difficulty of constraining resistive structures below conductive structures in electromagnetic induction problems. For a 3D land CSEM data set collected to image the CO2 storage formation in Ketzin, Germany, Grayver et al. (2014) computed cumulative sensitivity measures and averaging functions of the model resolution matrix (by using TSVD and 7194 singular vectors). For the model with 94400 parameters treated in the analysis, this task was time-consuming and could be accomplished using parallel techniques on clusters.
The results show that the cumulative sensitivity measure is highest along the profile and near the transmitters and receivers. Furthermore, areas with high cumulative sensitivity values were demonstrated to have low values of PSF width. More recently, Kalscheuer et al. (2018) applied linearised model uncertainty and resolution analysis to evaluate the benefits of adding borehole MT data to surface MT data sets. Linearised analysis suggests that the joint inversion of both surface and borehole electromagnetic data sets leads to much better constrained models than single inversions of surface data. For borehole MT transfer functions, this experiment also confirmed that the vertical electric transfer function is sensitive to structures close to a borehole, whereas skin-effect transfer functions are sensitive also to structures at greater distance from the borehole.

In summary, for inversions of single-method electromagnetic data sets, most studies on model uncertainty and resolution analysis were limited to 1D and 2D cases. This is because the computational cost would become prohibitive for 3D problems. 3D model analysis might be done using powerful computer clusters, but it might be better to consider the above acceleration techniques (such as the row/column/diagonal extraction techniques, the iterative update formula and model reduction techniques), because these have good potential to analyse the reliability and uncertainties of 3D conductivity models at moderate cost. Due to the inherent strong nonlinearity of electromagnetic data inversion, different model analyses can generate inconsistent indicators. Hence, we must become more cautious when it comes to trusting our inverted conductivity model. Unfortunately, this fact is often ignored. Therefore, to increase the confidence in the inverted conductivity model, a comprehensive model analysis should be done to offer a measure of the reliability of the inversion model, which is particularly important for single inversions of electromagnetic data that suffer from equivalence and suppression problems (Menke 2012; Zhdanov 2002).

3.3 Uncertainty and Resolution Analyses of Models Inverted from Multiple Geophysical Data Sets

Data sets of different geophysical methods constrain different parts of the model space. Hence, each geophysical data set reveals partial information of the unique true model. Despite disparities in the material parameters that the different methods are sensitive to (e.g., electromagnetic data to conductivity, seismic data to velocity, magnetic data to permeability and gravity data to density), models for different material parameters can often be expected to have geometrical similarity (Gallardo and Meju 2004; Linde et al. 2006, 2008; Gallardo and Meju 2011; Brown et al. 2012; Zhou et al. 2014; Wiik et al. 2015; Takam Takougang et al. 2015; Le et al. 2016; Yan et al. 2017b; Giraud et al. 2019; Ogunbo et al. 2018; Agostinetti and Bodin 2018; Wang et al. 2018b) or can be related by some determined or stochastic petrophysical relationship (Moorkamp et al. 2007, 2011; Moorkamp 2017; Haber and Holtzman Gazit 2013). Thus, joint inversion of multiple geophysical data sets can significantly reduce uncertainty and improve resolution of the resulting models, i.e. these will come closer to the unknown true models than models from single inversions (Abubakar et al. 2012; Gao et al. 2012; Molodtsov et al. 2013).

Until now, model resolution and uncertainty analyses have only been conducted for a limited number of 1D and 2D joint inversion studies. For 2D joint inversion of electrical resistivity tomography (ERT) and radio-magnetotelluric (RMT) data sets, Candansayar and Tezkan (2008) computed the model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) and used its diagonal elements to evaluate the benefit of joint inversion (Fig. 7). For this particular data set, the diagonal resolution elements from the inversions of the individual data sets showed that the ERT data were more sensitive to the shallow structure (Fig. 7a) and the RMT data set could better resolve the deep structures (Fig. 7b). As evidenced by high resolution values, the resistivities of both the shallow and deep structures were better resolved using joint inversion of the RMT and ERT data sets (Fig. 7c). Yogeshwar et al. (2012) drew similar conclusions for the 2D joint inversion of RMT and ERT data sets acquired across a groundwater contamination site in India. Kalscheuer et al. (2010) estimated the reliability of 2D models computed using smoothness-constrained joint inversions of ERT and RMT data sets by comparing the results of linearised analysis and most-squares inversion. Model resolution tests demonstrated that inappropriate weighting among different data sets can lead to unnecessarily large model uncertainties and large spreads of the resolving kernels, emphasising the importance of proper data weights in joint inversions. The 2D model uncertainty and resolution analysis introduced by Kalscheuer et al. (2010) was successfully applied to 2D inversion models computed from single and joint inversions of ERT and RMT data (Bastani et al. 2011; Kalscheuer et al. 2013; Wang et al. 2018b). In a review paper, Gallardo and Meju (2011) derived a theoretical framework to compute the model covariance matrix for joint inversions using geometrical similarity constraints (cross-gradient method). For the joint inversion of a few electromagnetic data sets (transient electromagnetic, audio-magnetotelluric and controlled-source audio-magnetotelluric data) collected in the central Okavango Delta of Botswana, Kalscheuer et al. (2015) examined the effect of seismic constraints on 1D joint inversion models. Due to the relatively high number of electromagnetic data (about 600 to 700 per sounding) and the low number of model parameters (7 layer parameters and 14 distortion parameters), each layer parameter is perfectly resolved leading to a unit diagonal model resolution matrix. Moreover, the seismic constraints could significantly reduce model uncertainty for the resistivities and thicknesses of the layers. Giraud et al. (2017) used geological information to constrain the petrophysical relationship for 2D joint inversions of magnetic and gravity data. Using this geological prior information, the reliability of joint inversion results was significantly improved, as evidenced by a measure of the uncertainty in the petrophysical domain. For joint inversions of electric resistivity and seismic refraction tomography data at Äspö Hard Rock Laboratory close to Oskarshamn in Sweden, Ronczka et al. (2017) used cumulative sensitivity values (Eq. 55) and the model resolution radius to evaluate the reliability of the inverted conductivity models. The results of both analysis tools match each other very well.
In the final conductivity model, the shallow part has low resolution radii and high cumulative sensitivities, which means that it can be interpreted with confidence. Using the diagonal elements of the model resolution matrices (which are calculated by the stochastics-based diagonal extraction technique) from individual magnetotelluric, active source seismic P-wave and gravity data sets, Heincke et al. (2017) developed an adaptive strategy to determine the weights for different data misfit terms in 1D joint inversion. Resolution analysis indicates that, even with joint inversion, the deep structures off the shore of the Faroe Islands (Denmark) are poorly resolved, which suggests that additional geophysical data sets should be acquired.

Fig. 7
figure 7

Diagonal elements of model resolution matrices computed for a 2D single inversion of ERT data, b 2D single inversion of RMT data and c 2D joint inversion of ERT and RMT data from Candansayar and Tezkan (2008). The diagonal entries of the resolution matrix computed for the joint inversion model are higher than those for the single inversions indicating an improvement of model constraints by combining the data sets

4 Examples of Nonlinear Model Analyses

Changing the preferred inversion model \({\mathbf {m}}^{*}\) (also called nonlinear sensitivity test) may be the most economical tool as, in its simplest form, only a few forward modelling computations are needed to evaluate the misfit variations and to construct a very limited equivalent model domain. Therefore, this model analysis tool is widely used in geo-electromagnetic inversion problems (Becken et al. 2008; Thiel and Heinson 2010). For instance, Hübert et al. (2009), Hübert et al. (2013) and García Juanatey et al. (2013) carried out nonlinear sensitivity tests of the preferred inversion models obtained for different parts of the Kristineberg mining area in the Skellefte Ore District, northern Sweden. Along two perpendicular magnetotelluric profiles, García Juanatey et al. (2013) performed the nonlinear sensitivity test by forward modelling a set of synthetic models which were built by varying the resistivities of selected regions or cells (Fig. 8). The bodies RI, RIII, RIV, RVI and RVII are resistors, and bodies CIVa, CIVb, CVa, CVb, CVI and CVII are conductors. For each body, the resistivity of a selected region was varied in the range of \(10^{-1}\) to \(10^{4}\,\Omega \text {m}\), whereas the resistivities of the remaining model regions were kept unchanged. The differences between model responses were measured by RMS misfit values. This analysis shows that the data are sensitive to resistors RI, RIII and RVI and conductors CVa and CIV. In contrast, the data seem to be insensitive to resistor RIV and conductors CVI and CVII, because the RMS misfits are unchanged as the resistivities of these bodies are varied.
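
In its simplest form, such a test only requires looping over trial resistivities for a selected body and recording the resulting misfits. The sketch below assumes a user-supplied forward-modelling routine `forward`, a preferred model `m_pref` in log10 resistivities and an index set `body_cells` marking the body of interest; all of these are hypothetical placeholders.

```python
import numpy as np

def rms_misfit(d_obs, d_pred, d_std):
    """Error-weighted RMS misfit between observed and predicted data."""
    return np.sqrt(np.mean(((d_obs - d_pred) / d_std) ** 2))

def sensitivity_test(m_pref, body_cells, rho_values, forward, d_obs, d_std):
    """Vary the resistivity of one selected body while keeping the rest of
    the preferred model fixed, and record the RMS misfit of each trial."""
    rms = []
    for rho in rho_values:
        m_test = m_pref.copy()
        m_test[body_cells] = np.log10(rho)     # overwrite the body only
        rms.append(rms_misfit(d_obs, forward(m_test), d_std))
    return np.array(rms)

# usage (schematic), e.g. resistivities from 1e-1 to 1e4 Ohm m as in Fig. 8:
# rms_curve = sensitivity_test(m_pref, body_cells, np.logspace(-1, 4, 11),
#                              forward, d_obs, d_std)
```

A flat RMS curve over the tested range then indicates that the data are insensitive to the body in question.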

Fig. 8
figure 8

Nonlinear sensitivity tests for 2D magnetotelluric inversion models along two perpendicular profiles in the Skellefte Ore District, northern Sweden (García Juanatey et al. 2013). Bodies RI, RIII, RIV, RVI and RVII are resistors, and bodies CIVa, CIVb, CVa, CVb, CVI and CVII are conductors. The data are sensitive to resistors RI, RIII and RVI and conductors CVa and CIV, as variations of model resistivities lead to significant changes of RMS values. In contrast, the data seem to be insensitive to resistor RIV and conductors CVI and CVII

Dong et al. (2014) presented an example of 3D magnetotelluric inversion to study the electrical structure of the Ordos Block in the North China Craton. In their preferred inversion model (Fig. 9a) which contains three conductors C1, C2 and C3, the large-scale conductor C1 at depths of 20–60 km has a resistivity of \(3-100\,\Omega \text {m}\). C2 is at upper mantle depths (90–150 km) with a resistivity of tens of \(\Omega \text {m}\). C3 is vertically directed and connects shallow body C1 and deep body C2. C3 has a resistivity of \(3-10\,\Omega \text {m}\). To test the sensitivity of the data to C3, a synthetic model was built by assigning \(300\,\Omega \text {m}\) to its resistivity and \(50-90\,\text {km}\) to its depth range (Fig. 9a, b). The forward modelling results show that although the total RMS misfit has only slightly changed (from 1.60 to 1.66), 10–20 % changes appear in the responses at receiver sites close to C3 (Fig. 9c–e). Hence, these synthetic forward modelling tests suggest a high level of confidence associated with the existence of C3.

Fig. 9
figure 9

Nonlinear sensitivity test for conductor C3 in a 3D MT inversion model of the Ordos Block (Dong et al. 2014). Replacement of conductor C3 in the preferred model (a) by a homogeneous unit of \(300\,\Omega \text {m}\) (b) leads to data fit that is deteriorated by 10–20 % at sites in the vicinity of C3 (c–e). Symbols indicate measurements. Dashed and solid lines indicate responses of the preferred inversion model in a and the modified model in b, respectively

For a CSEM problem using currents impressed on a 35-km-long segment of a gas pipeline in northern Germany as a source, Lindau and Becken (2018) calculated cumulative sensitivities of the inverted subsurface model using Eq. (55). The inversion model indicates a buried conductor C1, but the cumulative sensitivity plot shows that the area containing the conductor has low sensitivity values. The authors then generated a new synthetic model by removing this conductor C1, which leads to a significantly increased RMS data misfit. Therefore, the authors concluded that the buried conductor C1 exists. In an example of inverting CSEM towed and ocean-bottom receiver data, Attias et al. (2018) first obtained a preferred inversion model, then a new synthetic model was constructed with structures from this preferred inversion model. Finally, the authors generated another inversion model by inverting the responses calculated for this synthetic model contaminated with noise. Similar structures appear in the inversion models for the field data and synthetic data, establishing confidence in the reliability of the model inverted from the field data.

Due to its simplicity, the nonlinear sensitivity test is also widely used in other problems, such as analysis of the significance of different electromagnetic field components in resolving a given structure (Spies and Habashy 1995; Colombo and McNeice 2013), determination of minimum amplitudes of resistivity changes for resistivity monitoring (Didana et al. 2017; Thiel 2017; Liu and Zoback 2017), evaluation of acquisition configurations (such as surface-to-surface or surface-to-borehole) (Wirianto et al. 2010; Schamper et al. 2011; Vilamajó et al. 2013) or 4D time-lapse problems (Auken et al. 2014; Rosas-Carbajal et al. 2015). Readers are encouraged to read the above papers and references therein for detailed coverage.

Variations of the initial model \({\mathbf {m}}^{0}\) can generate different equivalent models satisfying the data misfit criterion (Agarwal et al. 1993). The general approach to designing the initial model is to use a simple half-space model with initial resistivity estimated from, for instance, an average apparent resistivity of the MT data or an average resistivity of rock samples collected in the survey area. If a priori geological knowledge indicates the underground structure is dominated by layers, the initial model can be a layered Earth model. An optimal initial model not only accelerates the convergence rate of the inversion, but also increases the confidence in the preferred inversion model approaching the true model. Therefore, we may wish to design different initial models based on different geophysical models, such as seismic velocity models and density models, or geological maps. Given satisfactory data misfits, all generated equivalent models should be equally considered and interpreted. In using magnetotelluric data to image the deep structure under the Himalayan syntaxis, Bai and Meju (2003) ran inversions with half-space initial models of four different resistivities, as shown in Fig. 10. In these four inversion models, resistivities below 15 km depth differ significantly from each other (up to a factor of four). Hence, regions below 15 km are poorly constrained by the data. It is worth noting that Tao et al. (2013) introduced a strategy for designing initial models in magnetotelluric inversion studies.
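
A simple way of quantifying the effect of the initial model is to compare, cell by cell, the models obtained from inversions started at different half-space resistivities. The sketch below assumes a hypothetical inversion routine `invert` working on log10 resistivities; the factor-of-four flag mirrors the criterion applied in the example above.

```python
import numpy as np

def initial_model_spread(invert, d_obs, rho_init_list, n_cells):
    """Run the inversion from several half-space initial models and compute,
    cell by cell, the factor by which the recovered resistivities differ
    between the runs; large factors flag poorly constrained regions."""
    models = np.array([invert(d_obs, np.full(n_cells, np.log10(rho0)))
                       for rho0 in rho_init_list])        # log10 resistivities
    return 10.0 ** (models.max(axis=0) - models.min(axis=0))

# usage (schematic), e.g. half-spaces of 50, 100, 300 and 500 Ohm m:
# spread = initial_model_spread(invert, d_obs, [50, 100, 300, 500], n_cells)
# poorly_constrained = spread > 4.0
```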

Fig. 10
figure 10

2D resistivity models inverted from magnetotelluric data (Bai and Meju 2003) using half-space initial models of different resistivities, 50 \(\Omega \text {m}\) (a), 100 \(\Omega \text {m}\) (b), 300 \(\Omega \text {m}\) (c) and 500 \(\Omega \text {m}\) (d). Regions below 15 km depth are poorly resolved, as their resistivity values differ from one another by up to a factor of four

Variations of the reference model \({\mathbf {m}}_{r}\) lead to the depth of investigation (DOI) technique. The popular DOI tool was originally presented by Oldenburg and Li (1999) for inversion of ERT and IP data. For 2D inversion of ERT data with a total electrode spread of 400 m, two homogeneous reference models of 40 and 400 \(\Omega \text {m}\) were used to compute DOI indices (Fig. 11). In this particular example, the DOI isoline with a value of 0.4 is located at a depth of roughly 75 m and is used for fading colours at greater depths, where the model is not believed to be constrained by the data. Subsequently, Marescot et al. (2003) used the DOI index to estimate the reliability of 2D ERT models in mountain permafrost studies. Miller and Routh (2007) established a relationship between the model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) and the DOI index. To our knowledge, this connection between the DOI index and the model resolution matrix is the only currently used technique to take into account nonlinear effects in the model resolution matrix. As described above, Kalscheuer and Pedersen (2007) used partially nonlinear uncertainty estimates to determine a TSVD truncation level for which \({\mathbf {R}}_{\mathrm{M}}\) is computed. However, this approach of accounting for nonlinearity in \({\mathbf {R}}_{\mathrm{M}}\) is at best indirect. More rigorous approaches would re-compute \({\mathbf {R}}_{\mathrm{M}}\) at the extreme models and, possibly, study the rotation of the model eigenvectors by moving from the preferred model to the extreme models.

Oldenborger et al. (2007) extended the concept of DOI to the 3D case and compared the 3D DOI to the row-summed resolution matrix and the sensitivity matrix. Hilbich et al. (2009) computed the DOI index and the model resolution matrix for 2D ERT monitoring of permafrost landforms with strong resistivity contrasts. In addition to a model resolution analysis based on linearisation, the DOI approach was used to account for the nonlinear characteristics of the ERT method. The comparison shows that both approaches could generate similar results. The agreement between these two approaches gave the authors confidence in applying ERT to monitor the spatial-temporal heterogeneity of permafrost degradation. Meju (2009) extended the concept of the DOI index to cross-gradient joint inversions of data sets from methods that are sensitive to disparate material parameters. The author used the cross-gradient of the minimum model and the maximum model to measure the resolution existing in these two extreme model sets. Monteiro Santos and El-Kaliouby (2011) calculated the DOI index for 2D joint inversion models of ERT and time-domain EM data sets, where only model roughness was considered in the regularisation, but not model smallness. Nenna et al. (2011) used the 3D DOI for experimental design in electrical resistivity imaging. Caterina et al. (2013) also computed the DOI index, the model resolution matrix and the sensitivity matrix in an application of ERT monitoring to hydrogeological problems. The performances of the model resolution matrix, the sensitivity matrix and the DOI index were compared. This comparison shows that the model resolution matrix and the sensitivity matrix appear to be more suitable quantitative tools, whereas the DOI index seems to be more suitable for a qualitative model analysis, because it is difficult to select a proper threshold value for the DOI index. Deceuster et al. (2014) tried to identify an optimal DOI threshold for ERT inversions. Oldenborger and LeBlanc (2015) compared the DOI to the diagonal elements of the resolution matrix for electrical resistivity surveys at Iqaluit International Airport, Nunavut, Canada. Recently, Carriere et al. (2017) compared the performance of the DOI index approach using two model parameter transformations, i.e. linear resistivity and logarithmic resistivity. The results strongly favour the use of logarithmic resistivity. For 2D joint inversions of TDEM and ERT data sets, Martinez-Moreno et al. (2017) used the DOI index to evaluate interpreted seawater intrusions in coastal areas. We conclude that the success of the DOI-based model analysis strongly depends on the experience of the users in trying different reference models. DOI indices can only be meaningfully computed for inversion models that have nearly identical RMS misfits. This is so, because systematic differences in data fit correspond to localised deviations between the inversion models. These localised deviations will be falsely reflected in high DOI values, even though the model regions are well determined by the data. This means that the chosen model constraints are inappropriate in at least one of the inversions.
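
A sketch of the DOI index in its commonly used two-reference-model form is given below; the exact normalisation and thresholds vary between the studies cited above, so the code should be read as an illustration rather than a reproduction of any particular implementation.

```python
import numpy as np

def doi_index(m1, m2, m1_ref, m2_ref):
    """DOI index from two inversions of the same data that use different
    homogeneous reference models: m1 and m2 are the inverted models (here
    log10 resistivities) and m1_ref, m2_ref the corresponding reference
    values.  Values near 0 mark cells constrained by the data; values near
    the chosen threshold (e.g. 0.4) or above mark cells controlled mainly
    by the reference model."""
    return (m1 - m2) / (m1_ref - m2_ref)

# usage (schematic), e.g. reference half-spaces of 40 and 400 Ohm m:
# doi = doi_index(m_40, m_400, np.log10(40.0), np.log10(400.0))
# unreliable = np.abs(doi) > 0.4
```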

Fig. 11
figure 11

DOI map (c) of 2D ERT inversion (Oldenburg and Li 1999) with reference models being homogeneous half-spaces of 4000 \(\Omega \text {m}\) and 40 \(\Omega \text {m}\). The white regions of the inverted model are areas insensitive to the data (a). The inverted model is superposed by contour lines of the DOI map at an interval of 0.1 (b)

Changing the target value of the cost function \(Q\) is the paradigm common to the most-squares inversion and the edgehog method. The most-squares approach estimates model uncertainty by perturbing the target value of the cost function. For 2D single and joint inversions of ERT and RMT data, Kalscheuer et al. (2010) used both the TSVD technique and smoothness constraints to estimate extreme parameter sets by the most-squares method. In the analyses that take smoothness constraints into account, the results for the linearised model uncertainty estimates and the most-squares method are in good agreement with each other. This suggests that when a model regularisation term is included, the linearised model covariance can yield similar results to the more elaborate most-squares method. This result is not a surprise, because addition of linear constraints, such as smoothness constraints, renders the inversion problem more linear. Since a logarithmic transformation of model parameters (resistivities \(\rho\)) is used to ensure that resistivities are positive, the errors computed by the partially nonlinear hyperellipsoidal description (nonlinear semi-axes) and most-squares inversion are factors \(f^-\) and \(f^+\), i.e. the extreme parameter values \(\log _{10}\rho \pm \varDelta \log _{10}\rho ^{\pm }\) correspond to a range \(\left[ \rho /f^-,f^+\,\rho \right]\). When disregarding smoothness constraints, the model uncertainties estimated by the most-squares method compare favourably to those of the nonlinear semi-axes calculated by the TSVD technique for conductive cells (e.g., cell D in Fig. 12 and Table 1) and satisfactorily for shallow resistive cells that are within the depth of investigation ranges of both methods (e.g., cell C in Figs. 12, 13 and Table 1). However, for resistive cells underneath conductive units, the most-squares errors are up to two and a half orders of magnitude larger than those estimated from the nonlinear semi-axes (e.g., cell B in Fig. 12 and Table 1). This indicates that model analyses can be in profound error, if nonlinearity is disregarded. Together, the nonlinear model errors computed using most-squares inversion and the model resolution matrices (not shown here) proved that the joint inversion did improve the overall model constraints. Other field examples with application of the most-squares inversion were presented by Bastani et al. (2011) and Kalscheuer et al. (2015) (see above). For 1D models computed from electromagnetic induction data acquired across Antarctic sea ice, Hunkeler et al. (2016a) and Hunkeler et al. (2016b) successfully applied the TSVD-based most-squares algorithm presented by Kalscheuer and Pedersen (2007) to estimate the reliability of models with layer thicknesses and conductivities as free parameters. Rosas-Carbajal et al. (2014) applied most-squares inversions to determine the uncertainties of 2D RMT and ERT single and joint inversion models, and conclusions similar to those presented by Kalscheuer et al. (2010) were obtained. As an important result of this work, the model uncertainties computed using most-squares inversions were basically confirmed by Markov-chain Monte Carlo (MCMC) inversions. Note that there is a difference regarding the model that the uncertainties are relative to. Whereas the most-squares method computes uncertainties relative to the minimum-misfit model, those of the MCMC method are relative to the ensemble mean. This is so, because the MCMC method generates an ensemble of models by sampling the likelihood function.
From this ensemble, an ensemble mean is computed by averaging, and, subsequently, an ensemble covariance matrix is computed relative to the ensemble mean (Eqs. 42 and 43). Nevertheless, these results suggest that, for the given examples with measurements on the surface, there were no significant local minima and maxima in the cost functions of the RMT and ERT inversion problems. However, Kalscheuer et al. (2018) pointed out that, for inversion of borehole magnetotelluric data, the most-squares method has general difficulties in overcoming the highly nonlinear effects, in particular in the vertical electric transfer functions. Thus, in its present form, the most-squares method cannot be used for uncertainty analysis of models computed from borehole magnetotelluric data. The edgehog method, where either the data misfit \(Q_{\mathrm{d}}\) or the model misfit \(Q_{\mathrm{m}}\) is changed, has rarely been used in geo-electromagnetic induction problems.
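
As a worked example of the error factors mentioned above, a most-squares uncertainty expressed in log10 resistivity converts to multiplicative bounds on the resistivity itself; the numerical values used here are assumed for illustration only.

```python
def error_factors(dlog_minus, dlog_plus):
    """Convert uncertainties of log10 resistivity into the error factors
    f- and f+, so that the extreme values span [rho / f-, f+ * rho]."""
    return 10.0 ** dlog_minus, 10.0 ** dlog_plus

# assumed example: rho = 100 Ohm m with dlog- = 0.3 and dlog+ = 0.5
f_minus, f_plus = error_factors(0.3, 0.5)
rho = 100.0
bounds = (rho / f_minus, f_plus * rho)   # roughly (50, 316) Ohm m
```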

Fig. 12
figure 12

Synthetic resistivity model (a) and resistivity models inverted from ERT data (b), RMT data (c), joint ERT and RMT data (d) (from Kalscheuer et al. 2010). Cells B, C and D are located in the deep resistor (purple colour), shallow resistor and deep conductor (orange colour), respectively

Fig. 13
figure 13

TSVD-based most-squares models for minimum resistivity (left column) and maximum resistivity (right column) of cell C calculated for ERT (a), RMT (b) and joint ERT–RMT inversion models (c) (from Kalscheuer et al. 2010). Cell C is located in a shallow resistor (cf. Fig. 12). Error factors corresponding to the most-squares uncertainties are presented in Table 1

Table 1 TSVD-based most-squares error factors of cells B (deep resistor), C (shallow resistor) and D (deep conductor, cf. Fig. 12) for the minimum-misfit models computed in ERT, RMT and joint ERT–RMT inversions (from Kalscheuer et al. 2010)

Global search algorithms or stochastic inversion algorithms can be considered as generalised nonlinear model analysis methods. Stochastic inversion algorithms build synthetic models by following certain model space sampling rules. As compared to the nonlinear model analysis methods described above, the model domain is sampled much more densely and systematically. For 2D inversions, the model ensemble that serves to describe the a posteriori probability density function can easily consist of 100,000 models (Rosas-Carbajal et al. 2014). Today, the research field of designing stochastic algorithms is very active. There are many stochastic algorithms such as the Bayesian approach (Minsley 2011; Guo et al. 2014; Rosas-Carbajal et al. 2014, 2015; Berube et al. 2017; Xiang et al. 2018), genetic algorithms (Moorkamp et al. 2007; Akca and Basokur 2010; Akca et al. 2014), simulated annealing (Prasad 1999; Wang et al. 2012), neural networks (El-Qady and Ushijima 2001; Singh et al. 2013), evolution algorithms (Balkaya et al. 2017; Grayver and Kuvshinov 2016; Grayver et al. 2016) and particle swarm optimisation algorithms (Srivastava and Agarwal 2010; Santilano et al. 2018; Liu et al. 2018a). For their applications in 1D and 2D geo-electromagnetic inversion, please refer to the above references and also a recent review paper (Pankratov and Kuvshinov 2016).

5 Uncertainty and Resolution Analysis of Models Inverted from Seismic and Potential Field Data

Linear or linearised model uncertainty and resolution analysis is a standard appraisal method also in other geophysical sub-disciplines, such as seismic and potential field methods. Here, we try to identify similarities and disparities to the approaches taken in the geo-electromagnetic community. Further, we provide detail on peculiarities of seismic and potential field inversion problems.

In 2D and 3D teleseismic tomography problems, the inversion parameters are typically velocity perturbations with respect to a layered background model. For appraisal of 3D velocity models computed from seismological delay time data (travel time residuals) with a ray approximation, Spakman and Nolet (1988) combined the usage of two inversion algorithms, i.e. the simultaneous iterative reconstruction technique (SIRT, an iterative back-projection method) and LSQR, analysis of cell hit-counts (i.e. the number of rays passing through a cell and, as such, not applicable to diffusive EM problems), cell spike tests (synthetic tests, where the velocity perturbations of individual cells are set to have the same value), harmonic model tests (synthetic tests, where the velocity perturbation is based on superposition of sine functions) and random redistribution of delay times (see above). For a whole Earth tomography problem, Vasco et al. (2003) computed linearised model resolution and covariance matrices based on a truncated SVD. Using a partial SVD up to the truncation level, the computational burden of the SVD could be reduced. In an inversion of teleseismic data for P and S wave velocities, Eken et al. (2008) considered the diagonal elements of the model resolution matrix and synthetic modelling studies to evaluate model reliability. Also in teleseismic tomography, Shomali et al. (2002) and Shomali et al. (2011) considered a variety of nonlinear approaches to evaluate model reliability including usage of different inversion routines such as SVD and quadratic programming with inequality constraints, inversion of synthetic data and checkerboard tests. The applications described in these two papers are to the Trans-European suture zone and the Zagros collision zone. In order to study the non-uniqueness in global and regional tomography problems, De Wit et al. (2012) explored the null-space of the forward operator following the original idea presented by Deal and Nolet (1996). Lebedev et al. (2013) presented extensive 1D modelling studies including multi-dimensional plots of the data misfit to explore trade-offs between various model parameters involved in identifying the Moho from seismic surface-wave data. The study includes an evaluation of the maximum data uncertainties that still permit identification of the Moho.

In seismic tomography, cell spike tests (Spakman and Nolet 1988; Aster et al. 2012), harmonic model tests (Spakman and Nolet 1988) and checkerboard tests (Shomali et al. 2002, 2011; Aster et al. 2012) use synthetic models and the same experimental setup (including ray coverage) as a given field data set. These tests illustrate the potential loss of resolution (smearing) in the resulting inversion models and deliver important information on the general suitability of a given experimental setup (mostly station coverage) to constrain a given target structure. Menke (2015) describes the conceptual similarity of checkerboard tests and PSFs computed from the columns of model resolution matrices. The inversion model resulting from a checkerboard test is essentially the sum of a related set of PSFs. However, owing to the use of travel time residuals in seismic tomography and the strong damping of electromagnetic fields in conductive structures, these types of tests do not seem to have been popular in the geo-electromagnetic induction community.
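
Under this linearised view, the outcome of a checkerboard test can be approximated directly from the model resolution matrix, since the recovered model is the resolution matrix applied to the checkerboard perturbation. The grid dimensions, cell size and amplitude in the sketch below are arbitrary assumptions, and \({\mathbf {R}}_{\mathrm{M}}\) is taken as a given input.

```python
import numpy as np

def checkerboard_recovery(R_M, nx, nz, cell_size=4, amplitude=0.1):
    """Approximate the model recovered from a checkerboard test as R_M applied
    to the checkerboard perturbation, i.e. a superposition of point-spread
    functions (columns of R_M).  R_M must have shape (nx*nz, nx*nz)."""
    ix, iz = np.meshgrid(np.arange(nx), np.arange(nz), indexing='ij')
    checker = amplitude * np.where((ix // cell_size + iz // cell_size) % 2 == 0, 1.0, -1.0)
    return (R_M @ checker.ravel()).reshape(nx, nz)

# usage (schematic): m_recovered = checkerboard_recovery(R_M, nx=40, nz=30)
```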

In potential field problems, one would assume linear model uncertainty and resolution analysis to be directly applicable given the linearity of the forward modelling operator in the model parameters. However, as noticed by Kamm et al. (2015), transformation of data or model parameters (e.g., to ensure model parameters are positive) may make it necessary to use iterative inversion and linearised or nonlinear model analysis. In contrast to EM problems, depth resolution is a major concern in potential field problems, and depth weighting of the model regularisation has been used to address this problem (e.g., Li and Oldenburg 1996; Kamm et al. 2015; Pilkington 2016, and references therein). Based on damped SVD and GSVD, Fedi et al. (2005) considered the so-called Picard plot (a comparison of singular values and the data vector projected onto the left singular vectors vs singular value number) to set an optimal damping factor and derived a depth resolution plot. A vector \({\mathbf {s}}_i\) with a number of entries \(N_z\), which equals the number of depth layers in a 3D model, is constructed for each right (model) singular vector \({\mathbf {v}}_i\), such that the entries of \({\mathbf {s}}_i\) are the sums of squares of the entries of \({\mathbf {v}}_i\) which correspond to a certain depth level. Essentially, the depth resolution plot is a column-wise plot of the \({\mathbf {s}}_i\) and is one way of condensing the huge amount of information contained in the model singular vectors. Given appropriate consideration for damping and the Picard plot, the depth resolution plot helps identify the maximum depth to which a given data set, including its noise, can constrain the model. Based on Picard and depth resolution plots, Paoletti et al. (2016) evaluated the differences in depth resolution between gravity and gravity gradient tensor data. The authors concluded that, given comparability in the noise levels, gravity and gravity gradient tensor data lead to similar inversion models. For 3D inversion of magnetic field data, Pilkington (2016) gave a concise exposition of the theory of formal model resolution analysis including averaging functions, point-spread functions, extraction of rows and columns of \({\mathbf {R}}_{\mathrm{M}}\) and different measures of resolution length (or width). Based on a comparison to checkerboard tests, Pilkington concluded that the resolution widths estimated based on the averaging or point-spread functions are too optimistic and attributed this to model parameter transformation, i.e. the nonlinearity introduced through it, rather than depth weighting. Hence, Pilkington (2016) suggested computing a more conservative model resolution matrix in the form of the one without parameter transformation, accelerating computations with a fast Fourier transform. Approximating the laterally varying sequence of ice, lake water, lake-floor sediments and basement of Lake Vostok, Antarctica, with a limited number of rectangular prisms, Roy et al. (2005) inverted gravity data for the corner positions of the prisms assuming that the densities of the different units are known. The resulting nonlinear inversion problem was solved using simulated annealing, which allowed for quantification of the model uncertainties and correlations. Aiming at regional studies, Plattner and Simons (2017) inverted satellite magnetic data for the coefficients of the internal and external parts of the field using the mean-square error (MSE) to balance model uncertainty and resolution spread.
The truncation in the MSE criterion was based on the index of the vector Slepian basis functions.
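
The construction of the depth resolution plot described above reduces to a layer-wise sum of squares over the right singular vectors. A possible sketch is shown below; the mapping of model cells to depth layers (`layer_index`) and the sensitivity matrix are assumed inputs.

```python
import numpy as np

def depth_resolution_plot(V, layer_index, n_layers):
    """For each right (model) singular vector v_i (columns of V), sum the
    squares of its entries layer by layer; the columns of the returned
    (n_layers, n_vectors) array are the vectors s_i whose column-wise plot
    forms the depth resolution plot."""
    S = np.zeros((n_layers, V.shape[1]))
    for k in range(n_layers):
        S[k, :] = np.sum(V[layer_index == k, :] ** 2, axis=0)
    return S

# usage (schematic):
# U, s, Vt = np.linalg.svd(J, full_matrices=False)   # J: weighted sensitivities
# S = depth_resolution_plot(Vt.T, layer_index, n_layers)
```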

6 Survey Design

For a given data set, we aim at finding an inversion model with a low spread of the resolution matrix and low uncertainty. As given in Eq. 7, the model resolution matrix is determined by the quality of the observed data set \({\mathbf {d}}\), the accuracy of the forward modelling (or the sensitivity matrix \({\mathbf {J}}\left[{\mathbf {F}},{\mathbf {m}}\right]\)), the underground conductivity distribution \({\mathbf {m}}\), the model constraints \({\mathbf {W}}_{\mathrm{m}}\), the reference model \({\mathbf {m}}_{r}\) and the trade-off parameter \(\alpha\). If we assume that the model constraints are fixed during the inversion and the error of the forward computation is acceptable, the reliability of the inverted conductivity distribution is only related to the quality of the observed data set. In other words, for a given model regularisation, the model resolution matrix depends on the data \({\mathbf {d}}\), the data uncertainties \({\mathbf {W}}_d\) and the model \({\mathbf {m}}\). Thus, before conducting a field survey, we should experimentally find the optimal surveying configuration for the particular area. Another benefit of searching for an optimal configuration is to optimise the budget, because the budget is a key factor in approving or denying a proposed field survey.

The quality of the observed data set \({\mathbf {d}}\) is controlled by multiple factors, such as locations of transmitters \({\mathbf {r}}_{\mathrm{t}}\), locations of receivers \({\mathbf {r}}_{\mathrm{r}}\), the number of transmitters \(n_{\mathrm{t}}\), the number of receivers \(n_{\mathrm{r}}\), the measured components of electromagnetic fields F and the frequency f or time t ranges. For a given model parameterisation, experimental design (survey design) aims at making the model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) close to the identity matrix \({\mathbf {I}}\) by adding the necessary types of measurements. Formally, this optimisation problem is expressed as:

$$\begin{aligned} \underbrace{\text {minimise}}_{{\mathbf {r}}_{\mathrm{t}}, {\mathbf {r}}_{\mathrm{r}}, n_{\mathrm{t}}, n_{\mathrm{r}},F,f,t} \,\, |{\mathbf {R}}_{\mathrm{M}}-{\mathbf {I}}| \end{aligned}$$
(56)

where \(|\cdot |\) denotes the matrix norm and \({\mathbf {R}}_{\mathrm{M}}\) is a function of the variables \({\mathbf {r}}_{\mathrm{t}}\), \({\mathbf {r}}_{\mathrm{r}}\), \(n_{\mathrm{t}}\), \(n_{\mathrm{r}}\), F, f and t. Although the model resolution matrix \({\mathbf {R}}_{\mathrm{M}}\) is also a function of the unknown true model, homogeneous models or simple models derived from prior information can initially be used for testing. Alternatively, we can minimise objective functions that are based on the a posteriori model covariance matrix (e.g., Maurer and Boerner 1998; Maurer et al. 2000). In cases where it is difficult to change transmitter locations and signal frequencies, we can try different arrangements of receivers to satisfy the above nonlinear optimisation problem. Furthermore, if receiver locations were to be equally distributed across flat terrain, we would use three parameters, namely the location of the starting receiver, the receiver spacing and the number of receivers along the grid, to describe the receiver configuration. Then, the total number of unknowns in the nonlinear optimisation problem in Eq. (56) is four scalar variables (two of which are the horizontal coordinates of the starting receiver). Using standard optimisation algorithms, these four parameters can be determined with high confidence, because the problem is highly over-determined.

The concept of optimal survey design has a long history in product quality control (Taguchi 1987). Later on, its concept was applied to solve geophysical problems such as optimal design of seismic monitoring networks (Kijko 1977; Rabinowitz and Steinberg 1990; Hardt and Scherbaum 1994) and real-time control of magnetotelluric data acquisition in the field (Jones and Foster 1986). Over the last twenty years, geophysicists have used optimal survey design mainly in ERT imaging problems (Stummer et al. 2004; Wilkinson et al. 2006; Maurer et al. 2010), seismic tomography (Curtis 1999; Haber et al. 2008) and seismic full-waveform inversion (Maurer et al. 2017; Nuber et al. 2017). For ERT inversion problems, if the typical four-electrode configurations are used, the maximum number of independent measurements is \(n\times (n-1)\times (n-2)\times (n-3)/8\)  (Coles and Morgan 2009; Hyvonen et al. 2014), where n is the number of equally spaced electrodes. For an acquisition system with 100 electrodes, the total number of measurements can easily reach \(10^{7}\). Therefore, it is crucial to select a small set of measurements from all possible measurement configurations which can still optimally resolve the underground structure. Successively augmenting a base set of measurements (Wilkinson et al. 2006; Stummer et al. 2004; Loke et al. 2014; Wagner et al. 2015; Uhlemann et al. 2018), we can employ the iterative update formulae in Eqs. (50) to (52) to quickly compute the model resolution and covariance matrices of the augmented data set. If the resolution matrix is improved or the uncertainty of the model is decreased, the test measurement will be added to the base set. Additionally, numerical experiments show that optimal survey configurations derived under the assumption of a homogeneous true model still work fine for practical investigations. This may be caused by the fact that nonlinearity is weaker in ERT problems than in EM problems. Interested readers can try to further investigate this relatively weak nonlinearity of the ERT problem.
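
A greedy variant of this sequential strategy is sketched below: each candidate measurement is tentatively added with the rank-one update of Eqs. (50) to (52), and the candidate giving the largest gain in the trace of the resolution matrix is kept. The gain criterion and all sizes are assumptions for the example; the cited studies use their own, partly different, selection criteria.

```python
import numpy as np

def greedy_design(J_base, J_candidates, alpha, n_add):
    """Successively add the candidate sensitivity row that increases the
    trace of the model resolution matrix most, using the rank-one updates
    instead of recomputing matrix inverses.  Sketch only."""
    M = J_base.shape[1]
    F = J_base.T @ J_base                        # data part of the normal matrix
    E = np.linalg.inv(F + alpha * np.eye(M))     # inverse of regularised matrix
    remaining, chosen = list(range(J_candidates.shape[0])), []
    for _ in range(n_add):
        best, best_gain, E_best = None, -np.inf, None
        for k in remaining:
            g = J_candidates[k]
            z = E @ g
            E_new = E - np.outer(z, z) / (1.0 + g @ z)       # Eq. (52)
            gain = np.trace(E_new @ (F + np.outer(g, g)))    # trace of updated R_M
            if gain > best_gain:
                best, best_gain, E_best = k, gain, E_new
        g = J_candidates[best]
        F, E = F + np.outer(g, g), E_best                    # Eqs. (51) and (52)
        remaining.remove(best)
        chosen.append(best)
    return chosen
```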

As to geo-electromagnetic induction problems, Jones and Foster (1986) conducted the first attempt at real-time control of MT data acquisition by monitoring the improvement in model resolution matrices in 1D magnetotelluric studies. Based on the application of a genetic algorithm to the design of an EM survey using a horizontal electric dipole source and vertical magnetic field receivers, the numerical experiments presented by Maurer and Boerner (1998) showed that a reduced number of measurements could achieve an inversion model similarly well constrained as the full set of measurements. Maurer and Boerner (1998) used an objective function based on the SVD of the a posteriori model covariance matrix. For 1D single inversions of frequency-domain electromagnetic (FDEM) induction data with a horizontal loop transmitter and a horizontal loop receiver and geoelectrical sounding data using a Schlumberger configuration, Maurer et al. (2000) compared different survey design approaches based on analysis of forward modelling results, the sensitivity matrix, data importance (Sect. 2.1) and a genetic algorithm sampling the same objective function as Maurer and Boerner (1998). The presented studies use varying transmitter–receiver distances and frequencies in FDEM and electrode spacing in geoelectrics. For the cases considered by Maurer et al. (2000), the optimal survey designs deviated quite strongly from the typically used logarithmic equidistant source–receiver spacing and frequency spacing. However, in cases where no or little a priori information is available, the nonlinearity of the electromagnetic forward problem makes optimal survey design difficult, and comprehensive data sets should be acquired in the field. Similar to Maurer and Boerner (1998), Roux and Garcia (2014) applied a genetic algorithm to explore optimal combinations of transmitter–receiver distances and frequencies for a CO2 monitoring problem (see Fig. 14 and Table 2). In terms of a measure of the weighted model covariance (or the singular values), a set of 20 optimal frequencies with different transmitter–receiver distances (Tx–Rx), which are not equally distributed on a logarithmic axis, were determined to resolve a resistive CO2 sequestration layer embedded in a half-space model.

Only a few studies have been carried out on designing optimal configurations for electromagnetic induction problems. This may be due to the fact that the number of acquired measurements is smaller than in ERT surveys and that, depending on the method, it is hard to relocate the heavy electromagnetic transmitters and receivers in the field. Although optimal survey design is not popular in the electromagnetic induction community and the aforementioned difficulties with nonlinearity exist, it can clearly help to improve survey reliability and to save money, provided adequate a priori information is available before field work.

Fig. 14

Results of the five optimal acquisition configurations (A, B, C, D, E) for a 1D land-based CSEM survey (Roux and Garcia 2014). a Plots of the optimal frequencies and Tx–Rx distances for each configuration, which are listed in detail in Table 2. b Spectra of the normalised eigenvalues for each optimal design. c Inverted resistivity models from these five survey designs compared with the given true model (grey solid line)

Table 2 List of the optimal frequencies and optimal Tx–Rx distances for the five optimal acquisition configurations shown in Fig. 14

7 Inversion Grid Design

Utilising the above optimal survey design strategies, the number of measurements can be minimised. The next naturally arising task is to decrease the number of model cells or unknowns. This would not only decrease the computational cost but also reduce the ill-posedness of the inversion problem. We can use the geophysicist's practical experience or the physics of electromagnetic field propagation to design the inversion grid. For instance, this would lead to assigning coarse cells to regions at depths larger than about 1 to 1.5 skin depths at the lowest signal frequency (Spies 1989). However, such manually designed inversion grids may still have too many unknowns in regions where the measurements are not sensitive to the model parameters. Hence, a preferred strategy is to implement an automatic and adaptive inversion grid meshing algorithm by which the model parameters are nearly optimally distributed in space.
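
As an example of such a physics-based rule, the skin depth \(\delta \approx 503\sqrt{\rho/f}\) (in metres, with \(\rho\) the background resistivity in \(\Omega\)m and \(f\) the frequency in Hz) at the lowest signal frequency can set the depth below which cells are coarsened. The following sketch illustrates this rule; the starting cell size, growth factors and background values are hypothetical.

```python
import numpy as np

def cell_sizes_from_skin_depth(rho_background, f_min, z_max, dz0=50.0):
    """Illustrative depth discretisation: cells grow slowly down to ~1.5 skin
    depths at the lowest frequency and much faster below, where the data
    carry little sensitivity (cf. Spies 1989)."""
    delta = 503.0 * np.sqrt(rho_background / f_min)   # skin depth in metres
    z_transition = 1.5 * delta
    sizes, z, dz = [], 0.0, dz0
    while z < z_max:
        sizes.append(dz)
        z += dz
        dz *= 1.1 if z <= z_transition else 1.6       # coarsen below the transition depth
    return np.array(sizes)

# Example: 100 Ohm-m background, lowest frequency 0.01 Hz, grid extending to 100 km depth
dz = cell_sizes_from_skin_depth(100.0, 0.01, 1.0e5)
```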

This idea has been implemented in applied mathematics (Becker et al. 2000; Bangerth 2002; Becker and Vexler 2004; Bangerth 2008; Bangerth and Joshi 2008), where inversions were reformulated as partial differential equation (PDE)-constrained nonlinear optimisation problems. Both model parameters and model responses are considered as unknowns. In terms of Lagrange variables, PDE-constrained nonlinear inversion problems are transformed into a so-called all-at-once formulation in which three sub-problems are defined (Haber et al. 2000; Wilhelms 2016). They are the state problem (using electromagnetic fields as unknowns), the adjoint problem (using Lagrange multipliers as unknowns) and the control equation (using model parameters as unknowns). The state and adjoint problems share the same forward modelling operator and can be evaluated simultaneously on the same forward modelling grid. The control equation is defined on an inversion grid which is independent of the forward modelling grid. The objective function is then a function of three types of parameters: the model responses, the Lagrange multipliers and the model parameters. An a posteriori misfit estimation algorithm for the nonlinear objective function is developed to distribute the misfit of the objective function to residuals of the electromagnetic fields, the Lagrange multipliers and the model parameters. Residuals of the electromagnetic fields and the Lagrange multipliers can be estimated using well-established a posteriori error estimators for forward modelling problems (Franke et al. 2007; Li and Pek 2008; Key and Ovall 2011; Schwarzbach et al. 2011; Ren et al. 2013; Grayver and Kolev 2015). As to geophysical inversion problems, Haber et al. (2007) implemented this adaptive inversion grid design for 3D ERT inversion problems. Using the gradient of the estimated conductivity as a measure for refinement of inversion grids, Haber et al. (2007) demonstrated that 3D inversion grids can be gradually and adaptively refined. Similarly, Grayver (2015) implemented 3D MT inversion grid refinement in the framework of conventional unconstrained optimisation. More interestingly, the author presented a way of designing the initial inversion grid using the diagonal entries of the model resolution matrix. Since only local mesh refinements are performed, the number of unknowns increases only gradually. Recently, Samrock et al. (2018) successfully applied the 3D MT inversion code of Grayver (2015) to image the electrical conductivity structure of the magmatic system beneath the Ethiopian rift, where the final inversion grid has only 79,508 unknowns (Fig. 15).
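
The sketch below illustrates the kind of refinement marking used in such adaptive schemes: cells are flagged for subdivision where the current conductivity estimate shows strong contrasts (in the spirit of Haber et al. 2007), weighted here by the diagonal of the model resolution matrix (in the spirit of Grayver 2015). The combination of the two indicators, the marking fraction and the data structures are hypothetical choices for illustration, not the published implementations.

```python
import numpy as np

def mark_cells_for_refinement(log_sigma, resolution_diag, cell_neighbours,
                              fraction=0.1):
    """Flag the cells with the largest refinement indicator for subdivision.

    Illustrative indicator: largest jump of log-conductivity to the
    neighbouring cells, weighted by how well the cell is resolved."""
    n = len(log_sigma)
    indicator = np.zeros(n)
    for i in range(n):
        jumps = [abs(log_sigma[i] - log_sigma[j]) for j in cell_neighbours[i]]
        indicator[i] = max(jumps, default=0.0) * resolution_diag[i]
    n_refine = max(1, int(fraction * n))
    return np.argsort(indicator)[-n_refine:]   # indices of cells to refine
```

In a full scheme, the flagged cells would be subdivided, the forward and inverse problems re-solved, and the marking repeated until the misfit estimate no longer improves.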

Fig. 15

3D inversion grid generated by adaptive refinement techniques for a set of MT data collected across the Main Ethiopian Rift (Samrock et al. 2018). The number of inversion unknowns was gradually increased to 79,508. Red dots denote MT sites

8 Conclusions and Suggestions

In this review, we first provided a concise exposition of the theory of linearised and nonlinear model analysis tools to estimate model uncertainty and resolution. Model uncertainty estimates define an equivalent model domain in which each model fits the cost function (most importantly the measurements) to within a threshold value. The model resolution matrix offers a way to examine the contributions of the unknown true model to the preferred inversion model and can be considered a blurring filter through which we see the former in the latter. Importantly, linearised tools can offer information about both model uncertainty and resolution. In contrast, the presently available nonlinear tools offer information mostly on model uncertainty. The equivalent model domains calculated by different linearised or nonlinear model analysis tools may comprise different valleys (regions of low value) of the cost function, because different parts of the model domain are explored by the various model analysis tools. If comprehensive a priori information exists, optimal model constraints can be constructed. These model constraints may include a geologically meaningful reference model, a geologically meaningful model weighting matrix and, if applicable, petrophysical constraints. The a priori information may consist of geological maps, borehole logs and seismic, magnetic and gravity models. Under these conditions, the inversion may be expected to converge to a preferred inversion model that is close to the unknown true model and that has a model resolution matrix with small spread and small model uncertainties. Judicious selection of a priori constraints and their implementation is important to avoid biasing the inversion model in the wrong direction. In particular, care needs to be exercised in understanding the combined effect of structural contrasts in the reference model and the model weighting matrix on the inversion model, for which we have provided one example. Usage of regular model weighting matrices may introduce unintended bias to the inversion model in the form of average amplitude shifts of the model parameters. In contrast, a singular model weighting matrix with smoothness constraints does not generate this form of bias and, in many cases, may be the preferred form of model regularisation. Note that structural constraints can be implemented by locally re-weighting the smoothness constraints.

Using model analysis tools, we are able to evaluate the quality of inversion models and to design optimal survey configurations. Many model analyses employ solely the sensitivity matrix to study how well the model is constrained by the data. This is so because it is a simple measure and because, using Gauss–Newton inversion algorithms, the sensitivity matrix is explicitly available. However, for a reliable model analysis, we should consider the model resolution and covariance matrices and, if computationally feasible, nonlinear methods such as the most-squares inversion. Model uncertainties estimated by the TSVD variant of the most-squares method are often larger than those computed from the partially nonlinear pseudo-hyperellipsoid. To reduce the equivalences in inversion models, i.e. to reduce model uncertainty and to improve model resolution, we should use multiple geophysical data sets. Joint inversion is a popular tool in the geo-electromagnetic induction community. Although it is clear that joint inversion reduces the equivalent model domain, surprisingly little work has been done on model analysis for 2D and 3D joint inversion problems.

Second, we listed acceleration algorithms to quickly compute the complete or partial model resolution and covariance matrices offered by linearised model analysis tools. For 1D and 2D problems, the direct matrix computation approach should be used, as it is computationally feasible and preserves detailed information on the resolution and covariance matrices. Furthermore, the TSVD method is a suitable choice if we are only interested in the constraints provided by the measurements and wish to ignore the effect of the model regularisation term. Analysis including the regularisation term can be accomplished using the GSVD method. To our knowledge, an EM model analysis using a model regularisation term and the GSVD method has not yet been presented, but it would be an interesting complement to the literature. For large-scale 2D and general 3D problems, we may wish to use the row, column or diagonal extraction algorithms to evaluate the resolution and covariance matrices for selected cells. This choice may be motivated by the desire to condense the large amount of information in the full matrix expressions, whereas the large computational load associated with 3D problems may leave it as the only computationally practical approach. Uncertainty and resolution may be assumed to vary continuously, at least over regions in model space that are homogeneous or have only gradual changes in material parameters. Thus, representative measures of uncertainty and resolution in parts of the model space that have not been analysed may be obtained by interpolating those computed for selected cells. The model reduction technique has not been widely explored by the EM community either, although it deserves further exploration, because it accelerates the computation and can even increase the quality of inversion results. A benefit of the reduced number of model parameters is a presumably simpler topography of the cost function. The iterative update formula is particularly useful for optimal survey design to study 3D structures. Thereby, the efficiency of additional data in aiding target characterisation can be quickly examined.
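
As a simple illustration of this interpolation step, the sketch below spreads diagonal resolution (or variance) values computed for a subset of analysed cells onto all cell centres. The use of SciPy's griddata with linear interpolation inside the analysed region and nearest-neighbour extrapolation outside it is an assumption made for illustration; any smooth interpolation scheme could serve the same purpose.

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_cell_diagnostics(centres_analysed, values_analysed, centres_all):
    """Interpolate resolution or variance values computed for selected cells
    (centres_analysed, values_analysed) onto all cell centres (centres_all)."""
    vals = griddata(centres_analysed, values_analysed, centres_all, method='linear')
    missing = np.isnan(vals)                 # cell centres outside the analysed region
    if np.any(missing):
        vals[missing] = griddata(centres_analysed, values_analysed,
                                 centres_all[missing], method='nearest')
    return vals
```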

Third, we summarised the main findings of published model analysis studies, which mainly concern 1D and 2D individual and joint inversion models. This limitation is caused by the prohibitive computational cost of performing full 3D model analysis. We think that, using the presented fast computation methods, 3D model analysis can be performed efficiently. In particular, the integration of model analysis into iterative solvers, such as the conjugate gradient (CG), nonlinear conjugate gradient (NLCG) and limited-memory quasi-Newton (LMQN) methods, deserves attention. Using these fast algorithms, optimal survey configurations such as optimal transmitter locations, transmitter numbers, receiver locations, receiver numbers and frequency ranges can be determined before deploying instruments in the field. Optimal survey design not only strengthens the reliability of inversion models but also reduces financial cost. However, we note that the pronounced nonlinear effects of electromagnetic inversion problems make optimal survey design difficult in situations where there is little a priori information on the target structures. In summary, owing to significant enhancements of computing power and the availability of fast computation methods, we expect more model uncertainty and resolution analyses to be presented for 2D and 3D electromagnetic studies, as well as more studies on optimal survey design and inversion grid design.