
Bayesian model comparison with un-normalised likelihoods


Abstract

Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes’ factors for these models remains challenging in general. This paper describes new random weight importance sampling and sequential Monte Carlo methods for estimating Bayes’ factors that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates, and we present an initial investigation into the theoretical and empirical properties of this class of methods. Some support for the use of biased estimates is found, but we advocate caution in their use.



Notes

  1. We note that taking the log of an unbiased estimate in fact produces a negatively biased estimator, but the results for the exact algorithm indicate that the variance of the evidence estimates we use is sufficiently small that this effect is negligible (a toy illustration follows).
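To see this effect concretely, the following toy simulation (not from the paper; the log-normal estimator is an arbitrary illustrative choice) shows that the mean of the log of an unbiased estimate sits below the true log value, with a gap that shrinks as the estimator's variance shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = 2.0                          # "true" quantity being estimated unbiasedly
for sd in (1.0, 0.1, 0.01):      # decreasing estimator standard deviation (log scale)
    # log-normal estimates constructed so that E[Z_hat] = Z exactly
    Z_hat = Z * rng.lognormal(mean=-0.5 * sd**2, sigma=sd, size=100_000)
    print(f"sd={sd}: mean(Z_hat)={Z_hat.mean():.4f}, "
          f"mean(log Z_hat)={np.log(Z_hat).mean():.4f}, log Z={np.log(Z):.4f}")
```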

References

  • Alquier, P., Friel, N., Everitt, R.G., Boland, A.: Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Stat Comput, in press (2015)

  • Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat 37(2), 697–725 (2009)

  • Andrieu, C., Vihola, M.: Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms (2012). arXiv:1210.1484

  • Beaumont, M.A.: Estimation of population growth or decline in genetically monitored populations. Genetics 164(3), 1139–1160 (2003)

  • Beskos, A., Crisan, D., Jasra, A., Whiteley, N.: Error bounds and normalizing constants for sequential Monte Carlo in high dimensions (2011). arXiv:1112.1544

  • Caimo, A., Friel, N.: Bayesian inference for exponential random graph models. Soc Netw 33, 41–55 (2011)

  • Chopin, N.: A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002)

  • Chopin, N., Jacob, P.E., Papaspiliopoulos, O.: \(\text{ SMC }^2\): an efficient algorithm for sequential analysis of state space models. J R Stat Soc 75(3), 397–426 (2013)

  • Del Moral, P.: Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer, New York (2004)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J R Stat Soc 68(3), 411–436 (2006)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo for Bayesian computation. Bayesian Stat 8, 115–148 (2007)

  • Didelot, X., Everitt, R.G., Johansen, A.M., Lawson, D.J.: Likelihood-free estimation of model evidence. Bayesian Anal 6(1), 49–76 (2011)

  • Drovandi, C.C., Pettitt, A.N., Lee, A.: Bayesian indirect inference using a parametric auxiliary model. Stat Sci 30(1), 72–95 (2015)

  • Everitt, R.G.: Bayesian parameter estimation for latent Markov random fields and social networks. J Comput Graph Stat 21(4), 940–960 (2012)

  • Fearnhead, P., Papaspiliopoulos, O., Roberts, G.O., Stuart, A.M.: Random-weight particle filtering of continuous time processes. J R Stat Soc 72(4), 497–512 (2010)

  • Friel, N.: Evidence and Bayes factor estimation for Gibbs random fields. J Comput Graph Stat 22(3), 518–532 (2013)

  • Friel, N., Rue, H.: Recursive computing and simulation-free inference for general factorizable models. Biometrika 94(3), 661–672 (2007)

  • Girolami, M.A., Lyne, A.M., Strathmann, H., Simpson, D., Atchade, Y.: Playing Russian roulette with intractable likelihoods (2013). arXiv:1306.4032

  • Grelaud, A., Robert, C.P., Marin, J.M.: ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Anal 4(2), 317–336 (2009)

  • Johndrow, J.E., Mattingly, J.C., Mukherjee, S., Dunson, D.: Approximations of Markov chains and high-dimensional Bayesian inference (2015). arXiv:1508.03387

  • Klaas, M., de Freitas, N., Doucet, A.: Toward practical \(N^2\) Monte Carlo: The marginal particle filter. In: Proceedings of the 20th International Conference on Uncertainty in Artificial Intelligence (2005)

  • Kong, A., Liu, J.S., Wong, W.H.: Sequential imputations and Bayesian missing data problems. J Am Stat Assoc 89(425), 278–288 (1994)

  • Lee, A., Whiteley, N.: Variance estimation and allocation in the particle filter (2015). arXiv:1509.00394

  • Marin, J.M., Pillai, N.S., Robert, C.P., Rousseau, J.: Relevant statistics for Bayesian model choice. J R Stat Soc 76(5), 833–859 (2014)

  • Marjoram, P., Molitor, J., Plagnol, V., Tavare, S.: Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100(26), 15324–15328 (2003)

  • Meng, X.L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat Sin 6, 831–860 (1996)

  • Møller, J., Pettitt, A.N., Reeves, R.W., Berthelsen, K.K.: An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458 (2006)

  • Murray, I., Ghahramani, Z., MacKay, D.J.C.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), pp. 359–366 (2006)

  • Neal, R.M.: Annealed importance sampling. Stat Comput 11(2), 125–139 (2001)

  • Neal, R.M.: Estimating ratios of normalizing constants using linked importance sampling (2005). arXiv:math/0511216

  • Nicholls, G.K., Fox, C., Watt, A.M.: Coupled MCMC with a randomized acceptance probability (2012). arXiv:1205.6857

  • Peters, G.W.: Topics in sequential Monte Carlo samplers. M.Sc. thesis, University of Cambridge (2005)

  • Picchini, U., Forman, J.L.: Accelerating inference for diffusions observed with measurement error and large sample sizes using approximate Bayesian computation: a case study (2013). arXiv:1310.0973

  • Prangle, D., Fearnhead, P., Cox, M.P., Biggs, P.J., French, N.P.: Semi-automatic selection of summary statistics for ABC model choice. Stat Appl Genet Mol Biol 13(1), 67–82 (2014)

  • Rao, V., Lin, L., Dunson, D.B.: Bayesian inference on the Stiefel manifold (2013). arXiv:1311.0907

  • Robert, C.P., Cornuet, J.M., Marin, J.M., Pillai, N.S.: Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108(37), 15112–15117 (2011)

  • Schweinberger, M., Handcock, M.: Local dependence in random graph models: characterization, properties and statistical inference. J R Stat Soc 77, 647–676 (2015)

  • Sisson, S.A., Fan, Y., Tanaka, M.M.: Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104(6), 1760–1765 (2007)

  • Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal 1(4), 833–859 (2006)

  • Tavaré, S., Balding, D.J., Griffiths, R.C., Donnelly, P.J.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997)

  • Tran, M.N., Scharth, M., Pitt, M.K., Kohn, R.: \(\text{ IS }^2\) for Bayesian inference in latent variable models (2013). arXiv:1309.3339

  • Whiteley, N.: Stability properties of some particle filters. Ann Appl Probab 23(6), 2500–2537 (2013)

  • Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat Appl Genet Mol Biol 12(2), 129–141 (2013)

  • Wood, S.N.: Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102–1104 (2010)

  • Zhou, Y., Johansen, A.M., Aston, J.A.D.: Towards automatic model comparison: An adaptive sequential Monte Carlo approach. J Comput Graph Stat, in press (2015)


Acknowledgments

The authors would like to thank Nial Friel for useful discussions, and for giving us access to the data and results from Friel (2013).


Corresponding author

Correspondence to Richard G. Everitt.

Appendices

Using SAV and exchange MCMC within SMC

1.1 Weight update when using SAV-MCMC

Let us consider the SAVM posterior, with K being the MCMC move used in SAVM. In this case the weight update is

$$\begin{aligned} \widetilde{w}_{k}^{\left( p\right) }= & {} \frac{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u} \left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t-1}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }\\&\times \frac{L_{t-1}\left( \left( \theta _{t}^{\left( p\right) },u_{t}^{\left( p\right) }\right) , \left( \theta _{t-1}^{\left( p\right) },u_{t-1}^{\left( p\right) }\right) \right) }{K_{t}\left( \left( \theta _{t-1}^{\left( p\right) }, u_{t-1}^{\left( p\right) }\right) ,\left( \theta _{t}^{\left( p\right) },u_{t}^{\left( p\right) }\right) \right) }\\= & {} \frac{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u}\left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t-1}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }\\&\times \frac{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u}\left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }\\= & {} \frac{\gamma _{t}\left( y|\theta _{t-1}^{\left( p\right) }\right) }{\gamma _{t-1} \left( y|\theta _{t-1}^{\left( p\right) }\right) }\frac{Z_{t-1}\left( \theta _{t-1}^{\left( p\right) }\right) }{Z_{t} \left( \theta _{t-1}^{\left( p\right) }\right) }, \end{aligned}$$

which is the same update as if we could use MCMC directly.

1.2 Weight update when using the exchange algorithm

Nicholls et al. (2012) show that the exchange algorithm, when set up to target \(\pi _{t}(\theta |y)\propto p(\theta )f_{t}(y|\theta )\) in the manner described in sect. 1.1.2, simulates a transition kernel that is in detailed balance with \(\pi _{t}(\theta |y)\). This follows from showing that it satisfies a “very detailed balance” condition, which takes account of the auxiliary variable u. The result is that the derivation of the weight update follows exactly that of (12).
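For concreteness, a single exchange-algorithm move of this kind can be sketched as follows. This is a minimal illustration rather than the authors' implementation: log_prior, log_gamma_t, simulate_y and propose are hypothetical callables, the proposal is taken to be symmetric, and exact simulation from \(f_{t}(\cdot |\theta )\) is assumed to be available.

```python
import numpy as np

def exchange_move(theta, y, log_prior, log_gamma_t, simulate_y, propose, rng):
    """One exchange-algorithm update targeting pi_t(theta|y) propto p(theta) gamma_t(y|theta) / Z_t(theta).

    log_prior(theta):        log p(theta).
    log_gamma_t(x, theta):   log of the unnormalised likelihood gamma_t(x|theta).
    simulate_y(theta, rng):  exact draw u ~ f_t(.|theta).
    propose(theta, rng):     draw theta' from a symmetric proposal q(.|theta).
    """
    theta_prop = propose(theta, rng)
    u = simulate_y(theta_prop, rng)  # auxiliary draw; the intractable Z_t terms cancel in the ratio
    log_alpha = (log_prior(theta_prop) - log_prior(theta)
                 + log_gamma_t(y, theta_prop) - log_gamma_t(y, theta)
                 + log_gamma_t(u, theta) - log_gamma_t(u, theta_prop))
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop
    return theta
```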

An extended space construction for the random weight SMC method in sect. 3.1.1

The following extended space construction justifies the use of the “approximate” weights in (13) via an explicit sequential importance (re)sampling argument along the lines of Del Moral et al. (2006), albeit with a slightly different sequence of target distributions.

Consider an actual sequence of target distributions \(\{\pi _{t}\}_{t\ge 0}\). Assume we seek to approximate a normalising constant during every iteration by introducing additional variables \(u_{t}=(u_{t}^{1},\ldots ,u_{t}^{M})\) during iteration \(t>0\).

Define the sequence of target distributions on \(\widetilde{x}_{t}=\left( \theta _{0:t},u_{1:t}\right) \):

$$\begin{aligned} \widetilde{\pi }_{t}\left( \widetilde{x}_{t}\right) \propto \pi _{t}\left( \theta _{t}\right) \prod _{s=0}^{t-1}L_{s}\left( \theta _{s+1},\theta _{s}\right) \prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\left[ f_{s-1}\left( u_{s}^{m}|\theta _{s-1}\right) \prod _{q\ne m}f_{s}\left( u_{s}^{q}|\theta _{s-1}\right) \right] , \end{aligned}$$

where \(L_{s}\) has the same rôle and interpretation as it does in a standard SMC sampler.

Assume that at iteration t the auxiliary variables \(u_{t}^{m}\) are sampled independently (conditional upon the associated value of the parameter, \(\theta _{t-1}\)) and identically according to \(f_{t}(\cdot |\theta _{t-1})\), and that \(K_{t}\) denotes the incremental proposal distribution at iteration t, just as in a standard SMC sampler.

In the absence of resampling, each particle has been sampled from the following proposal distribution at time t:

$$\begin{aligned} \widetilde{\mu }_{t}\left( \widetilde{x}_{t}\right) =&\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s} \left( \theta _{s-1},\theta _{s}\right) \prod _{s=1}^{t} \prod _{m=1}^{M}f_{s}\left( u_{s}^{m}|\theta _{s-1}\right) \end{aligned}$$

and hence its importance weight, \(W_{t}(\widetilde{x}_{t})\), should be:

$$\begin{aligned}&\frac{\pi _{t}(\theta _{t})\prod _{s=0}^{t-1}L_{s}(\theta _{s+1},\theta _{s})}{\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s}(\theta _{s-1},\theta _{s})} \\&\quad \quad \frac{\prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\left[ f_{s-1}(u_{s}^{m}|\theta _{s-1})\prod _{q\ne m}f_{s}(u_{s}^{q}|\theta _{s-1})\right] }{\prod _{s=1}^{t}\prod _{m=1}^{M}f_{s}(u_{s}^{m}|\theta _{s-1})}\\&\quad = \frac{\pi _{t}(\theta _{t})\prod _{s=0}^{t-1}L_{s}(\theta _{s+1},\theta _{s})}{\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s}(\theta _{s-1},\theta _{s})}\prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\frac{f_{s-1}(u_{s}^{m}|\theta _{s-1})}{f_{s}(u_{s}^{m}|\theta _{s-1})}\\&\quad = W_{t-1}(\widetilde{x}_{t-1})\cdot \frac{\pi _{t}(\theta _{t})L_{t-1}(\theta _{t},\theta _{t-1})}{\pi _{t-1}(\theta _{t-1})K_{t}(\theta _{t-1},\theta _{t})}\\&\quad \qquad \frac{1}{M}\sum _{m=1}^{M}\frac{f_{t-1}(u_{t}^{m}|\theta _{t-1})}{f_{t}(u_{t}^{m}|\theta _{t-1})}, \end{aligned}$$

which yields the natural sequential importance sampling interpretation. The validity of the incorporation of resampling follows by standard arguments.

If one has that \(\pi _{t}(\theta _{t})\propto p(\theta _{t})f_{t}(y|\theta _{t})=p(\theta _{t})\gamma _{t}(y|\theta _{t})/Z_{t}(\theta _{t})\) and employs the time reversal of \(K_{t}\) for \(L_{t-1}\), then one arrives at an incremental importance weight at time t of:

$$\begin{aligned}&\frac{p\left( \theta _{t-1}\right) f_{t}\left( y|\theta _{t-1}\right) }{p\left( \theta _{t-1}\right) f_{t-1}\left( y| \theta _{t-1}\right) }\frac{1}{M}\sum _{m=1}^{M}\frac{f_{t-1}\left( u_{t}^{m}| \theta _{t-1}\right) }{f_{t}\left( u_{t}^{m}|\theta _{t-1}\right) }\\&\quad = \frac{\gamma _{t}\left( y|\theta _{t-1}\right) }{\gamma _{t-1}\left( y|\theta _{t-1}\right) }\frac{1}{M}\sum _{m=1}^{M} \frac{\gamma _{t-1}\left( u_{t}^{m}|\theta _{t-1}\right) }{\gamma _{t}\left( u_{t}^{m}| \theta _{t-1}\right) } \end{aligned}$$

yielding the algorithm described in sect. 3.1.1 as an exact SMC algorithm on the described extended space.
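As a concrete illustration of this incremental weight, the following sketch (not the authors' code; log_gamma and simulate are hypothetical callables for the unnormalised likelihood \(\gamma _{t}\) and for exact simulation from \(f_{t}(\cdot |\theta )\), respectively) computes its logarithm for a single particle.

```python
import numpy as np
from scipy.special import logsumexp

def log_incremental_weight(t, theta_prev, y, M, log_gamma, simulate, rng):
    """Log of the random-weight incremental weight for one particle, evaluated at theta_{t-1}.

    log_gamma(t, x, theta):  log gamma_t(x|theta), the unnormalised likelihood at step t.
    simulate(t, theta, rng): exact draw from f_t(.|theta).
    """
    # deterministic part: log gamma_t(y|theta_{t-1}) - log gamma_{t-1}(y|theta_{t-1})
    log_w = log_gamma(t, y, theta_prev) - log_gamma(t - 1, y, theta_prev)
    # random part: (1/M) sum_m gamma_{t-1}(u^m|theta_{t-1}) / gamma_t(u^m|theta_{t-1}),
    # an unbiased estimate of Z_{t-1}(theta_{t-1}) / Z_t(theta_{t-1})
    u = [simulate(t, theta_prev, rng) for _ in range(M)]
    log_ratios = np.array([log_gamma(t - 1, um, theta_prev) - log_gamma(t, um, theta_prev)
                           for um in u])
    return log_w + logsumexp(log_ratios) - np.log(M)
```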

Proof of SMC Sampler Error Bound

A little notation is required. We allow \((E,{\mathcal {E}})\) to denote the common state space of the sampler during each iteration, \({\mathcal {C}}_{b}(E)\) the collection of continuous, bounded functions from E to \({\mathbb {R}}\), and \({\mathcal {P}}(E)\) the collection of probability measures on this space. We define the Boltzmann-Gibbs operator associated with a potential function \(G:E\rightarrow (0,\infty )\) as a mapping, \(\varPsi _{G}:{\mathcal {P}}(E)\rightarrow {\mathcal {P}}(E)\), weakly via the integrals of any function \(\varphi \in {\mathcal {C}}_{b}(E)\)

$$\begin{aligned} \int \varphi (x)\varPsi _{G}(\eta )(dx)=\frac{\int \eta (dx)G(x)\varphi (x)}{\int \eta (dx^{\prime })G(x^{\prime })}. \end{aligned}$$

The integral of a set A under a probability measure \(\eta \) is written \(\eta (A)\) and the expectation of a function \(\varphi \) of \(X\sim \eta \) is written \(\eta (\varphi )\). The supremum norm on \({\mathcal {C}}_{b}(E)\) is defined \(||\varphi ||_{\infty }=\sup _{x\in E}|\varphi (x)|\) and the total variation distance on \({\mathcal {P}}(E)\) is \(||\mu -\nu ||_{\text {TV}}=\sup _{A}(\nu (A)-\mu (A))\). Markov kernels, \(M:E\rightarrow {\mathcal {P}}(E)\), induce two operators, one on integrable functions and the other on (probability) measures:

$$\begin{aligned} M\varphi (x)=\int M(x,dy)\,\varphi (y), \qquad \eta M(A)=\int \eta (dx)\,M(x,A). \end{aligned}$$
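In particle terms, applying the Boltzmann–Gibbs operator defined above to an empirical measure amounts to reweighting the particles by the potential G. The following minimal sketch (an illustration only, not from the paper; a multinomial resampling step is added for concreteness) makes this interpretation explicit.

```python
import numpy as np

def boltzmann_gibbs_update(particles, weights, G, rng):
    """Apply the Boltzmann-Gibbs operator Psi_G to a weighted particle approximation of eta.

    The reweighted measure satisfies Psi_G(eta)(dx) propto G(x) eta(dx); a multinomial
    resampling step is included so that the returned particles are equally weighted.
    """
    g = np.array([G(x) for x in particles])
    new_w = np.asarray(weights) * g
    new_w = new_w / new_w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=new_w)
    return [particles[i] for i in idx], np.full(len(particles), 1.0 / len(particles))
```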

Having established this notation, we note that we have the following recursive definition of the distributions we consider:

$$\begin{aligned} \widetilde{\eta }_{0}=&\eta _{0}=:M_{0}&{\eta }_{t\ge 1}=&\varPsi _{G_{t-1}}(\eta _{t-1})M_{t}&\widetilde{\eta }_{t\ge 1}=&\varPsi _{\widetilde{G}_{t-1}}(\widetilde{\eta }_{t-1})M_{t} \end{aligned}$$

and for notational convenience define the transition operators as

$$\begin{aligned} \varPhi _{t}(\eta _{t-1})=&\varPsi _{G_{t-1}}\left( \eta _{t-1}\right) M_{t}&\widetilde{\varPhi }_{t}\left( \widetilde{\eta }_{t-1}\right) =&\varPsi _{\widetilde{G}_{t-1}}\left( \widetilde{\eta }_{t-1}\right) M_{t}. \end{aligned}$$

We make use of the (nonlinear) dynamic semigroupoid, which we define recursively via its action on a generic probability measure \(\eta \), for \(t\in {\mathbb {N}}\):

$$\begin{aligned} \varPhi _{t-1,t}(\eta )=&\varPhi _{t}(\eta )&\varPhi _{s,t}(\eta )=\varPhi _{t}(\varPhi _{s,t-1}(\eta ))\text { for }s<t, \end{aligned}$$

with \(\varPhi _{t,t}(\eta )= \eta \) and \(\widetilde{\varPhi }_{s,t}\) defined correspondingly.

We begin with a lemma which allows us to control the discrepancy introduced by Bayesian updating of a measure with two different likelihood functions.

Lemma 1

(approximation error) If A1. holds, then \(\forall \eta \in {\mathcal {P}}(E)\) and any \(t\in {\mathbb {N}}\):

$$\begin{aligned} ||\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta )||_{TV}\le 2\gamma . \end{aligned}$$

Proof

Let \(\varDelta _{t}:=\widetilde{G}_{t}-G_{t}\) and consider a generic \(\varphi \in {\mathcal {C}}_{b}(E)\):

$$\begin{aligned}&(\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\\&\quad = \frac{\eta (G_{t})\eta (\widetilde{G}_{t}\varphi )-\eta (\widetilde{G}_{t})\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})}\\&\quad = \frac{\eta (G_{t})\eta ((G_{t}+\varDelta _{t})\varphi )-\eta ((G_{t}+\varDelta _{t}))\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})}\\&\quad = \frac{\eta (G_{t})\eta (\varDelta _{t}\varphi )-\eta (\varDelta _{t})\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})} \end{aligned}$$

Considering the absolute value of this discrepancy, making use of the triangle inequality:

$$\begin{aligned} \left| (\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\right| \le&\left| \frac{\eta (\varDelta _{t}\varphi )}{\eta (\widetilde{G}_{t})}\right| +\left| \frac{\eta (\varDelta _{t})}{\eta (\widetilde{G}_{t})}\right| \left| \frac{\eta (G_{t}\varphi )}{\eta (G_{t})}\right| \end{aligned}$$

Noting that \(G_{t}\) is strictly positive, we can bound \(|\eta (G_{t}\varphi )|/\eta (G_{t})\) with \(\eta (G_{t}|\varphi |)/\eta (G_{t})\) and thus with \(\left\| \varphi \right\| _{\infty }\) and apply a similar strategy to the first term:

$$\begin{aligned}&\left| (\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\right| \\&\quad \le \left| \frac{\eta (|\varDelta _{t}|)\left\| \varphi \right\| _{\infty }}{\eta (\widetilde{G}_{t})}\right| +\left| \frac{\eta (\varDelta _{t})}{\eta (\widetilde{G}_{t})}\right| \left| \frac{\eta (G_{t}|\varphi |)}{\eta (G_{t})}\right| \le 2\gamma \left\| \varphi \right\| _{\infty }. \end{aligned}$$

(noting that \(\eta (|\varDelta _{t}|)/\eta (\widetilde{G_{t}})<\gamma \) by integration of both sides of A1). \(\square \)

We now demonstrate that, if the local approximation error at each iteration of the algorithm (characterised by \(\gamma \)) is sufficiently small, then it does not accumulate unboundedly as the algorithm progresses.

Proof of Proposition 1

We begin with a telescopic decomposition [mirroring the strategy employed for analysing particle approximations of these systems in Del Moral (2004)]:

$$\begin{aligned} \eta _{t}-\widetilde{\eta }_{t}=&\sum _{s=1}^{t}\varPhi _{s-1,t}(\widetilde{\eta }_{s-1})-\varPhi _{s,t}(\widetilde{\eta }_{s}). \end{aligned}$$

We thus establish (noting that \(\widetilde{\eta }_{0}=\eta _{0}\)):

$$\begin{aligned} \eta _{t}-\widetilde{\eta }_{t}=&\sum _{s=1}^{t}\varPhi _{s,t}(\varPhi _{s}(\widetilde{\eta }_{s-1}))-\varPhi _{s,t}(\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})). \end{aligned}$$
(20)

Turning our attention to an individual term in this expansion, noting that:

$$\begin{aligned} \varPhi _{s}(\eta )(\varphi )=&\varPsi _{G_{s-1}}(\eta )M_{s}(\varphi )&\widetilde{\varPhi }_{s}(\eta )(\varphi )=&\varPsi _{\widetilde{G}_{s-1}}(\eta )M_{s}(\varphi ) \end{aligned}$$

we have, by application of a standard Dobrushin contraction argument and Lemma 1

$$\begin{aligned} \left( \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right) (\varphi )&= \varPsi _{G_{s-1}}(\widetilde{\eta }_{s-1})M_{s}(\varphi )-\varPsi _{\widetilde{G}_{s-1}}(\widetilde{\eta }_{s-1})M_{s}(\varphi ) \end{aligned}$$
(21)
$$\begin{aligned} \left\| \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }}&\le (1-\epsilon (M))\left\| \varPsi _{G_{s-1}}(\widetilde{\eta }_{s-1})-\varPsi _{\widetilde{G}_{s-1}}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }} \end{aligned}$$
(22)
$$\begin{aligned}&\le 2\gamma (1-\epsilon (M)) \end{aligned}$$
(23)

which controls the error introduced instantaneously during each step.

We now turn our attention to controlling the accumulation of error. We make use of  (Del Moral 2004, Proposition 4.3.6) which, under assumptions A2 and A3, allows us to deduce that for any probability measures \(\mu ,\nu \):

$$\begin{aligned} \left\| \varPhi _{s,s+k}(\mu )-\varPhi _{s,s+k}(\nu )\right\| _{\text{ TV }}\le \beta (\varPhi _{s,s+k})\left\| \mu -\nu \right\| _{\text{ TV }} \end{aligned}$$

where

$$\begin{aligned} \beta (\varPhi _{s,s+k})=\frac{2}{\epsilon (M)\epsilon (G)}(1-\epsilon ^{2}(M))^{k}. \end{aligned}$$

Returning to decomposition (20), applying the triangle inequality and this result, before finally inserting (23) we arrive at:

$$\begin{aligned} \left\| \eta _{t}\!-\!\widetilde{\eta }_{t}\right\| _{\text{ TV }}&\le \! \sum _{s=1}^{t}\left\| \varPhi _{s,t}(\varPhi _{s}(\widetilde{\eta }_{s-1}))\!-\!\varPhi _{s,t}(\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1}))\right\| _{\text{ TV }}\\&\le \sum _{s=1}^{t}\frac{2(1-\epsilon ^{2}(M))^{t-s}}{\epsilon (M)\epsilon (G)} \cdot \\&\quad \,\left\| \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }}\\&\le \sum _{s=1}^{t}\frac{2(1-\epsilon ^{2}(M))^{t-s}}{\epsilon (M)\epsilon (G)}\cdot 2\gamma (1-\epsilon (M))\\&= \frac{4\gamma (1-\epsilon (M))}{\epsilon (M)\epsilon (G)}\sum _{s=1}^{t}(1-\epsilon ^{2}(M))^{t-s} \end{aligned}$$

This is trivially bounded over all t by the geometric series and a little rearrangement yields the result:

$$\begin{aligned} \frac{4\gamma (1-\epsilon (M))}{\epsilon (M)\epsilon (G)}\sum _{s=0}^{\infty }(1-\epsilon ^{2}(M))^{s} =&\frac{4\gamma (1-\epsilon (M))}{\epsilon ^{3}(M)\epsilon (G)}. \end{aligned}$$

\(\square \)

Pseudo code for random weight SMC sampler

This appendix contains the simplest form of the random weight SMC sampler used in the data point tempering examples in sect. 3, in which resampling is performed at every step. Essentially, any standard improvements to SMC algorithms can be applied.

[Pseudo-code figure: random weight SMC sampler with resampling at every step]
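The algorithm itself appears as a figure in the original article. As an indicative sketch only (not the authors' code; it assumes data point tempering, resampling at every step, and hypothetical callables sample_prior, log_gamma, simulate and mcmc_move as in the earlier sketches), the sampler can be summarised as follows.

```python
import numpy as np
from scipy.special import logsumexp

def random_weight_smc(y, T, P, M, sample_prior, log_gamma, simulate, mcmc_move, rng):
    """Simplest random weight SMC sampler: data point tempering, resampling at every step.

    sample_prior(rng):       draw theta from the prior p(theta).
    log_gamma(t, x, theta):  log of the unnormalised likelihood gamma_t(x|theta) using the
                             first t data points (gamma_0 is taken to be 1).
    simulate(t, theta, rng): exact draw from f_t(.|theta).
    mcmc_move(t, theta, rng): a pi_t-invariant move, e.g. an exchange-algorithm update.
    Returns equally weighted particles approximately targeting pi_T and a log evidence estimate.
    """
    theta = [sample_prior(rng) for _ in range(P)]
    log_evidence = 0.0
    for t in range(1, T + 1):
        # random-weight incremental weights, each evaluated at the particle's current value
        log_w = np.empty(P)
        for p in range(P):
            log_w[p] = log_gamma(t, y, theta[p]) - log_gamma(t - 1, y, theta[p])
            lr = [log_gamma(t - 1, u, theta[p]) - log_gamma(t, u, theta[p])
                  for u in (simulate(t, theta[p], rng) for _ in range(M))]
            log_w[p] += logsumexp(lr) - np.log(M)
        log_evidence += logsumexp(log_w) - np.log(P)   # accumulate the t-th normalising constant ratio
        # multinomial resampling followed by a pi_t-invariant MCMC move
        probs = np.exp(log_w - logsumexp(log_w))
        idx = rng.choice(P, size=P, p=probs)
        theta = [mcmc_move(t, theta[i], rng) for i in idx]
    return theta, log_evidence
```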


Cite this article

Everitt, R.G., Johansen, A.M., Rowing, E. et al. Bayesian model comparison with un-normalised likelihoods. Stat Comput 27, 403–422 (2017). https://doi.org/10.1007/s11222-016-9629-2
