
Bayesian model comparison with un-normalised likelihoods


Abstract

Models for which the likelihood function can be evaluated only up to a parameter-dependent unknown normalizing constant, such as Markov random field models, are used widely in computer science, statistical physics, spatial statistics, and network analysis. However, Bayesian analysis of these models using standard Monte Carlo methods is not possible due to the intractability of their likelihood functions. Several methods that permit exact, or close to exact, simulation from the posterior distribution have recently been developed. However, estimating the evidence and Bayes’ factors for these models remains challenging in general. This paper describes new random weight importance sampling and sequential Monte Carlo methods for estimating Bayes’ factors that use simulation to circumvent the evaluation of the intractable likelihood, and compares them to existing methods. In some cases we observe an advantage in the use of biased weight estimates, and we present an initial investigation into the theoretical and empirical properties of this class of methods. Some support for the use of biased estimates is found, but we advocate caution in their use.



Notes

  1. We note that taking the log of an unbiased estimate in fact produces a negatively biased estimator, but the results for the exact algorithm indicate that the variance of the evidence estimates we use is sufficiently small that this effect is negligible (a toy illustration follows).
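To see this effect concretely, the following toy simulation (not from the paper; the log-normal estimator is an arbitrary illustrative choice) shows that the mean of the log of an unbiased estimate sits below the true log value, with a gap that shrinks as the estimator's variance shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = 2.0                          # "true" quantity being estimated unbiasedly
for sd in (1.0, 0.1, 0.01):      # decreasing estimator standard deviation (log scale)
    # log-normal estimates constructed so that E[Z_hat] = Z exactly
    Z_hat = Z * rng.lognormal(mean=-0.5 * sd**2, sigma=sd, size=100_000)
    print(f"sd={sd}: mean(Z_hat)={Z_hat.mean():.4f}, "
          f"mean(log Z_hat)={np.log(Z_hat).mean():.4f}, log Z={np.log(Z):.4f}")
```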

References

  • Alquier, P., Friel, N., Everitt, R.G., Boland, A.: Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Stat Comput, in press (2015)

  • Andrieu, C., Roberts, G.O.: The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat 37(2), 697–725 (2009)

  • Andrieu, C., Vihola, M.: Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms (2012). arXiv:1210.1484

  • Beaumont, M.A.: Estimation of population growth or decline in genetically monitored populations. Genetics 164(3), 1139–1160 (2003)

  • Beskos, A., Crisan, D., Jasra, A., Whiteley, N.: Error bounds and normalizing constants for sequential Monte Carlo in high dimensions (2011). arXiv:1112.1544

  • Caimo, A., Friel, N.: Bayesian inference for exponential random graph models. Soc Netw 33, 41–55 (2011)

  • Chopin, N.: A sequential particle filter method for static models. Biometrika 89(3), 539–552 (2002)

  • Chopin, N., Jacob, P.E., Papaspiliopoulos, O.: \(\text{ SMC }^2\): an efficient algorithm for sequential analysis of state space models. J R Stat Soc 75(3), 397–426 (2013)

  • Del Moral, P.: Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications. Probability and Its Applications. Springer, New York (2004)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J R Stat Soc 68(3), 411–436 (2006)

  • Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo for Bayesian computation. Bayesian Stat 8, 115–148 (2007)

  • Didelot, X., Everitt, R.G., Johansen, A.M., Lawson, D.J.: Likelihood-free estimation of model evidence. Bayesian Anal 6(1), 49–76 (2011)

  • Drovandi, C.C., Pettitt, A.N., Lee, A.: Bayesian indirect inference using a parametric auxiliary model. Stat Sci 30(1), 72–95 (2015)

  • Everitt, R.G.: Bayesian parameter estimation for latent Markov random fields and social networks. J Comput Graph Stat 21(4), 940–960 (2012)

  • Fearnhead, P., Papaspiliopoulos, O., Roberts, G.O., Stuart, A.M.: Random-weight particle filtering of continuous time processes. J R Stat Soc 72(4), 497–512 (2010)

  • Friel, N.: Evidence and Bayes factor estimation for Gibbs random fields. J Comput Graph Stat 22(3), 518–532 (2013)

  • Friel, N., Rue, H.: Recursive computing and simulation-free inference for general factorizable models. Biometrika 94(3), 661–672 (2007)

  • Girolami, M.A., Lyne, A.M., Strathmann, H., Simpson, D., Atchade, Y.: Playing Russian roulette with intractable likelihoods (2013). arXiv:1306.4032

  • Grelaud, A., Robert, C.P., Marin, J.M.: ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Anal 4(2), 317–336 (2009)

  • Johndrow, J.E., Mattingly, J.C., Mukherjee, S., Dunson, D.: Approximations of Markov chains and high-dimensional Bayesian inference (2015). arXiv:1508.03387

  • Klaas, M., de Freitas, N., Doucet, A.: Toward practical \(N^2\) Monte Carlo: The marginal particle filter. In: Proceedings of the 20th International Conference on Uncertainty in Artificial Intelligence (2005)

  • Kong, A., Liu, J.S., Wong, W.H.: Sequential imputations and Bayesian missing data problems. J Am Stat Assoc 89(425), 278–288 (1994)

  • Lee, A., Whiteley, N.: Variance estimation and allocation in the particle filter (2015). arXiv:1509.00394

  • Marin, J.M., Pillai, N.S., Robert, C.P., Rousseau, J.: Relevant statistics for Bayesian model choice. J R Stat Soc 76(5), 833–859 (2014)

  • Marjoram, P., Molitor, J., Plagnol, V., Tavare, S.: Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100(26), 15324–15328 (2003)

  • Meng, X.L., Wong, W.H.: Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Stat Sin 6, 831–860 (1996)

  • Møller, J., Pettitt, A.N., Reeves, R.W., Berthelsen, K.K.: An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93(2), 451–458 (2006)

  • Murray, I., Ghahramani, Z., MacKay, D.J.C.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), pp. 359–366 (2006)

  • Neal, R.M.: Annealed importance sampling. Stat Comput 11(2), 125–139 (2001)

  • Neal, R.M.: Estimating ratios of normalizing constants using linked importance sampling (2005). arXiv:math/0511216

  • Nicholls, G.K., Fox, C., Watt, A.M.: Coupled MCMC with a randomized acceptance probability (2012). arXiv:1205.6857

  • Peters, G.W.: Topics in sequential Monte Carlo samplers. M.Sc. thesis, University of Cambridge (2005)

  • Picchini, U., Forman, J.L.: Accelerating inference for diffusions observed with measurement error and large sample sizes using approximate Bayesian computation: a case study (2013). arXiv:1310.0973

  • Prangle, D., Fearnhead, P., Cox, M.P., Biggs, P.J., French, N.P.: Semi-automatic selection of summary statistics for ABC model choice. Stat Appl Genet Mol Biol 13(1), 67–82 (2014)

  • Rao, V., Lin, L., Dunson, D.B.: Bayesian inference on the Stiefel manifold (2013). arXiv:1311.0907

  • Robert, C.P., Cornuet, J.M., Marin, J.M., Pillai, N.S.: Lack of confidence in approximate Bayesian computation model choice. Proc Natl Acad Sci USA 108(37), 15112–15117 (2011)

  • Schweinberger, M., Handcock, M.: Local dependence in random graph models: characterization, properties and statistical inference. J R Stat Soc 77, 647–676 (2015)

  • Sisson, S.A., Fan, Y., Tanaka, M.M.: Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104(6), 1760–1765 (2007)

  • Skilling, J.: Nested sampling for general Bayesian computation. Bayesian Anal 1(4), 833–859 (2006)

  • Tavaré, S., Balding, D.J., Griffiths, R.C., Donnelly, P.J.: Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518 (1997)

  • Tran, M.N., Scharth, M., Pitt, M.K., Kohn, R.: \(\text{ IS }^2\) for Bayesian inference in latent variable models (2013). arXiv:1309.3339

  • Whiteley, N.: Stability properties of some particle filters. Ann Appl Probab 23(6), 2500–2537 (2013)

  • Wilkinson, R.D.: Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat Appl Genet Mol Biol 12(2), 129–141 (2013)

  • Wood, S.N.: Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466, 1102–1104 (2010)

  • Zhou, Y., Johansen, A.M., Aston, J.A.D.: Towards automatic model comparison: An adaptive sequential Monte Carlo approach. J Comput Graph Stat, in press (2015)


Acknowledgments

The authors would like to thank Nial Friel for useful discussions, and for giving us access to the data and results from Friel (2013).


Corresponding author

Correspondence to Richard G. Everitt.

Appendices

Using SAV and exchange MCMC within SMC

1.1 Weight update when using SAV-MCMC

Let us consider the SAVM posterior, with K being the MCMC move used in SAVM. In this case the weight update is

$$\begin{aligned} \widetilde{w}_{k}^{\left( p\right) }= & {} \frac{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u} \left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t-1}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }\\&\times \frac{L_{t-1}\left( \left( \theta _{t}^{\left( p\right) },u_{t}^{\left( p\right) }\right) , \left( \theta _{t-1}^{\left( p\right) },u_{t-1}^{\left( p\right) }\right) \right) }{K_{t}\left( \left( \theta _{t-1}^{\left( p\right) }, u_{t-1}^{\left( p\right) }\right) ,\left( \theta _{t}^{\left( p\right) },u_{t}^{\left( p\right) }\right) \right) }\\= & {} \frac{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u}\left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t-1}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }\\&\times \frac{p\left( \theta _{t-1}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t-1}^{\left( p\right) }\right) q_{u}\left( u_{t-1}^{\left( p\right) }|\theta _{t-1}^{\left( p\right) },y\right) }{p\left( \theta _{t}^{\left( p\right) }\right) f_{t}\left( y|\theta _{t}^{\left( p\right) }\right) q_{u}\left( u_{t}^{\left( p\right) }|\theta _{t}^{\left( p\right) },y\right) }\\= & {} \frac{\gamma _{t}\left( y|\theta _{t-1}^{\left( p\right) }\right) }{\gamma _{t-1} \left( y|\theta _{t-1}^{\left( p\right) }\right) }\frac{Z_{t-1}\left( \theta _{t-1}^{\left( p\right) }\right) }{Z_{t} \left( \theta _{t-1}^{\left( p\right) }\right) }, \end{aligned}$$

which is the same update as if we could use MCMC directly.

1.2 Weight update when using the exchange algorithm

Nicholls et al. (2012) show that the exchange algorithm, when set up to target \(\pi _{t}(\theta |y)\propto p(\theta )f_{t}(y|\theta )\) in the manner described in sect. 1.1.2, simulates a transition kernel that is in detailed balance with \(\pi _{t}(\theta |y)\). This follows from showing that it satisfies a “very detailed balance” condition, which takes account of the auxiliary variable u. The result is that the derivation of the weight update follows exactly that of (12).
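For concreteness, a single exchange-algorithm move of this kind can be sketched as follows. This is a minimal illustration rather than the authors' implementation: log_prior, log_gamma_t, simulate_y and propose are hypothetical callables, the proposal is taken to be symmetric, and exact simulation from \(f_{t}(\cdot |\theta )\) is assumed to be available.

```python
import numpy as np

def exchange_move(theta, y, log_prior, log_gamma_t, simulate_y, propose, rng):
    """One exchange-algorithm update targeting pi_t(theta|y) propto p(theta) gamma_t(y|theta) / Z_t(theta).

    log_prior(theta):        log p(theta).
    log_gamma_t(x, theta):   log of the unnormalised likelihood gamma_t(x|theta).
    simulate_y(theta, rng):  exact draw u ~ f_t(.|theta).
    propose(theta, rng):     draw theta' from a symmetric proposal q(.|theta).
    """
    theta_prop = propose(theta, rng)
    u = simulate_y(theta_prop, rng)  # auxiliary draw; the intractable Z_t terms cancel in the ratio
    log_alpha = (log_prior(theta_prop) - log_prior(theta)
                 + log_gamma_t(y, theta_prop) - log_gamma_t(y, theta)
                 + log_gamma_t(u, theta) - log_gamma_t(u, theta_prop))
    if np.log(rng.uniform()) < log_alpha:
        return theta_prop
    return theta
```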

An extended space construction for the random weight SMC method in sect. 3.1.1

The following extended space construction justifies the use of the “approximate” weights in (13) via an explicit sequential importance (re)sampling argument along the lines of Del Moral et al. (2006), albeit with a slightly different sequence of target distributions.

Consider an actual sequence of target distributions \(\{\pi _{t}\}_{t\ge 0}\). Assume we seek to approximate a normalising constant during every iteration by introducing additional variables \(u_{t}=(u_{t}^{1},\ldots ,u_{t}^{M})\) during iteration \(t>0\).

Define the sequence of target distributions on \(\widetilde{x}_{t}=\left( \theta _{0:t},u_{1:t}\right) \):

$$\begin{aligned} \widetilde{\pi }_{t}\left( \widetilde{x}_{t}\right) \propto \pi _{t}\left( \theta _{t}\right) \prod _{s=0}^{t-1}L_{s}\left( \theta _{s+1},\theta _{s}\right) \prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\left[ f_{s-1}\left( u_{s}^{m}|\theta _{s-1}\right) \prod _{q\ne m}f_{s}\left( u_{s}^{q}|\theta _{s-1}\right) \right] , \end{aligned}$$

where \(L_{s}\) has the same rôle and interpretation as it does in a standard SMC sampler.

Assume that at iteration t the auxiliary variables \(u_{t}^{m}\) are sampled independently (conditional upon the associated value of the parameter, \(\theta _{t-1}\)) and identically according to \(f_{t}(\cdot |\theta _{t-1})\), and that \(K_{t}\) denotes the incremental proposal distribution at iteration t, just as in a standard SMC sampler.

In the absence of resampling, each particle has been sampled from the following proposal distribution at time t:

$$\begin{aligned} \widetilde{\mu }_{t}\left( \widetilde{x}_{t}\right) =&\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s} \left( \theta _{s-1},\theta _{s}\right) \prod _{s=1}^{t} \prod _{m=1}^{M}f_{s}\left( u_{s}^{m}|\theta _{s-1}\right) \end{aligned}$$

and hence its importance weight, \(W_{t}(\widetilde{x}_{t})\), should be:

$$\begin{aligned}&\frac{\pi _{t}(\theta _{t})\prod _{s=0}^{t-1}L_{s}(\theta _{s+1},\theta _{s})}{\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s}(\theta _{s-1},\theta _{s})} \\&\quad \quad \frac{\prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\left[ f_{s-1}(u_{s}^{m}|\theta _{s-1})\prod _{q\ne m}f_{s}(u_{s}^{q}|\theta _{s-1})\right] }{\prod _{s=1}^{t}\prod _{m=1}^{M}f_{s}(u_{s}^{m}|\theta _{s-1})}\\&\quad = \frac{\pi _{t}(\theta _{t})\prod _{s=0}^{t-1}L_{s}(\theta _{s+1},\theta _{s})}{\mu _{0}(\theta _{0})\prod _{s=1}^{t}K_{s}(\theta _{s-1},\theta _{s})}\prod _{s=1}^{t}\frac{1}{M}\sum _{m=1}^{M}\frac{f_{s-1}(u_{s}^{m}|\theta _{s-1})}{f_{s}(u_{s}^{m}|\theta _{s-1})}\\&\quad = W_{t-1}(\widetilde{x}_{t-1})\cdot \frac{\pi _{t}(\theta _{t})L_{t-1}(\theta _{t},\theta _{t-1})}{\pi _{t-1}(\theta _{t-1})K_{t}(\theta _{t-1},\theta _{t})}\\&\quad \qquad \frac{1}{M}\sum _{m=1}^{M}\frac{f_{t-1}(u_{t}^{m}|\theta _{t-1})}{f_{t}(u_{t}^{m}|\theta _{t-1})}, \end{aligned}$$

which yields the natural sequential importance sampling interpretation. The validity of the incorporation of resampling follows by standard arguments.

If one has that \(\pi _{t}(\theta _{t})\propto p(\theta _{t})f_{t}(y|\theta _{t})=p(\theta _{t})\gamma _{t}(y|\theta _{t})/Z_{t}(\theta _{t})\) and employs the time reversal of \(K_{t}\) for \(L_{t-1}\), then one arrives at an incremental importance weight at time t of:

$$\begin{aligned}&\frac{p\left( \theta _{t-1}\right) f_{t}\left( y|\theta _{t-1}\right) }{p\left( \theta _{t-1}\right) f_{t-1}\left( y| \theta _{t-1}\right) }\frac{1}{M}\sum _{m=1}^{M}\frac{f_{t-1}\left( u_{t}^{m}| \theta _{t-1}\right) }{f_{t}\left( u_{t}^{m}|\theta _{t-1}\right) }\\&\quad = \frac{\gamma _{t}\left( y|\theta _{t-1}\right) }{\gamma _{t-1}\left( y|\theta _{t-1}\right) }\frac{1}{M}\sum _{m=1}^{M} \frac{\gamma _{t-1}\left( u_{t}^{m}|\theta _{t-1}\right) }{\gamma _{t}\left( u_{t}^{m}| \theta _{t-1}\right) } \end{aligned}$$

yielding the algorithm described in sect. 3.1.1 as an exact SMC algorithm on the described extended space.
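As a concrete illustration of this incremental weight, the following sketch (not the authors' code; log_gamma and simulate are hypothetical callables for the unnormalised likelihood \(\gamma _{t}\) and for exact simulation from \(f_{t}(\cdot |\theta )\), respectively) computes its logarithm for a single particle.

```python
import numpy as np
from scipy.special import logsumexp

def log_incremental_weight(t, theta_prev, y, M, log_gamma, simulate, rng):
    """Log of the random-weight incremental weight for one particle, evaluated at theta_{t-1}.

    log_gamma(t, x, theta):  log gamma_t(x|theta), the unnormalised likelihood at step t.
    simulate(t, theta, rng): exact draw from f_t(.|theta).
    """
    # deterministic part: log gamma_t(y|theta_{t-1}) - log gamma_{t-1}(y|theta_{t-1})
    log_w = log_gamma(t, y, theta_prev) - log_gamma(t - 1, y, theta_prev)
    # random part: (1/M) sum_m gamma_{t-1}(u^m|theta_{t-1}) / gamma_t(u^m|theta_{t-1}),
    # an unbiased estimate of Z_{t-1}(theta_{t-1}) / Z_t(theta_{t-1})
    u = [simulate(t, theta_prev, rng) for _ in range(M)]
    log_ratios = np.array([log_gamma(t - 1, um, theta_prev) - log_gamma(t, um, theta_prev)
                           for um in u])
    return log_w + logsumexp(log_ratios) - np.log(M)
```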

Proof of SMC Sampler Error Bound

A little notation is required. We allow \((E,{\mathcal {E}})\) to denote the common state space of the sampler during each iteration, \({\mathcal {C}}_{b}(E)\) the collection of continuous, bounded functions from E to \({\mathbb {R}}\), and \({\mathcal {P}}(E)\) the collection of probability measures on this space. We define the Boltzmann-Gibbs operator associated with a potential function \(G:E\rightarrow (0,\infty )\) as a mapping, \(\varPsi _{G}:{\mathcal {P}}(E)\rightarrow {\mathcal {P}}(E)\), weakly via the integrals of any function \(\varphi \in {\mathcal {C}}_{b}(E)\)

$$\begin{aligned} \int \varphi (x)\varPsi _{G}(\eta )(dx)=\frac{\int \eta (dx)G(x)\varphi (x)}{\int \eta (dx^{\prime })G(x^{\prime })}. \end{aligned}$$

The integral of a set A under a probability measure \(\eta \) is written \(\eta (A)\) and the expectation of a function \(\varphi \) of \(X\sim \eta \) is written \(\eta (\varphi )\). The supremum norm on \({\mathcal {C}}_{b}(E)\) is defined \(||\varphi ||_{\infty }=\sup _{x\in E}|\varphi (x)|\) and the total variation distance on \({\mathcal {P}}(E)\) is \(||\mu -\nu ||_{\text {TV}}=\sup _{A}(\nu (A)-\mu (A))\). Markov kernels, \(M:E\rightarrow {\mathcal {P}}(E)\), induce two operators, one on integrable functions and the other on (probability) measures:

$$\begin{aligned} M\varphi (x)=\int M(x,dy)\,\varphi (y), \qquad \eta M(A)=\int \eta (dx)\,M(x,A). \end{aligned}$$
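In particle terms, applying the Boltzmann–Gibbs operator defined above to an empirical measure amounts to reweighting the particles by the potential G. The following minimal sketch (an illustration only, not from the paper; a multinomial resampling step is added for concreteness) makes this interpretation explicit.

```python
import numpy as np

def boltzmann_gibbs_update(particles, weights, G, rng):
    """Apply the Boltzmann-Gibbs operator Psi_G to a weighted particle approximation of eta.

    The reweighted measure satisfies Psi_G(eta)(dx) propto G(x) eta(dx); a multinomial
    resampling step is included so that the returned particles are equally weighted.
    """
    g = np.array([G(x) for x in particles])
    new_w = np.asarray(weights) * g
    new_w = new_w / new_w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=new_w)
    return [particles[i] for i in idx], np.full(len(particles), 1.0 / len(particles))
```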

Having established this notation, we note that we have the following recursive definition of the distributions we consider:

$$\begin{aligned} \widetilde{\eta }_{0}=&\eta _{0}=:M_{0}&{\eta }_{t\ge 1}=&\varPsi _{G_{t-1}}(\eta _{t-1})M_{t}&\widetilde{\eta }_{t\ge 1}=&\varPsi _{\widetilde{G}_{t-1}}(\widetilde{\eta }_{t-1})M_{t} \end{aligned}$$

and for notational convenience define the transition operators as

$$\begin{aligned} \varPhi _{t}(\eta _{t-1})=&\varPsi _{G_{t-1}}\left( \eta _{t-1}\right) M_{t}&\widetilde{\varPhi }_{t}\left( \widetilde{\eta }_{t-1}\right) =&\varPsi _{\widetilde{G}_{t-1}}\left( \widetilde{\eta }_{t-1}\right) M_{t}. \end{aligned}$$

We make use of the (nonlinear) dynamic semigroupoid, which we define recursively via its action on a generic probability measure \(\eta \), for \(t\in {\mathbb {N}}\):

$$\begin{aligned} \varPhi _{t-1,t}(\eta )=&\varPhi _{t}(\eta )&\varPhi _{s,t}(\eta )=\varPhi _{t}(\varPhi _{s,t-1}(\eta ))\text { for }s<t, \end{aligned}$$

with \(\varPhi _{t,t}(\eta )= \eta \) and \(\widetilde{\varPhi }_{s,t}\) defined correspondingly.

We begin with a lemma which allows us to control the discrepancy introduced by Bayesian updating of a measure with two different likelihood functions.

Lemma 1

(approximation error) If A1. holds, then \(\forall \eta \in {\mathcal {P}}(E)\) and any \(t\in {\mathbb {N}}\):

$$\begin{aligned} ||\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta )||_{TV}\le 2\gamma . \end{aligned}$$

Proof

Let \(\varDelta _{t}:=\widetilde{G}_{t}-G_{t}\) and consider a generic \(\varphi \in {\mathcal {C}}_{b}(E)\):

$$\begin{aligned}&(\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\\&\quad = \frac{\eta (G_{t})\eta (\widetilde{G}_{t}\varphi )-\eta (\widetilde{G}_{t})\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})}\\&\quad = \frac{\eta (G_{t})\eta ((G_{t}+\varDelta _{t})\varphi )-\eta ((G_{t}+\varDelta _{t}))\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})}\\&\quad = \frac{\eta (G_{t})\eta (\varDelta _{t}\varphi )-\eta (\varDelta _{t})\eta (G_{t}\varphi )}{\eta (\widetilde{G}_{t})\eta (G_{t})} \end{aligned}$$

Considering the absolute value of this discrepancy, making use of the triangle inequality:

$$\begin{aligned} \left| (\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\right| \le&\left| \frac{\eta (\varDelta _{t}\varphi )}{\eta (\widetilde{G}_{t})}\right| +\left| \frac{\eta (\varDelta _{t})}{\eta (\widetilde{G}_{t})}\right| \left| \frac{\eta (G_{t}\varphi )}{\eta (G_{t})}\right| \end{aligned}$$

Noting that \(G_{t}\) is strictly positive, we can bound \(|\eta (G_{t}\varphi )|/\eta (G_{t})\) with \(\eta (G_{t}|\varphi |)/\eta (G_{t})\) and thus with \(\left\| \varphi \right\| _{\infty }\) and apply a similar strategy to the first term:

$$\begin{aligned}&\left| (\varPsi _{\widetilde{G}_{t}}(\eta )-\varPsi _{G_{t}}(\eta ))(\varphi )\right| \\&\quad \le \left| \frac{\eta (|\varDelta _{t}|)\left\| \varphi \right\| _{\infty }}{\eta (\widetilde{G}_{t})}\right| +\left| \frac{\eta (\varDelta _{t})}{\eta (\widetilde{G}_{t})}\right| \left| \frac{\eta (G_{t}|\varphi |)}{\eta (G_{t})}\right| \le 2\gamma \left\| \varphi \right\| _{\infty }. \end{aligned}$$

(noting that \(\eta (|\varDelta _{t}|)/\eta (\widetilde{G_{t}})<\gamma \) by integration of both sides of A1). \(\square \)

We now demonstrate that, if the local approximation error at each iteration of the algorithm (characterised by \(\gamma \)) is sufficiently small, then it does not accumulate unboundedly as the algorithm progresses.

Proof of Proposition 1

We begin with a telescopic decomposition [mirroring the strategy employed for analysing particle approximations of these systems in Del Moral (2004)]:

$$\begin{aligned} \eta _{t}-\widetilde{\eta }_{t}=&\sum _{s=1}^{t}\varPhi _{s-1,t}(\widetilde{\eta }_{s-1})-\varPhi _{s,t}(\widetilde{\eta }_{s}). \end{aligned}$$

We thus establish (noting that \(\widetilde{\eta }_{0}=\eta _{0}\)):

$$\begin{aligned} \eta _{t}-\widetilde{\eta }_{t}=&\sum _{s=1}^{t}\varPhi _{s,t}(\varPhi _{s}(\widetilde{\eta }_{s-1}))-\varPhi _{s,t}(\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})). \end{aligned}$$
(20)

Turning our attention to an individual term in this expansion, noting that:

$$\begin{aligned} \varPhi _{s}(\eta )(\varphi )=&\varPsi _{G_{s-1}}(\eta )M_{s}(\varphi )&\widetilde{\varPhi }_{s}(\eta )(\varphi )=&\varPsi _{\widetilde{G}_{s-1}}(\eta )M_{s}(\varphi ) \end{aligned}$$

we have, by application of a standard Dobrushin contraction argument and Lemma 1

$$\begin{aligned} \left( \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right) (\varphi )&= \varPsi _{G_{s-1}}(\widetilde{\eta }_{s-1})M_{s}(\varphi )-\varPsi _{\widetilde{G}_{s-1}}(\widetilde{\eta }_{s-1})M_{s}(\varphi ) \end{aligned}$$
(21)
$$\begin{aligned} \left\| \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }}&\le (1-\epsilon (M))\left\| \varPsi _{G_{s-1}}(\widetilde{\eta }_{s-1})-\varPsi _{\widetilde{G}_{s-1}}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }} \end{aligned}$$
(22)
$$\begin{aligned}&\le 2\gamma (1-\epsilon (M)) \end{aligned}$$
(23)

which controls the error introduced instantaneously during each step.

We now turn our attention to controlling the accumulation of error. We make use of  (Del Moral 2004, Proposition 4.3.6) which, under assumptions A2 and A3, allows us to deduce that for any probability measures \(\mu ,\nu \):

$$\begin{aligned} \left\| \varPhi _{s,s+k}(\mu )-\varPhi _{s,s+k}(\nu )\right\| _{\text{ TV }}\le \beta (\varPhi _{s,s+k})\left\| \mu -\nu \right\| _{\text{ TV }} \end{aligned}$$

where

$$\begin{aligned} \beta (\varPhi _{s,s+k})=\frac{2}{\epsilon (M)\epsilon (G)}(1-\epsilon ^{2}(M))^{k}. \end{aligned}$$

Returning to decomposition (20), applying the triangle inequality and this result, before finally inserting (23) we arrive at:

$$\begin{aligned} \left\| \eta _{t}\!-\!\widetilde{\eta }_{t}\right\| _{\text{ TV }}&\le \! \sum _{s=1}^{t}\left\| \varPhi _{s,t}(\varPhi _{s}(\widetilde{\eta }_{s-1}))\!-\!\varPhi _{s,t}(\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1}))\right\| _{\text{ TV }}\\&\le \sum _{s=1}^{t}\frac{2(1-\epsilon ^{2}(M))^{t-s}}{\epsilon (M)\epsilon (G)} \cdot \\&\quad \,\left\| \varPhi _{s}(\widetilde{\eta }_{s-1})-\widetilde{\varPhi }_{s}(\widetilde{\eta }_{s-1})\right\| _{\text{ TV }}\\&\le \sum _{s=1}^{t}\frac{2(1-\epsilon ^{2}(M))^{t-s}}{\epsilon (M)\epsilon (G)}\cdot 2\gamma (1-\epsilon (M))\\&= \frac{4\gamma (1-\epsilon (M))}{\epsilon (M)\epsilon (G)}\sum _{s=1}^{t}(1-\epsilon ^{2}(M))^{t-s} \end{aligned}$$

This is trivially bounded over all t by the geometric series and a little rearrangement yields the result:

$$\begin{aligned} \frac{4\gamma (1-\epsilon (M))}{\epsilon (M)\epsilon (G)}\sum _{s=0}^{\infty }(1-\epsilon ^{2}(M))^{s} =&\frac{4\gamma (1-\epsilon (M))}{\epsilon ^{3}(M)\epsilon (G)}. \end{aligned}$$

\(\square \)

Pseudo code for random weight SMC sampler

This appendix contains the simplest form of the random weight SMC sampler used in the data point tempering examples in sect. 3, in which resampling is performed at every step. Essentially, any standard improvements to SMC algorithms can be applied.

[Pseudo-code figure: random weight SMC sampler with resampling at every step]
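The algorithm itself appears as a figure in the original article. As an indicative sketch only (not the authors' code; it assumes data point tempering, resampling at every step, and hypothetical callables sample_prior, log_gamma, simulate and mcmc_move as in the earlier sketches), the sampler can be summarised as follows.

```python
import numpy as np
from scipy.special import logsumexp

def random_weight_smc(y, T, P, M, sample_prior, log_gamma, simulate, mcmc_move, rng):
    """Simplest random weight SMC sampler: data point tempering, resampling at every step.

    sample_prior(rng):       draw theta from the prior p(theta).
    log_gamma(t, x, theta):  log of the unnormalised likelihood gamma_t(x|theta) using the
                             first t data points (gamma_0 is taken to be 1).
    simulate(t, theta, rng): exact draw from f_t(.|theta).
    mcmc_move(t, theta, rng): a pi_t-invariant move, e.g. an exchange-algorithm update.
    Returns equally weighted particles approximately targeting pi_T and a log evidence estimate.
    """
    theta = [sample_prior(rng) for _ in range(P)]
    log_evidence = 0.0
    for t in range(1, T + 1):
        # random-weight incremental weights, each evaluated at the particle's current value
        log_w = np.empty(P)
        for p in range(P):
            log_w[p] = log_gamma(t, y, theta[p]) - log_gamma(t - 1, y, theta[p])
            lr = [log_gamma(t - 1, u, theta[p]) - log_gamma(t, u, theta[p])
                  for u in (simulate(t, theta[p], rng) for _ in range(M))]
            log_w[p] += logsumexp(lr) - np.log(M)
        log_evidence += logsumexp(log_w) - np.log(P)   # accumulate the t-th normalising constant ratio
        # multinomial resampling followed by a pi_t-invariant MCMC move
        probs = np.exp(log_w - logsumexp(log_w))
        idx = rng.choice(P, size=P, p=probs)
        theta = [mcmc_move(t, theta[i], rng) for i in idx]
    return theta, log_evidence
```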


Cite this article

Everitt, R.G., Johansen, A.M., Rowing, E. et al. Bayesian model comparison with un-normalised likelihoods. Stat Comput 27, 403–422 (2017). https://doi.org/10.1007/s11222-016-9629-2
