A coalescent dual process in a Moran model with genic selection

https://doi.org/10.1016/j.tpb.2009.03.004Get rights and content

Abstract

A coalescent dual process for a multi-type Moran model with genic selection is derived using a generator approach. This leads to an expansion of the transition functions in the Moran model and the Wright–Fisher diffusion process limit in terms of the transition functions for the coalescent dual. A graphical representation of the Moran model (in the spirit of Harris) identifies the dual as a strong dual process following typed lines backwards in time. An application is made to the harmonic measure problem of finding the joint probability distribution of the time to the first loss of an allele from the population and the distribution of the surviving alleles at the time of loss. Our dual process mirrors the Ancestral Selection Graph of [Krone, S. M., Neuhauser, C., 1997. Ancestral processes with selection. Theoret. Popul. Biol. 51, 210–237; Neuhauser, C., Krone, S. M., 1997. The genealogy of samples in models with selection. Genetics 145, 519–534], which allows one to reconstruct the genealogy of a random sample from a population subject to genic selection. In our setting, we follow [Stephens, M., Donnelly, P., 2002. Ancestral inference in population genetics models with selection. Aust. N. Z. J. Stat. 45, 395–430] in assuming that the types of individuals in the sample are known. There are also close links to [Fearnhead, P., 2002. The common ancestor at a nonneutral locus. J. Appl. Probab. 39, 38–54]. However, our methods and applications are quite different. This work can also be thought of as extending a dual process construction in a Wright–Fisher diffusion in [Barbour, A.D., Ethier, S.N., Griffiths, R.C., 2000. A transition function expansion for a diffusion model with selection. Ann. Appl. Probab. 10, 123–162]. The application to the harmonic measure problem extends a construction provided in the setting of a neutral diffusion process model in [Ethier, S.N., Griffiths, R.C., 1991. Harmonic measure for random genetic drift. In: Pinsky, M.A. (Ed.), Diffusion Processes and Related Problems in Analysis, vol. 1. In: Progress in Probability Series, vol. 22, Birkhäuser, Boston, pp. 73–81].

Introduction

Consider a population in which each individual is labelled according to a type taken from the set [d]={1,2,,d}. We write Xj(t) for the frequency of type j individuals in the population at time t0 and X(t)=(Xj(t))j[d] and model the evolution of the population using a multi-type Wright–Fisher diffusion process. In the simplest setting, there is no selection, and mutation between types is parent independent. That is, each individual mutates to type j at rate θj/20, independent of its current type. The generator of the diffusion process is then L=12i,j[d]xi(δijxj)2xixj+12i[d](θi|θ|xi)xi, where we use the notation |a|=jaj for the sum of elements in a vector. If θ>0 (meaning that θj>0 for all j[d]) the Wright–Fisher diffusion has a Dirichlet stationary distribution D(x,θ)=Γ(|θ|)Γ(θ1)Γ(θd)x1θ11xdθd1 for x>0 and |x|=1. More generally, one can allow some of the θj to vanish, in which case we obtain a generalized Dirichlet distribution in which the corresponding frequencies xj vanish with probability one. Ethier and Griffiths (1993) show that the transition distribution of the diffusion can be expressed as a mixture, P(t,x,)=k=0qk|θ|(t)|l|=kM(l;k,x)D(,θ+l), where M(l;k,x) denotes the multinomial distribution, M(l;k,x)=klx1l1xdld,|l|=k, and qk|θ|(t) are the transition functions of a (dual) pure death process which we denote by {L(t),t0}. This process should be thought of as evolving in backwards time. Lineages are lost through coalescence, through which kk1 at rate k(k1)/2, and mutation, resulting in kk1 at an additional rate k|θ|/2. We suppose that L(t) starts from infinity (although it will be finite at any t>0). If |θ|=0 this dual process is the number of blocks in the famous Kingman coalescent (Kingman, 1982). The expansion (2) still holds in this case, except that now we will have L(t)1 for all t0 and so the summation is over k1. There is an explicit expression for the transition functions, qk|θ|(t)=j=kρj|θ|(t)(1)jk(2j+|θ|1)(j+|θ|)(k1)j!(jk)!, where ρj|θ|(t)=ej(j+|θ|1)t/2, (Griffiths, 1980, Tavaré, 1984, Griffiths, 2006). To understand the expansion (2), one can think of the infinite number of individuals that make up L(0) as the ‘leaves’ in a forest of trees. Each tree either grows from a ‘founder’ at time t (which corresponds to time zero in the diffusion process) or its root arose through a new mutation. This subdivides the leaves into ‘families’ and leads to the Dirichlet mixture. If there are k founder lineages, then their types are determined by sampling k individuals from the diffusion at time zero, and hence the probability that the numbers of founder lineages of types 1,,d are given by l=(l1,,ld) with |l|=k is just M(l;k,x). Let U=(U1,,Uk) be the relative family sizes of these founder families in the leaves of the tree, and V=(V1,,Vd) be the frequencies of families derived from new mutations on the tree edges in (0,t). Then UV=(U1,,Uk,V1,,Vd) has a D(uv,(1,,1)θ) distribution. The term D(,θ+l) in (2) is obtained by combining families of individuals of the same type, corresponding to adding the parameters in the Dirichlet distribution. If θ=0 the process is one of pure random drift. There is an analogous mixture representation for the transition function of a Fleming–Viot process. In that setting the finite set [d] is replaced by an infinite type space. At each time t, X(t) is now a probability measure on the type space and the Dirichlet distribution is replaced by a Poisson–Dirichlet distribution (Ethier and Griffiths, 1993). This is a canonical representation because the d-type diffusion can be obtained by taking the measure that determines the type after mutation in the Fleming–Viot process to be atomic with atoms {θj/|θ|;θj>0,j[d]}. The transition distribution (2) first appeared in Griffiths and Li (1983) and Tavaré (1984) with an interpretation based on Griffiths (1980) lines of descent. Donnelly and Tavaré (1987) discuss (2) and give a probabilistic explanation. Watterson (1984) derived an analogous representation for the distribution of old and new allele families in a neutral Moran model.

Of course one can obtain the multi-type Wright–Fisher diffusion with selection as an infinite population limit of Moran models (with weak selection). For a population of size N and types from [d], each individual of type i gives birth at rate λi=λ(1+σi/N) (to an offspring of the same type) and an individual is selected at random from the population to die (thus maintaining constant population size). It is convenient to suppose that the constants {σi}i[d] are negative and we write σ for max{σiσj:i,j[d]}. In addition, mutation changes an individual of type i to an individual of type j at rate μpij. If we take λ=N/2 and let N we recover the multi-type Wright–Fisher diffusion. Krone and Neuhauser (1997) and Neuhauser and Krone (1997) exploited the graphical representation of a Moran model with selection (which can be thought of as a biased voter model on a complete graph) to write down the Ancestral Selection Graph (ASG). This graph is traced out by a branching–coalescing system of lineages and has embedded within it the genealogy of a random sample from the population. Passing to a weak selection limit they obtain the same duality in the diffusion setting. In that limit, if there are currently j edges in the graph then jj1 (through coalescence) at rate (j2) and jj+1 at rate σj/2. These branching events correspond to ‘potential selective events’. In order to extract the genealogy of a sample of size n, one starts with n edges in the graph and traces back until the first (almost surely finite) time when there is only one edge. This is the ‘ultimate ancestor’. The type of the ultimate ancestor is chosen (by sampling from the population at that time) and then one works back through the graph using the rule that ‘the fitter type always wins’ whenever one arrives at a point corresponding to a branching event. This allows us to prune the graph to recover the genealogical tree (and the types in the sample). Mano (in press) found an explicit, though complicated, expression for the transition functions of the number of ancestors in the ASG as we trace backwards in time by considering the ASG as a moment dual in a two-allele Wright–Fisher diffusion process with selection and without mutation. Donnelly and Kurtz (1999) use their ‘modified lookdown’ approach to construct, simultaneously, the Fleming–Viot process with selection and the ancestral selection graph that encodes the genealogy. Stephens and Donnelly (2002) and Fearnhead (2002) consider the case when the types of individual in the sample are known and construct an ASG with ‘typed’ lines. Stephens and Donnelly (2002) deal with general diploid selection (in an infinite population limit) whereas Fearnhead (2002) considers the genic selection that interests us here. His results too are valid in the infinite population limit. Both papers deal only with parent-independent mutation. The transition rates in their typed ASGs are similar to those of (24) in this paper once we specialize to a diffusion model with genic selection and parent-independent mutation.

Barbour et al. (2000) derive a transition density expansion for a Wright–Fisher diffusion with genic selection in terms of the transition functions of a dual process. The generator of the diffusion process {X(t),t0} with selection coefficients {σj0,j[d]} is L=12i,j[d]xi(δijxj)2xixj+12i[d](θi|θ|xi+xi(σis(x)))xi, where s(x)=j[d]σjxj. The transition distribution in the diffusion can be written as a mixture, P(t,x,)=lZ+dhxl(t)D(,l+θ,σ). Here D(y,l+θ,σ) is a weighted Dirichlet distribution whose density is weighted by exp(j[d]σjyj). In general, if ξ has a Dirichlet distribution D(,θ+l) we write v(θ+l)=E[exp{j[d]σjξj}] for the normalizing constant in D(,l+θ,σ). The functions {hxl(t),t0} are transition functions of a multi-type birth and death process started from an infinite number of individuals whose types have frequencies x=(x1,,xd). (In fact, showing that one can construct the birth–death process from this entrance boundary at infinity is rather involved.) The non-zero entries in the αth row of the Q matrix for the multi-type birth and death process are q(α,α+ej)=12|σj||α|(αj+θj)|α|+|θ|v(θ+α+ej)v(θ+α)q(α,αej)=12αj(|θ|+|α|1)v(α+θej)v(θ+α)q(α,α)=12[j[d]αj|σj|+|α|(|α|+|θ|1)]. As usual we have used ej to denote the vector with zero entries in all but the jth slot, where there is a 1. In contrast to the neutral case, here no closed-form expression is known for the transition functions {hxl(t),t0}. Moreover, although there must be a connection with the ASG, the probabilistic interpretation of the dual process is not immediately clear.

Our first goal in this paper is to derive a branching and coalescing dual process for a Moran model describing N individuals, with types chosen from a (possibly infinite) space E, undergoing genic selection and a general Markov mutation scheme. The derivation is entirely algebraic, but has a clear probabilistic interpretation which we provide through consideration of the graphical representation of the Moran model. This relates closely to previous work of Stephens and Donnelly (2002) and Fearnhead (2002), but our purpose is different. Here we apply the dual to provide an expansion analogous to (4) for the transition functions of both the Moran model and the diffusion obtained under the weak selection limit and, furthermore, when the mutation mechanism is such that types can be lost from the population, to provide a new line of attack for the ‘harmonic measure problem’.

We suppose that each individual of type i mutates to type j at rate μpij, where P=(Pij)i,jE is a transition probability matrix. We write Zj(t) for the number of individuals of type j at time t and Z(t)=(Zj(t))jE. The duality derived in Section 4.1 is based on factorial moments of the Wright–Fisher diffusion. These are defined through fα(z)=jEzj[αj]. Here, and throughout, we adopt the standard notation a[k]=a(a1)(ak+1),a(k)=a(a+1)(a+k1). The algebraic derivation that we present only provides a weak duality between the Moran model and the multi-type branching and coalescing dual. That this can be extended to a strong duality is seen through the graphical representation of the Moran model in Section 4.2. This direct probabilistic interpretation of the dual process contrasts with the situation in Barbour et al. (2000), where such a direct interpretation of the dual seems to be difficult. In Section 5, the transition functions of {Z(t),t0} will be shown to have a mixture expansion in terms of the transition functions of the dual process. Specializing to parent-independent mutation and passing to the limit as N we then recover Eq. (4), the transition function expansion of the corresponding diffusion process of Barbour et al. (2000).

In models where the mutation scheme is such that types can be lost from the population, it is natural to ask about the way in which types are lost and the probability of fixation of a particular type. Questions of this type can be difficult to answer in multi-type diffusion models, especially in the presence of selection. In a neutral population with types labelled {0,1,,n} and no mutation (so that the Wright–Fisher diffusion is one of pure random drift), Ethier and Griffiths (1991) find the probability density that the allele labelled 0 is the first to be lost and that at the time of loss the remaining alleles have frequencies y=(y1,,yn). If the initial frequency of types in the population is given by x=(x0,,xn), then the probability density is h0(x,y)=k=n+1(1x0)k1x0{lZ+:|l|=k1}M(l;k1,x+/(1x0))D(y,l), where x+=(x1,,xn). The density h0(x,y) is the solution to a harmonic measure problem. Verification of this is rather technical and provides little insight into how the density was actually derived. In Section 6 we find the equivalent density in a Moran model with selection through the mixture representation of the transition functions. In fact using this approach we can obtain the additional information of the time of loss of the first allele.

Section snippets

Neutral dual process derivation

In order to understand the constructions that follow, it is useful to consider first a simpler setting. In this section we rederive (2) for a two-type Wright–Fisher diffusion. In this case we may set X1(t)=X(t) and X2(t)=1X1(t). Then the one-dimensional process {X(t),t0} has generator L=12x(1x)2x2+12(θ1|θ|x)x, where x1=x,x2=1x and |θ|=θ1+θ2. Applying the generator to test functions of the form x1k1x2k2, we obtain Lx1k1x2k2=12|k|(|k|1+|θ|)x1k1x2k2+12k1(k11+θ1)x1k11x2k2+12k2(k21+θ2)x

A Moran model with genic selection

We now turn to our Moran model with genic selection. Recall that the population consists of N individuals with types chosen from a space E and that Zi(t) denotes the number of individuals of type i at time t. An individual of type jE gives birth at rate λj0, and an individual is chosen at random to die. Mutation changes each type i individual to type j at rate μpij, where P=(pij)ijE is a transition probability matrix and μ0. If the population configuration is z, then the transition rate

Algebraic duality

The dual process to our Moran model with genic selection will be identified through consideration of the generator L acting on test functions of the form fα(z)=jEzj[αj]. The state space of the dual process is ΔN={α;αZ+E,|α|N}. Recall that in Section 2 we considered renormalized test functions, gk(x), chosen in such a way that when thought of as acting on gk(x) as a function of k, L still defined the generator of a Markov process. Here too we must modify our test functions. Mirroring what we

A transition function expansion

We now exploit our duality to find a transition function expansion for our Moran model with genic selection. Let {syz(t),y,zZ+E,|y|=|z|=N,t0} be the transition functions of {Z(t),t0} and {hαβ(t),α,βΔN,t0} be the transition functions of the dual process {L(t),t0}.

Theorem 2

Transition functions in the Moran model have a dual expansionsyz(t)=φ(z)αΔNhzα(t)H˜(yα)φ(y),for y,zZ+E,|y|=|z|=N . Here φ is the stationary distribution of the process and H˜(yα)=H(αy)φ(y)/H(α) is the posterior distribution

Harmonic measure

In this section we apply our transition function expansion to some harmonic measure problems for the Moran model with genic selection. In the Moran model suppose that there are d=n+1 types labelled 0,1,,n and there is no mutation. We calculate the joint probability density that type 0 is the first to be lost, that it is lost at time τ, and that the distribution of the surviving types at time τ+ is z.

When a type is lost from the population, the dimension of the state space of the process is

References (19)

There are more references available in the full text version of this article.

Cited by (70)

  • Approximate filtering via discrete dual processes

    2024, Stochastic Processes and their Applications
  • Dynamics of a Fleming–Viot type particle system on the cycle graph

    2021, Stochastic Processes and their Applications
View all citing articles on Scopus
View full text