A coalescent dual process in a Moran model with genic selection
Introduction
Consider a population in which each individual is labelled according to a type taken from the set . We write for the frequency of type individuals in the population at time and and model the evolution of the population using a multi-type Wright–Fisher diffusion process. In the simplest setting, there is no selection, and mutation between types is parent independent. That is, each individual mutates to type at rate , independent of its current type. The generator of the diffusion process is then where we use the notation for the sum of elements in a vector. If (meaning that for all ) the Wright–Fisher diffusion has a Dirichlet stationary distribution for and . More generally, one can allow some of the to vanish, in which case we obtain a generalized Dirichlet distribution in which the corresponding frequencies vanish with probability one.
Ethier and Griffiths (1993) show that the transition distribution of the diffusion can be expressed as a mixture, where denotes the multinomial distribution, and are the transition functions of a (dual) pure death process which we denote by . This process should be thought of as evolving in backwards time. Lineages are lost through coalescence, through which at rate , and mutation, resulting in at an additional rate . We suppose that starts from infinity (although it will be finite at any ). If this dual process is the number of blocks in the famous Kingman coalescent (Kingman, 1982). The expansion (2) still holds in this case, except that now we will have for all and so the summation is over . There is an explicit expression for the transition functions, where , (Griffiths, 1980; Tavaré, 1984; Griffiths, 2006).
To understand the expansion (2), one can think of the infinite number of individuals that make up as the ‘leaves’ in a forest of trees. Each tree either grows from a ‘founder’ at time (which corresponds to time zero in the diffusion process) or its root arose through a new mutation. This subdivides the leaves into ‘families’ and leads to the Dirichlet mixture. If there are founder lineages, then their types are determined by sampling individuals from the diffusion at time zero, and hence the probability that the numbers of founder lineages of types are given by with is just . Let be the relative family sizes of these founder families in the leaves of the tree, and be the frequencies of families derived from new mutations on the tree edges in . Then has a distribution. The term in (2) is obtained by combining families of individuals of the same type, corresponding to adding the parameters in the Dirichlet distribution. If the process is one of pure random drift.
There is an analogous mixture representation for the transition function of a Fleming–Viot process. In that setting the finite set is replaced by an infinite type space. At each time , is now a probability measure on the type space and the Dirichlet distribution is replaced by a Poisson–Dirichlet distribution (Ethier and Griffiths, 1993). This is a canonical representation because the -type diffusion can be obtained by taking the measure that determines the type after mutation in the Fleming–Viot process to be atomic with atoms . The transition distribution (2) first appeared in Griffiths and Li (1983) and Tavaré (1984), with an interpretation based on the lines of descent of Griffiths (1980). Donnelly and Tavaré (1987) discuss (2) and give a probabilistic explanation. Watterson (1984) derived an analogous representation for the distribution of old and new allele families in a neutral Moran model.
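The backward-time pure death process described above can be explored numerically. The following is a minimal sketch, not the authors' construction: it assumes the standard rates for this process, namely coalescence at rate m(m-1)/2 and loss by mutation at rate m*theta/2 when there are m lineages, since the rate expressions themselves do not appear in the text above. The names `death_rate`, `lineages_at` and the finite starting value `m0` (a stand-in for the entrance boundary at infinity) are hypothetical.

```python
import random

def death_rate(m, theta):
    """Assumed total rate of m -> m-1: coalescence m(m-1)/2 plus mutation m*theta/2."""
    return m * (m - 1 + theta) / 2.0

def lineages_at(t, theta, m0=200, rng=random):
    """Number of lineages surviving at backward time t, started from m0 lineages.

    m0 is a finite approximation to starting the process from infinity.
    """
    m, clock = m0, 0.0
    while m > 0:
        rate = death_rate(m, theta)
        if rate == 0.0:
            # theta == 0 and m == 1: the last lineage is never lost
            break
        clock += rng.expovariate(rate)
        if clock > t:
            break
        m -= 1
    return m

# Monte Carlo estimate of the mean number of lineages at time 0.5 with theta = 1
rng = random.Random(1)
sample = [lineages_at(0.5, theta=1.0, rng=rng) for _ in range(500)]
mean_lineages = sum(sample) / len(sample)
```

A histogram of `sample` gives an empirical approximation to the transition functions of the death process at the chosen time, up to the error introduced by the finite starting value `m0`.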
Of course one can obtain the multi-type Wright–Fisher diffusion with selection as an infinite population limit of Moran models (with weak selection). For a population of size and types from , each individual of type gives birth at rate (to an offspring of the same type) and an individual is selected at random from the population to die (thus maintaining constant population size). It is convenient to suppose that the constants are negative and we write for . In addition, mutation changes an individual of type to an individual of type at rate . If we take and let we recover the multi-type Wright–Fisher diffusion. Krone and Neuhauser (1997) and Neuhauser and Krone (1997) exploited the graphical representation of a Moran model with selection (which can be thought of as a biased voter model on a complete graph) to write down the Ancestral Selection Graph (ASG). This graph is traced out by a branching–coalescing system of lineages and has embedded within it the genealogy of a random sample from the population. Passing to a weak selection limit they obtain the same duality in the diffusion setting. In that limit, if there are currently edges in the graph then (through coalescence) at rate and at rate . These branching events correspond to ‘potential selective events’. In order to extract the genealogy of a sample of size , one starts with edges in the graph and traces back until the first (almost surely finite) time when there is only one edge. This is the ‘ultimate ancestor’. The type of the ultimate ancestor is chosen (by sampling from the population at that time) and then one works back through the graph using the rule that ‘the fitter type always wins’ whenever one arrives at a point corresponding to a branching event. This allows us to prune the graph to recover the genealogical tree (and the types in the sample). 
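The procedure of tracing the graph back to the ultimate ancestor can be sketched as a simulation of the number of edges. This is an illustration under stated assumptions rather than the construction of Krone and Neuhauser: with k edges it assumes coalescence at rate k(k-1)/2 and branching at rate k*sigma/2, which are the rates usually quoted for the ASG on the diffusion time scale but are not shown in the text above. The function name and the parameter `sigma` are hypothetical.

```python
import random

def time_to_ultimate_ancestor(n, sigma, rng=random):
    """Simulate the ASG edge count from n edges until a single edge remains.

    Assumed rates with k edges: coalescence k(k-1)/2, branching k*sigma/2.
    Returns (elapsed time, largest number of edges seen along the way).
    """
    k, t, peak = n, 0.0, n
    while k > 1:
        coal = k * (k - 1) / 2.0
        branch = k * sigma / 2.0
        t += rng.expovariate(coal + branch)
        if rng.random() < coal / (coal + branch):
            k -= 1          # two edges coalesce
        else:
            k += 1          # potential selective event adds an edge
            peak = max(peak, k)
    return t, peak

t_ua, peak_edges = time_to_ultimate_ancestor(5, sigma=1.0, rng=random.Random(2))
```

Because the coalescence rate grows quadratically in k while the branching rate grows only linearly, the loop terminates almost surely, which mirrors the statement that the time to the ultimate ancestor is almost surely finite. The sketch tracks only the edge count; pruning the graph to recover the genealogy would also require recording which edges branch and coalesce.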
Mano (in press) found an explicit, though complicated, expression for the transition functions of the number of ancestors in the ASG as we trace backwards in time by considering the ASG as a moment dual in a two-allele Wright–Fisher diffusion process with selection and without mutation. Donnelly and Kurtz (1999) use their ‘modified lookdown’ approach to construct, simultaneously, the Fleming–Viot process with selection and the ancestral selection graph that encodes the genealogy. Stephens and Donnelly (2002) and Fearnhead (2002) consider the case when the types of the individuals in the sample are known and construct an ASG with ‘typed’ lines. Stephens and Donnelly (2002) deal with general diploid selection (in an infinite population limit) whereas Fearnhead (2002) considers the genic selection that interests us here. His results, too, are valid in the infinite population limit. Both papers deal only with parent-independent mutation. The transition rates in their typed ASGs are similar to those of (24) in this paper once we specialize to a diffusion model with genic selection and parent-independent mutation.
Barbour et al. (2000) derive a transition density expansion for a Wright–Fisher diffusion with genic selection in terms of the transition functions of a dual process. The generator of the diffusion process with selection coefficients is where . The transition distribution in the diffusion can be written as a mixture, Here is a weighted Dirichlet distribution whose density is weighted by . In general, if has a Dirichlet distribution we write for the normalizing constant in . The functions are transition functions of a multi-type birth and death process started from an infinite number of individuals whose types have frequencies . (In fact, showing that one can construct the birth–death process from this entrance boundary at infinity is rather involved.) The non-zero entries in the th row of the matrix for the multi-type birth and death process are As usual we have used to denote the vector with zero entries in all but the th slot, where there is a 1. In contrast to the neutral case, here no closed-form expression is known for the transition functions . Moreover, although there must be a connection with the ASG, the probabilistic interpretation of the dual process is not immediately clear.
Our first goal in this paper is to derive a branching and coalescing dual process for a Moran model describing individuals, with types chosen from a (possibly infinite) space , undergoing genic selection and a general Markov mutation scheme. The derivation is entirely algebraic, but has a clear probabilistic interpretation which we provide through consideration of the graphical representation of the Moran model. This relates closely to previous work of Stephens and Donnelly (2002) and Fearnhead (2002), but our purpose is different. Here we apply the dual to provide an expansion analogous to (4) for the transition functions of both the Moran model and the diffusion obtained under the weak selection limit and, furthermore, when the mutation mechanism is such that types can be lost from the population, to provide a new line of attack for the ‘harmonic measure problem’.
We suppose that each individual of type mutates to type at rate , where is a transition probability matrix. We write for the number of individuals of type at time and . The duality derived in Section 4.1 is based on factorial moments of the Moran model. These are defined through Here, and throughout, we adopt the standard notation The algebraic derivation that we present only provides a weak duality between the Moran model and the multi-type branching and coalescing dual. That this can be extended to a strong duality is seen through the graphical representation of the Moran model in Section 4.2. This direct probabilistic interpretation of the dual process contrasts with the situation in Barbour et al. (2000), where such a direct interpretation of the dual seems to be difficult. In Section 5, the transition functions of will be shown to have a mixture expansion in terms of the transition functions of the dual process. Specializing to parent-independent mutation and passing to the limit as we then recover Eq. (4), the transition function expansion of the corresponding diffusion process of Barbour et al. (2000).
In models where the mutation scheme is such that types can be lost from the population, it is natural to ask about the way in which types are lost and the probability of fixation of a particular type. Questions of this type can be difficult to answer in multi-type diffusion models, especially in the presence of selection. In a neutral population with types labelled and no mutation (so that the Wright–Fisher diffusion is one of pure random drift), Ethier and Griffiths (1991) find the probability density that the allele labelled 0 is the first to be lost and that at the time of loss the remaining alleles have frequencies . If the initial frequency of types in the population is given by , then the probability density is where . The density is the solution to a harmonic measure problem. Verification of this is rather technical and provides little insight into how the density was actually derived. In Section 6 we find the equivalent density in a Moran model with selection through the mixture representation of the transition functions. In fact using this approach we can obtain the additional information of the time of loss of the first allele.
Neutral dual process derivation
In order to understand the constructions that follow, it is useful to consider first a simpler setting. In this section we rederive (2) for a two-type Wright–Fisher diffusion. In this case we may set and . Then the one-dimensional process has generator where and . Applying the generator to test functions of the form , we obtain
A Moran model with genic selection
We now turn to our Moran model with genic selection. Recall that the population consists of individuals with types chosen from a space and that denotes the number of individuals of type at time . An individual of type gives birth at rate , and an individual is chosen at random to die. Mutation changes each type individual to type at rate , where is a transition probability matrix and . If the population configuration is , then the transition rate
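The dynamics just described (births by type, a uniformly chosen death, and Markov mutation) can be illustrated with a single Gillespie step. This is a minimal sketch under stated assumptions: the rate expressions are not shown in the text above, so the per-individual birth rates `birth[j]` and the mutation rate `mu * P[i][j]` are hypothetical placeholders, as is the function name `moran_step`.

```python
import random

def moran_step(counts, birth, mu, P, rng=random):
    """Perform one event of a Moran model with genic selection and mutation.

    counts[i] : number of type-i individuals (updated in place)
    birth[j]  : assumed per-individual birth rate of type j
    mu, P     : assumed mutation rate and transition probability matrix
    Returns the waiting time to the event (inf if no event can occur).
    """
    d = len(counts)
    N = sum(counts)
    events = []  # (rate, type losing an individual, type gaining one)
    for i in range(d):
        if counts[i] == 0:
            continue
        for j in range(d):
            if i == j:
                continue
            # a type-j birth whose offspring replaces a random type-i individual
            events.append((birth[j] * counts[j] * counts[i] / N, i, j))
            # mutation of a type-i individual to type j
            events.append((mu * P[i][j] * counts[i], i, j))
    total = sum(rate for rate, _, _ in events)
    if total == 0.0:
        return float("inf")
    dt = rng.expovariate(total)
    _, loser, gainer = rng.choices(events, weights=[r for r, _, _ in events])[0]
    counts[loser] -= 1
    counts[gainer] += 1
    return dt

rng = random.Random(3)
counts = [6, 4]
P = [[0.5, 0.5], [0.5, 0.5]]
elapsed = sum(moran_step(counts, [1.0, 0.9], 0.1, P, rng) for _ in range(200))
```

Events that leave the configuration unchanged (a birth replacing an individual of the same type, or a mutation to the same type) are omitted, since they do not affect the law of the type counts. The population size is conserved by construction.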
Algebraic duality
The dual process to our Moran model with genic selection will be identified through consideration of the generator acting on test functions of the form The state space of the dual process is Recall that in Section 2 we considered renormalized test functions, , chosen in such a way that when thought of as acting on as a function of , still defined the generator of a Markov process. Here too we must modify our test functions. Mirroring what we
A transition function expansion
We now exploit our duality to find a transition function expansion for our Moran model with genic selection. Let be the transition functions of and be the transition functions of the dual process .
Theorem 2. Transition functions in the Moran model have a dual expansion for . Here is the stationary distribution of the process and is the posterior distribution
Harmonic measure
In this section we apply our transition function expansion to some harmonic measure problems for the Moran model with genic selection. In the Moran model suppose that there are types labelled and there is no mutation. We calculate the joint probability density that type 0 is the first to be lost, that it is lost at time , and that the distribution of the surviving types at time is .
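For orientation, the event "type 0 is the first to be lost" can be estimated by brute-force simulation. The sketch below deliberately swaps in a simpler setting than the one treated in this section: a neutral Moran model without mutation, followed only through its jump chain, so it records which type is lost first and the surviving configuration but not the time of loss. The function name `first_loss` is hypothetical.

```python
import random

def first_loss(counts, rng=random):
    """Run the neutral, mutation-free Moran jump chain until some type is lost.

    At each step one individual reproduces and one individual, chosen
    uniformly from the population, dies.
    Returns (index of the first lost type, type counts at that moment).
    """
    counts = list(counts)
    types = range(len(counts))
    while min(counts) > 0:
        parent = rng.choices(types, weights=counts)[0]
        dying = rng.choices(types, weights=counts)[0]
        counts[parent] += 1
        counts[dying] -= 1
    return counts.index(0), counts

rng = random.Random(4)
runs = [first_loss([3, 3, 3], rng)[0] for _ in range(300)]
frac_type0_first = runs.count(0) / len(runs)
```

With equal initial counts the three types are exchangeable, so `frac_type0_first` should be close to 1/3; unequal initial counts give a simple empirical check against any closed-form density for the first type lost.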
When a type is lost from the population, the dimension of the state space of the process is
References (19)
Griffiths (1980). Lines of descent in the diffusion approximation of neutral Wright–Fisher models. Theoret. Popul. Biol.
Griffiths and Li (1983). Simulating allele frequencies in a population and the genetic differentiation of populations under mutation pressure. Theoret. Popul. Biol.
Kingman (1982). The coalescent. Stochastic Process. Appl.
Krone and Neuhauser (1997). Ancestral processes with selection. Theoret. Popul. Biol.
Tavaré (1984). Line-of-descent and genealogical processes, and their application in population genetics models. Theoret. Popul. Biol.
Barbour et al. (2000). A transition function expansion for a diffusion model with selection. Ann. Appl. Probab.
Donnelly and Kurtz (1999). Genealogical processes for Fleming–Viot models with selection and recombination. Ann. Appl. Probab.
Donnelly and Tavaré (1987). The population genealogy of the infinitely-many neutral alleles model. J. Math. Biol.
Ethier and Griffiths (1991). Harmonic measure for random genetic drift.