The fixation probability of two competing beneficial mutations

https://doi.org/10.1016/j.tpb.2010.04.001

Abstract

Suppose that a beneficial mutation is undergoing a selective sweep when another beneficial mutation arises at a linked locus. We study the fixation probability of the double mutant, i.e., one (produced by recombination) that carries both mutations. Previous analysis works well for the case where the earlier beneficial mutation confers a greater selective advantage than the later mutation, but not so well in the opposite case. We present an approach to approximating the fixation probability in the case where the later mutation confers a greater selective advantage.

Introduction

The question of what limits the rate of adaptation is important both for understanding whether species approach this limit in nature, and for the practical task of maximising the response to artificial selection (Barton, 1995). In an asexual population, different beneficial mutations can only fix if they arise sequentially on the same lineage, otherwise one eventually outcompetes the others. In a sexually reproducing population the rate of adaptation can be faster because, through recombination, beneficial mutations that arise in different genetic backgrounds can be combined into a single individual, resulting in an evolutionary advantage for recombination. This is commonly known as the Hill–Robertson effect and was first quantified by Hill and Robertson (1966). In spite of the presence of recombination, at least for certain parameter ranges, interference between different beneficial mutations can still substantially reduce their probability of fixation. Although it is well known that selection at one locus can reduce the efficacy of selection at other (linked) loci, quantitative results are rather incomplete. A full mathematical analysis of the competition between multiple beneficial mutations in a sexually reproducing population and hence the limits on the rate of adaptation is out of reach, but here we take a first step in that direction by considering just two competing mutations. The results we present in this work will be relevant for populations that are large enough that clonal interference involving two mutations does occur, but not so large that it is likely for clonal interference to involve more than two mutations. In addition, the range of population sizes to which this work is relevant is also determined by the distance along the chromosome between beneficial mutations and the strengths of their effects.

We shall suppose that one favoured mutation, which confers a selective advantage s1, say, to a carrier is well established in the population when a new mutation at a different locus ‘nearby’ (in a sense to be made precise a little later) on the same chromosome gives rise to a second favoured mutation, with selective advantage s2. Selective advantage is taken to be multiplicative, which can be approximated to be additive for small selection coefficients. The case s1>s2 is fairly well understood, but the case s2>s1 is mathematically far more challenging and leads to some rather different effects. Let us write r for the probability per individual per generation of a recombination event between the two loci. In the case when s1>s2, Barton (1995) shows that the probability of fixation of the allele carrying both beneficial mutations depends on just two parameters, the ratio s2/s1 and the scaled recombination rate r/s1. The key observation arising from our study is that for s2>s1 this is most definitely not so: in this case the quantity Nr plays a key role, in particular, population size matters.

Simulation results in Figs. 1 and 2 show that the fixation probability of the allele with favoured mutations at both loci depends on both N and r. In Fig. 1, we hold N fixed and vary r, whereas in Fig. 2, we hold r fixed and vary N. Fig. 1 shows simulation results with two sets of selection coefficients: s1=0.02, s2=0.016 and s1=0.016, s2=0.02. For now we shall refer to those individuals carrying just the first beneficial mutation as type 1 and those carrying neither beneficial mutation as type 0. The second mutation is assumed to arise at a random time during the sweep of the first mutation. If at that time U individuals out of the population of size 2N carry the first mutation, then the second mutation has probability U/(2N) of falling on a type 1 individual. If the second mutation falls on a type 1 individual, a case which has been isolated by the dotted lines in Fig. 1(a), we see that the fixation probability of the double mutant allele is little changed as r increases. The increase in the fixation probability of the double mutant as r increases results from the case of the second mutation falling on a type 0 individual, which has been isolated in Fig. 1(b). In this case, if 2Nr is less than roughly 1, interference between the loci significantly affects the probability that both beneficial mutations fix. Furthermore, the range of r that yields a significant reduction of the fixation probability is quite different for the two cases s1<s2 and s1>s2. In the latter case, r only needs to be less than O(s1) for a reduction of the fixation probability to occur. In this work, we present an algorithm for finding the fixation probability in the case s1<s2.

Before stating our model and results more precisely, let us place our work in context. We consider a dioecious population. Suppose first that a selected locus and a neutral locus are tightly linked so that r is small. Then a selective sweep at the first locus will reduce the variability at the second locus, even though that locus is itself selectively neutral, since the frequency of the allelic type that happened to be on the chromosome on which the beneficial mutation originally arose is boosted. This genetic hitchhiking effect was first analysed mathematically by Maynard Smith and Haigh (1974) (see Barton, 2000, for a review of some more recent developments). Gillespie (2000) introduced the pseudo-hitchhiking model which provides a setting where the effect of repeated hitchhiking events on genetic variation at a neutral locus can be analysed. He suggests this as a possible explanation for the low variation in polymorphism between species (in contrast to the high variation in the census numbers that govern neutral evolution).

In Gillespie (2001), the pseudo-hitchhiking model is extended to include the effects of repeated substitutions at a strongly selected locus on a linked weakly selected locus. Here linkage is complete (r=0) and the selective sweeps are non-overlapping. As before there is little variation with changing population size.

The data analysis of Bazin et al. (2006) provides considerable support for Gillespie’s genetic draft as a model of the evolution of mitochondrial DNA. On the other hand, they also remark that although the diversity in nuclear DNA (which does experience recombination) does not show the linear dependence on population size that a neutral theory of evolution predicts, nonetheless there is a marked correlation between diversity and population size. For example diversity in invertebrates is higher than in vertebrates.

Gillespie’s analysis is based on investigating the effect of a sweep at a strongly selected locus on the probability of fixation of a completely linked weakly selected allele. He assumes that the difference in the selective advantages at the two loci is so great that the strongly selected locus will not be affected by the weakly selected one. (See Birky and Walsh, 1988, for related work.) However, if there is a high rate of selective substitutions (perhaps following some environmental change) and recombination rates are low then we must also understand interference between two mutations with comparable selection coefficients arising at different loci and that is our purpose here.

If there is no recombination then the problem is one of competitive exclusion. This is the situation studied by Gerrish and Lenski (1998), Wilke (2004), Rouzine et al. (2007) and Yu et al. (in press), amongst others. These papers study the rate of adaptation, i.e. the expected speed of increase of the mean fitness of the population, when mutations accumulate in a large asexual population. Kim and Stephan (2003) extend the analysis in Gerrish and Lenski (1998) to a case where the loci are partially linked. They also performed extensive simulations in the presence of recombination and with multiple competing selected loci. Kim (2006) investigates the effect of recurrent strong directional selection at one locus on weak selection at another tightly (but not completely) linked locus, extending the work of Gillespie (2000).

The most substantive analytic work on interference between sweeps in the presence of recombination can be found in Barton (1995) and Otto and Barton (1997). Before describing that, let us recall some facts about isolated sweeps, about which a great deal is known (see e.g., Ewens, 1979, Chapter 5). Suppose that a single advantageous allele with selective advantage $s_1$ arises in an otherwise neutral population of size $2N$ at time 0. If $s_1 \ll 1 \ll N s_1$, then the proportion of chromosomes carrying this allele at time $t$ is well approximated by the solution, $\{P(t)\}_{t \ge 0}$, to the stochastic differential equation
$$dP = s_1 P(1-P)\,dt + \sqrt{\frac{1}{2N} P(1-P)}\,dW, \tag{1}$$
where $\{W(t)\}_{t \ge 0}$ is a standard Wiener process and $P(0) = 1/(2N)$ (Ethier and Kurtz, 1986, Eq. 10.2.7). With one chromosome in the initial population possessing the advantageous allele, the probability that this mutation ultimately fixes, i.e. $P$ hits 1, is
$$\frac{1 - e^{-2 s_1}}{1 - e^{-4 N s_1}} \approx 2 s_1$$
for small but fixed (with respect to $N$) $s_1$ and large $N$. Let $\tilde{P}$ be the process $P$ conditioned to hit 1 and $\tilde{T}_{\mathrm{fix}} = \inf\{t \ge 0 : \tilde{P}(t) = 1\}$; then
$$d\tilde{P} = s_1 \tilde{P}(1-\tilde{P}) \coth(N s_1 \tilde{P})\,dt + \sqrt{\frac{1}{2N} \tilde{P}(1-\tilde{P})}\,dW, \tag{2}$$
from which we can calculate the expected duration of the sweep (see e.g., Etheridge et al., 2006, Section 4.1):
$$E[\tilde{T}_{\mathrm{fix}}] = \frac{2}{s_1} \log(2 N s_1) + O\!\left(\frac{1}{s_1}\right), \tag{3}$$
and the variance $\mathrm{var}[\tilde{T}_{\mathrm{fix}}]$ is $O(1/s_1^2)$. Thus the duration of the sweep is approximately $2\log(2 N s_1)/s_1$ generations. A Green’s function calculation analogous to that leading to (3) gives the following. The time that $2N\tilde{P}$ and $2N(1-\tilde{P})$ spend at values less than $O(1/s_1)$ is $O(1/s_1)$. For $1/(2 N s_1) \ll \epsilon < 1/2$, it takes approximately $\log(2 N s_1 \epsilon)/s_1$ generations for $\tilde{P}$ to increase from $1/(2N)$ to $\epsilon$, and the same to increase from $1-\epsilon$ to 1. On the other hand, the time for $\tilde{P}$ to increase from $\epsilon$ to $1-\epsilon$ is only $O(1/s_1)$. Therefore if the population size is large, then during almost all the timecourse of the sweep, the value of $\tilde{P}$ is near 0 or 1. During the first half (in terms of time) of the sweep, when $\tilde{P}$ is small, we can nevertheless expect $2N\tilde{P}$ to spend most of the time at levels $\gg O(1/s_1)$. Similarly, $2N\tilde{P}$ spends most of the time at levels near $2N$ during the second half of the sweep.
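As a rough numerical illustration of the single-sweep facts recalled above, the following sketch evaluates the fixation probability and the leading-order sweep duration. The value s1 = 0.02 is one of the selection coefficients quoted for Fig. 1; the population size N = 10^4 is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
import math


def fixation_probability(s1: float, n: int) -> float:
    """Fixation probability of one beneficial copy among 2N chromosomes:
    (1 - exp(-2 s1)) / (1 - exp(-4 N s1))."""
    return (1.0 - math.exp(-2.0 * s1)) / (1.0 - math.exp(-4.0 * n * s1))


def leading_order_sweep_duration(s1: float, n: int) -> float:
    """Leading-order term of the expected sweep duration, 2 log(2 N s1) / s1,
    in generations; the O(1/s1) correction is ignored."""
    return 2.0 * math.log(2.0 * n * s1) / s1


# Illustrative (assumed) parameter values, not taken from the paper's figures.
s1, n = 0.02, 10_000
p_fix = fixation_probability(s1, n)
t_fix = leading_order_sweep_duration(s1, n)

print(f"fixation probability: {p_fix:.5f} (compare 2*s1 = {2 * s1:.5f})")
print(f"leading-order sweep duration: {t_fix:.1f} generations")
```

For these values the exact expression and the approximation 2 s1 agree to within a few percent, which is the regime (small s1, large N s1) in which the text uses the approximation.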

Now suppose that a beneficial mutation with selection coefficient $s_2$ arises at a second (linked) locus. As discussed above, if $\tilde{P}$ is bounded away from 0 or 1, which is the case during almost the entire timecourse of the sweep for large $N$, the behaviour of (2) can be well approximated by the corresponding deterministic ODE:
$$\frac{d\hat{P}}{dt} = s_1 \hat{P}(1-\hat{P}) \coth(N s_1 \hat{P}).$$
The time when the second mutation arises can be approximately taken to be uniformly distributed in the period $[T_1, T_2]$ when $\hat{P}$ increases from $1/(2N)$ to $1 - 1/(2N)$. The new mutation can arise on one of two genetic backgrounds, 1 and 0, denoting those chromosomes carrying the original beneficial mutation and those not, respectively. The new mutation falling on background 1 gives rise to an individual carrying both mutations, which we refer to as type 11. Let $Q_{(0),U}$ and $Q_{(1),U}$ denote, respectively, the fixation probability of type 11 if the second mutation arises on the 0 background and the 1 background, at a time when the number of individuals in the 0 background is $U$. Then the fixation probability, when averaged over all possible times when the second mutation arises, can be approximated by
$$\int_{T_1}^{T_2} \left( (1-\hat{P}(t))\, Q_{(0),\hat{P}(t)} + \hat{P}(t)\, Q_{(1),\hat{P}(t)} \right) dt.$$
The quantity $Q_{(1),\hat{P}(t)}$ depends mostly on whether type 11 becomes established during the early stages of its sweep and has little to do with the effect of recombination, as shown by the simulation results in Fig. 1. In particular, if one assumes a Moran model similar to the one we describe in Section 2, then the number of type 11 individuals can be approximated by a branching process where each individual gives birth to an additional individual at rate $1 + (s_1+s_2)(1-\hat{P}(t)) + s_2 \hat{P}(t)$ and dies at rate $1 - (s_1+s_2)(1-\hat{P}(t)) - s_2 \hat{P}(t)$. Then $Q_{(1),\hat{P}(t)}$ is approximately the survival probability of this branching process, which is approximately $2(s_1 + s_2 - s_1 \hat{P}(t))$.
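The last approximation can be checked with a short calculation. The sketch below freezes the frequency of the first mutation at a constant value (an assumption made here for simplicity; in the text it changes with time) and uses the standard result that a linear birth-death process with constant birth rate b and death rate d < b, started from one individual, survives with probability 1 - d/b. Function names and the sample parameter values are hypothetical.

```python
def net_advantage(s1: float, s2: float, p_hat: float) -> float:
    """Excess growth term (s1 + s2)(1 - p_hat) + s2 * p_hat of type 11."""
    return (s1 + s2) * (1.0 - p_hat) + s2 * p_hat


def survival_probability(s1: float, s2: float, p_hat: float) -> float:
    """Survival probability 1 - d/b of a birth-death process with
    birth rate 1 + x and death rate 1 - x, with p_hat held constant."""
    x = net_advantage(s1, s2, p_hat)
    birth, death = 1.0 + x, 1.0 - x
    return 1.0 - death / birth


def survival_approximation(s1: float, s2: float, p_hat: float) -> float:
    """First-order approximation 2 (s1 + s2 - s1 * p_hat) quoted in the text."""
    return 2.0 * (s1 + s2 - s1 * p_hat)


# Selection coefficients from one of the Fig. 1 parameter sets.
s1, s2 = 0.016, 0.02
for p_hat in (0.0, 0.5, 1.0):
    exact = survival_probability(s1, s2, p_hat)
    approx = survival_approximation(s1, s2, p_hat)
    print(f"p_hat={p_hat:.1f}  1 - d/b = {exact:.5f}  approximation = {approx:.5f}")
```

With the frequency frozen, 1 - d/b equals 2x/(1 + x), where x is the excess growth term, so the two expressions differ only at second order in the selection coefficients.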

More interesting things happen if the second mutation falls on background 0. We shall refer to the resulting chromosome as type 01, the other two types in the population being type 10 and type 00. From now on, we focus on how to calculate $Q_{(0),U}$, the fixation probability of type 11 in this case, and we regard $U$, the number of type 10 individuals right after the second mutation arises, as known. In order for both mutations to sweep through the population in this case, it is necessary for recombination to produce a type 11 individual. What then is the probability that type 11 sweeps through the population, i.e. both advantageous mutations become fixed? There are three possible scenarios: (i) the new mutation dies out and the first mutation (which is already established in the population) eventually sweeps to fixation; (ii) the new mutation becomes established and eventually sweeps to fixation, displacing the first mutation; and (iii) an individual carrying the second mutation recombines with one carrying the first mutation to produce a double mutant, type 11, which eventually sweeps to fixation. In principle there is a fourth possibility, that both beneficial mutations are lost, but since the first is already established this has negligible probability. If $s_1 \approx s_2$, then in both scenarios (i) and (ii), once a beneficial mutation is fixed, we have an increase of $s_1 \approx s_2$ in the mean fitness of the population. From the point of view of the mean fitness of the population, this is as if the second mutation had not arisen at all. In scenario (iii), we have an increase of $s_1 + s_2$ in the mean fitness of the population. Therefore the fixation probability of type 11 determines the extent to which the two mutations interfere with each other to impede the rate of increase in mean fitness (that is, the rate of adaptation) in the population. But the problem of finding the fixation probability of type 11 is complicated by the fact that, through the action of selection and recombination, the four types of individual (classified by which of the advantageous mutations they carry) interact with each other in a nonlinear way.

We shall write $X_{ij}(t)$ (where $i, j \in \{0, 1\}$) for the proportion of the population at time $t$ after the introduction of the second beneficial mutation which are type $i$ at the original selected locus and type $j$ at the second selected locus, where ‘type 1’ always means ‘carries the favourable mutation’. Barton (1995) approximates the proportion of the population carrying just the first beneficial mutation by taking $N = \infty$ in (1). The result is the logistic curve, $X_{10}(\tau) = 1/(1 + \exp(-s_1 \tau))$, where his time origin $\tau = 0$ is taken to be when $X_{10}$ makes up half the population and, with this convention, the second mutation can arise at negative times. He assumes that the growth of $X_{10}$ will be unimpeded by the arrival of the new mutation. Type 11 individuals will be produced by recombination and he then uses a branching process approximation to estimate the probability that they become established (and therefore with high probability will become fixed). This approach leads to two coupled partial differential equations for the fixation probability of the second beneficial mutation, indexed according to whether it first appears on the type 0 or type 1 background. In particular, his approximation is independent of population size. He quantifies the extent to which the fixation probability of the second mutation is increased when it arises on a type 1 background, producing a type 11 individual, but decreased when it arises on a type 0 background, and by taking a weighted average of the two (integrated over the possible times of arrival of the second mutation) quantifies the interference between the two sweeps. In the case when $s_1 > s_2$, his findings are supported by the simulation studies in Otto and Barton (1997). Even for weakly linked loci, e.g., for $r/s_1$ as big as 0.1, he sees substantial interference. Only when $r/s_1 \gg 1$ does the effect become negligible. By contrast, we shall see that in the case when the second mutant confers a greater selective advantage than the first, the range of $r$ for which interference is appreciable decreases as the inverse of population size, and for tightly linked loci the probability that both mutations eventually fix can be heavily dependent on population size.

As can be seen from the simulation results shown in Fig. 2, this dependence is heaviest if the later mutation appears during the first half (in terms of time) of the sweep of the earlier mutation. In this case, if $N$ is large, then with high probability it arises on the type 0 background. Our discussion of the single sweep reveals that when the new mutation arises we should expect the original mutation to be present in a large number of copies, but to form only a small proportion of the whole (large) population; that is, $1/(2N) \ll U/(2N) \ll 1$. Simulation results in Fig. 2 show that if $2Nr$ is less than about 1, one expects to see a significant reduction in the fixation probability of type 11.

By contrast, if N is large and this new mutation arises during the second half of the timecourse of the sweep, then it is most likely to arise on the 1 background, since almost the entire population carries the original beneficial mutation. In this case, the probability of fixation of the s2-mutation is almost the same as if it had arisen in an otherwise neutral population, which is not heavily dependent on the population size. As the population size increases, one expects more mutations to arise and hence also more occurrences of both type 01 and type 11 births, as well as mutations at other loci. In this work, however, we focus on the case of only a single mutation event at a second (not third, fourth, etc.) locus that arises during the sweep of the first mutation. See Section 4 for a brief discussion of the difficulties we may encounter if we try to extend the methods we present in this work to scenarios with three or four loci.

The case $s_2 = s_1$ has been studied (using simulations) by Kim and Stephan (2003). They found that interference between the two beneficial mutations causes a reduction in their fixation probabilities (and also slightly reduces the hitchhiking effect at a linked neutral locus), but this Hill–Robertson effect is only strong when recombination rates are relatively low. All their simulations take an effective population size $N_e = 10^4$. Instead of $s_1$ being exactly the same as $s_2$, it is more likely that $s_1$ and $s_2$ are closely matched but not exactly equal. Our results reveal that if $s_2 > s_1$, then the rate of recombination at which the effect becomes significant is dependent upon effective population size. Otherwise, i.e. if $s_2 < s_1$, the parameter range of the rate of recombination where we see a strong Hill–Robertson effect is not sensitive to population size.

We can give an intuitive explanation for this dichotomy. If s2<s1, then X10 will grow to almost 1 almost immediately after the second mutation arises and, as a result, X01 cannot grow to much larger than O(1/N). With O(1) individuals of type 01, a number roughly O(r) of recombinants can be produced before type 10 finishes its sweep, consequently the fixation probability of type 11 does not have a strong dependence on population size. On the other hand, for s2>s1, once (and if) type 01 gets established in the population, which occurs with positive probability, X01 is destined to start a sweep itself, displacing both type 10 and type 00, since type 01 is fitter than both these two types. Therefore X01 and X10 are both O(1) during a nontrivial period of time (O(1/s)) and the total number of recombinants produced is roughly O(Nr/s). Consequently the fixation probability of type 11 depends strongly on population size. In particular, the assumption that breaks down in the derivation of Barton (1995) when s1<s2 is that X10 simply grows logistically after the arrival of the second mutation. Instead, there is a complex (albeit almost deterministic) interplay between X10 and X01. Our analysis must establish the probability of a recombination of a type 10 and a type 01 individual to produce a type 11 and the subsequent probability of fixation of the descendants of this doubly favoured individual.
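The scaling argument above can be illustrated with a crude deterministic sketch. It is not the method of this paper: it assumes that types 00, 10 and 01 follow deterministic haploid selection with no feedback from recombination, that type 01 starts from a single copy and never goes extinct, and that recombinants are produced at rate 2N r X10 X01 per generation. The recombination rate r, the initial type 10 frequency u0, the time horizon and all function names are hypothetical choices for illustration.

```python
import math


def frequencies(t, x0, w):
    """Deterministic haploid selection: x_i(t) proportional to x_i(0) exp(w_i t).
    The largest fitness is factored out to keep the exponentials bounded."""
    w_max = max(w)
    weights = [xi * math.exp((wi - w_max) * t) for xi, wi in zip(x0, w)]
    total = sum(weights)
    return [wt / total for wt in weights]


def expected_recombinants(n, r, s1, s2, u0, horizon=8000.0, dt=1.0):
    """Riemann-sum estimate of the integral of 2N r X10(t) X01(t) dt,
    an assumed proxy for the number of type 11 recombinants produced."""
    x01_start = 1.0 / (2 * n)                  # second mutation starts as one copy
    x0 = [1.0 - u0 - x01_start, u0, x01_start]  # types 00, 10, 01
    w = [0.0, s1, s2]                           # selective advantages
    total = 0.0
    for k in range(int(horizon / dt)):
        _, x10, x01 = frequencies(k * dt, x0, w)
        total += 2 * n * r * x10 * x01 * dt
    return total


r, u0 = 1e-5, 0.01  # hypothetical recombination rate and initial type 10 frequency
for s1, s2 in ((0.016, 0.02), (0.02, 0.016)):
    for n in (10_000, 100_000):
        count = expected_recombinants(n, r, s1, s2, u0)
        print(f"s1={s1}, s2={s2}, N={n}: about {count:.3g} recombinants")
```

Under these assumptions the count grows roughly in proportion to N when s2 > s1 and is nearly unchanged by N when s2 < s1, in line with the O(Nr/s) versus O(r) contrast drawn in the text. The stochastic loss of type 01, which the deterministic sketch ignores, is what the full analysis has to handle.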

The rest of this paper is organised as follows. In Section 2 we set up a continuous time Moran model for the evolution of our population. The choice of a Moran model over a Wright–Fisher one is a matter of mathematical convenience; one would expect similar results for a Wright–Fisher model. In Section 3, we outline our approach to calculating the fixation probability of type 11, assuming the second beneficial mutation arises in a 0 background and s1<s2. Finally, in Section 4, we compare calculations obtained using our method and Barton’s method with simulation results.

Moran model

We describe two closely related models for the evolution of types at the two selected loci in a population of size 2N, corresponding to a diploid population of size N. The first model has recombination and resampling mechanisms, whereas in the second model, the recombination mechanism is replaced by a gene conversion mechanism that will slightly simplify our notation. The algorithm for calculating the fixation probability of type 11 that we describe in Section 3 will be identical for both

Fixation probability for the double mutant

We now describe a method of calculating p11,fix, the fixation probability of the double mutant for the gene conversion model given in Section 2. We focus on the γ<1 case. The approximations we use below are non-rigorous, but yield good estimates for the fixation probability when 2N is moderately large and γ is not too small, as demonstrated by simulation results in Section 4. Here, moderately large and not too small mean that the values of N and γ can facilitate small enough choices (e.g.,

Results and discussion

We illustrate in Fig. 3 a typical run of step 3 of the algorithm we described in Section 3. The trajectory of F11,arise(τ) is slightly delayed compared to that of F11,est(τ), because it takes a bit of time for the number of descendants of a type 11 individual born due to recombination to grow to 2Nδ11. Nevertheless, both F11,est(τ) and F11,arise(τ) converge to almost the same limit as τ becomes large.

Fig. 4 compares fixation probabilities obtained using simulation and calculations

Acknowledgments

The authors are grateful to Nick Barton for posing the initial problem and for valuable discussions throughout this research. In addition, we thank two anonymous referees for valuable advice and suggestions. FY was supported by EPSRC/GR/T19537 while at the University of Oxford.

References (18)

  • N.H. Barton. Linkage and the limits to natural selection. Genetics (1995)
  • N.H. Barton. Genetic hitchhiking. Philos. Trans. R. Soc. Lond. Ser. B (2000)
  • E. Bazin et al. Population size does not influence mitochondrial genetic diversity in animals. Science (2006)
  • C.W. Birky et al. Effects of linkage on rates of molecular evolution. Proc. Natl. Acad. Sci. (1988)
  • A.M. Etheridge et al. An approximate sampling formula under genetic hitchhiking. Ann. Appl. Probab. (2006)
  • Stewart N. Ethier et al.
  • W.J. Ewens. Mathematical Population Genetics (1979)
  • P.J. Gerrish et al. The fate of competing beneficial mutations in an asexual population. Genetica (1998)
  • J.H. Gillespie. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics (2000)
There are more references available in the full text version of this article.
