The distribution of surviving blocks of an ancestral genome
Introduction
We now have a good theoretical understanding of the evolution of single genes subject to random genetic drift: the coalescent process describes the ancestry of samples of genes, whilst diffusion approximations can be used to describe the evolution of whole populations. In principle, these methods can be extended to several genetic loci, but in practice they become intractable beyond a few loci. This is because the number of genotypes that must be tracked increases exponentially with the number of loci, and because of the nonlinearities introduced by selection and recombination. These difficulties pose a serious problem for the analysis of DNA sequence data. Typically, the mutation and recombination rates in sexually reproducing organisms are of the same order of magnitude, and so it is usually not feasible to analyse short segments of genome as if they reproduced asexually. In any case, the patterns of association across the genome are fundamental to detecting the action of selection, and inferring the genetic basis of phenotypic variation.
A natural way to describe the evolution of the genome is to represent it in terms of blocks which have different ancestry; an infinite number of loci can then be represented by a finite number of variables. Fisher (1953) first introduced this approach, and observed that the junctions between blocks with different ancestry behaved like discrete Mendelian loci. The approach received little attention until the availability of DNA sequences made it more directly applicable to data. Early work concentrated on two areas: the distribution of identity by descent under inbreeding (Sved, 1971; Stam, 1980; Franklin, 1977), and the genetic contribution of an individual to some particular descendant (Donnelly, 1983; Bickeböller and Thompson, 1996; Stefanov, 2000). Here the ‘pedigree’ is the tree of all descendants of the individual, irrespective of whether they carry any of its genetic material. In particular, Donnelly (1983) has shown that the probability that an individual contributes no genes to this descendant is approximately exp(−tR/2^t), where R is the map length and t is the time.
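To get a feel for how quickly this probability approaches one, the approximation can be evaluated directly. The sketch below reads the denominator as two to the power t, takes R in Morgans and t in generations, and uses the autosomal map length R=35.67 quoted later in the text; the function name is hypothetical and the code is only an illustration of the formula, not part of the original analysis.

```python
import math


def prob_no_contribution(R, t):
    """Approximate probability that an ancestor contributes no genes to one
    particular descendant t generations later: exp(-t*R / 2**t).

    Assumption: R is the map length in Morgans and the denominator of the
    quoted formula is 2 raised to the power t.
    """
    return math.exp(-t * R / 2.0 ** t)


if __name__ == "__main__":
    R = 35.67  # autosomal map length in Morgans, as quoted in the text
    for t in (5, 10, 15, 20, 30):
        print(t, prob_no_contribution(R, t))
```

Because t/2^t falls rapidly once t exceeds a few generations, the value climbs towards one: a given distant descendant is very unlikely to carry any material from a given ancestor.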
Recent work has focussed on the process backwards in time. Hudson (1990) extended the coalescent process to allow for recombination along continuous genomes; tracing the ancestry of several genomes back in time, the number of ancestral lineages may decrease through coalescence of different lineages, or may increase if one genome descends from two parental genomes via a recombination event which generates a new pair of junctions. This process of coalescence with recombination has been used extensively in simulations, and for fitting models to data using Monte Carlo methods. However, although the process is simple to define, analytical results have been hard to obtain (though see Wiuf and Hein, 1997, 1999; Derrida and Jung-Muller, 1999; Simonsen and Churchill, 1997; Griffiths and Tavaré, 1996; Griffiths, 1999). One difficulty is that ancestral genomes typically carry several blocks of genome which are ancestral to the present-day sample; the joint location of multiple blocks must be followed.
Here, we consider a simpler, but closely related process: the descent of one particular genome forwards in time. We can divide the problem into two parts. First, we can follow descendants regardless of whether they actually carry any of the genetic information in question. These descendants form a branching tree, which we refer to as the pedigree. Second, we follow the descent of genetic material down each lineage in this pedigree. In a sexual population at equilibrium, the expected number of descendants doubles every generation. However, segregation is expected to halve the amount of ancestral material passed down each lineage, and so, on average, the genetic contribution of any one genome remains constant. We are concerned with the stochastic fluctuations caused by random reproduction, and random recombination.
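The balance between doubling and halving can be written in one line. As a sketch, let N_t denote the number of pedigree descendants after t generations and f_t the fraction of the ancestral genome passed down a single lineage; both symbols are introduced here for illustration and are not the paper's notation. For a population at equilibrium:

```latex
\mathbb{E}[N_t] = 2^{t}, \qquad
\mathbb{E}[f_t] = 2^{-t}, \qquad
\mathbb{E}[N_t]\,\mathbb{E}[f_t] = 1 .
```

The expected total contribution therefore stays at one genome's worth in every generation; only the fluctuations around this mean depend on the details of reproduction and recombination.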
The paper is laid out as follows. In Section 2 we present a preliminary mathematical analysis of our model. In particular, we find the probability that at least some part of a single genome will survive after t generations. We set out exact recursions for this probability, and in Section 2.1, in the case of a Poisson offspring distribution, find a simple and accurate approximation to it. In particular, we are able to show that if the population is not growing, a section of genome is certain to be lost, but that the rate of loss is very slow (logarithmic). In Section 2.2, we extend our analysis to consider the moments of the distribution of block sizes. This is achieved by superimposing the effects of recombination (which ‘erodes’ the blocks of ancestral genome) on a ‘pedigree’. The approach gives exact expressions for the moments, but these rapidly become very cumbersome as the order of the moments increases. In Section 3 we use simulations to illustrate the behaviour of the process. We return to determining the distribution of block sizes in Section 4. There, we show how to recover full information for the sizes of blocks of ancestral genome surviving at time t via the same recursion that gave us the loss probability, but starting from a different initial condition. In Appendix C this is extended further to retain also the positions of surviving ancestral blocks on the genome. In passing, in Section 4.1 we also note that a simple modification gives the distribution of blocks associated with some marker locus—for example, with a new mutation. In principle, differentiation would give us complete information about the moments of the distribution of block sizes, but in practice this is not tractable and so in Section 4.2 we propose an alternative approach. This involves taking a continuous approximation to the process that encodes the numbers and sizes of blocks of ancestral genome alive at time t. 
(In the mathematical literature such an approximation is called a superprocess.)
With these results in hand, we will be in a position to consider the robustness of the method to deviations from the two key assumptions: that there is a branching process within a large population, and that there is only a single crossover per generation. In Section 6 we show that the assumption of a single crossover is not really restrictive: even in a model with multiple crossovers, after only a few tens of generations individuals typically carry a single small block in which at most one crossover is likely. Of course, because in a sexual population at equilibrium on average each individual produces two offspring, related individuals do soon interbreed. However, in Section 5 we illustrate the surprising accuracy of the branching process assumption for populations of moderate size. In Sections 7 and 8 we discuss the implications of this work and, as an example, consider the probability of survival of a single human genome that lived in the distant past. The detailed mathematical proofs are presented in the appendices.
Section snippets
The model
Consider a genome of map length y, and with complete interference between crossovers. With probability y, there is one crossover. This ensures that offspring inherit at most a single block. Note that we do not assume that y is small, although necessarily y⩽1. Each genome produces offspring by mating with an unrelated individual. For most of what follows we assume that the number of offspring is Poisson distributed, with mean 2(1+S). (Results would be essentially the same for any distribution
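The model can be explored with a short forward simulation. The sketch below is one reading of the description above, not the authors' code, and it rests on assumptions that are ours: an offspring receives the focal genome's side of a meiosis with probability 1/2; a crossover falls inside a surviving block of length z with probability z and is uniformly placed, so that the offspring keeps one of the two fragments; and offspring that carry nothing are dropped, which by Poisson thinning leaves Poisson((1+S)(1+z)) carriers per block. All function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def next_generation(blocks, S, rng):
    """Pass each surviving block (length in Morgans) to its carrier offspring."""
    children = []
    for z in blocks:
        # Offspring number is Poisson(2(1+S)); each carries part of the block
        # with probability (1+z)/2, so carriers are Poisson((1+S)(1+z)).
        for _ in range(poisson((1.0 + S) * (1.0 + z), rng)):
            if rng.random() < (1.0 - z) / (1.0 + z):
                children.append(z)  # no crossover inside the block
            else:
                # crossover inside the block: keep a uniform fragment in (0, z]
                children.append(z * (1.0 - rng.random()))
    return children


def simulate_blocks(y, t, S=0.0, rng=None):
    """Return the lengths of ancestral blocks surviving after t generations."""
    rng = rng or random.Random()
    blocks = [y]
    for _ in range(t):
        blocks = next_generation(blocks, S, rng)
    return blocks


if __name__ == "__main__":
    rng = random.Random(2024)
    reps = 2000
    runs = [simulate_blocks(1.0, 10, rng=rng) for _ in range(reps)]
    print("mean number of blocks:", sum(len(b) for b in runs) / reps)
    print("mean total length:", sum(sum(b) for b in runs) / reps)
    print("fraction with survivors:", sum(1 for b in runs if b) / reps)
```

Under these assumptions with S=0, the mean total length should stay close to y and the mean number of blocks close to 1+yt, which gives a quick sanity check of the sketch against the expectations quoted in the text.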
Simulations
Before returning to mathematical analysis, we pause to see, via simulations, what the process really looks like. Fig. 5 shows two random realisations of the process, for a genome of length y=1, after 50 generations. In the left panel, there are 2^50.77 = 3.43×10^15 descendants in the pedigree (1.7 times the expected 2^50). Of this enormous number of descendants, only 131 carry any material from the ancestor (again, rather more than the expected 1+yt=51). The total amount of ancestral material,
Generating functions
The explicit results that we have given so far only allow us to calculate the mean number of blocks of a given size (or more precisely with sizes in a given range). In fact, we now recover the Laplace transform of the whole block-size distribution from a modification of recursion (2). This will encode enough information to give the full joint distribution of numbers of blocks of different sizes. We shall use it to write down the moment generating function for various quantities of interest.
Robustness of the branching process approximation
Simulations of finite populations (Baird, 1995a, 1995b) suggest that the branching process approximation for the extinction probability remains good even up to the point at which individual genes are likely to be fixed (t∼N; Fig. 8). By this time, the individuals carrying some of the original block form a significant portion of the population. At first sight this is surprising, because long before this time (by ∼log_2(N) generations), almost all individuals are pedigree
Long genetic maps
The analysis thus far has depended on the assumption that there is only a single crossover per generation, so that individuals carry at most one block. This is reasonable for small segments, but makes it impossible to describe long genomes. We now examine how the results here can be extended to the case when we cannot ignore the possibility of multiple crossovers.
Our approach will once again be to superimpose the effects of recombination on a pedigree. We shall see that the ancestral block is
An example
In this section, we consider the fate of a long genome. As an example, we choose a genetic map of the same length as the human genome: R=35.67 Morgans for autosomes (averaged over the sexes and ignoring sex linkage; Broman et al., 1998; Kong et al., 2002). Our model of uniformly distributed crossovers does not capture the actual distribution of the genome across chromosomes with interference between crossovers. However, we will see that the qualitative outcome is not sensitive to such details.
Discussion
We have considered the fate of a single genome which is passed down through a very large sexual population. If there is at most one crossover per generation, then each descendant will carry only a single block of ancestral genome. For this case, we derive an extremely accurate approximation for the probability that at least some of the original genome will survive. In a population of constant size, and with no selection, eventual loss is inevitable, and yet extremely slow. For example, a block
Acknowledgements
Much of this research was carried out while A.M.E. was visiting the University of Edinburgh. She thanks everyone there for their hospitality. N.H.B. thanks the University of California, Davis, and N.H.B. and A.M.E. thank EPSRC and BBSRC for financial support (BBSRC MM109726 and EPSRC Advanced Research Fellowship GR/A90923/01). Finally, we thank Toby Johnson for his help with the artwork.
References (38)
et al. Comprehensive human genetic map: individual and sex-specific variation in recombination. Am. J. Hum. Genet. (1998)
The probability that related individuals share some section of genome identical by descent. Theor. Pop. Biol. (1983)
The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Pop. Biol. (1977)
The time to the ancestor along sequences with recombination. Theor. Pop. Biol. (1999)
et al. Monte Carlo inference methods in population genetics. Math. Comput. Modelling (1996)
et al. Ancestral processes with selection. Theor. Pop. Biol. (1997)
et al. A Markov chain model of coalescence with recombination. Theor. Pop. Biol. (1997)
Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Pop. Biol. (1971)
et al. Recombination as a point process along sequences. Theor. Pop. Biol. (1999)
et al. Branching Processes (1972)
The mixing of genotypes in hybrid zones: a simulation study of multilocus clines. Evolution
Genetic hitch-hiking. Philos. Trans. R. Soc. London (B)
Recurrence times for the Ehrenfest model. Pacific J. Math.
The probability distribution of the amount of an individual's genome surviving to the following generation. Genetics
The effect of population history on the lengths of ancestral chromosome segments. Genetics
The genealogical tree of a chromosome. J. Stat. Phys.
Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Phys. Z.