The distribution of surviving blocks of an ancestral genome

doi:10.1016/S0040-5809(03)00098-4

Theoretical Population Biology

Volume 64, Issue 4, December 2003, Pages 451-471

https://doi.org/10.1016/S0040-5809(03)00098-4

Abstract

What is the chance that some part of a stretch of genome will survive? In a population of constant size, and with no selection, the probability of survival of some part of a stretch of map length y<1 approaches $y/\log(yt/2)$ for $\log(yt) \gg 1$. Thus, the whole genome is certain to be lost, but the rate of loss is extremely slow. This solution extends to give the whole distribution of surviving block sizes as a function of time. We show that the expected number of blocks at time t is 1+yt and give expressions for the moments of the number of blocks and the total amount of genome that survives for a given time. The solution is based on a branching process and assumes complete interference between crossovers, so that each descendant carries only a single block of ancestral material. We consider cases where most individuals carry multiple blocks, either because there are multiple crossovers in a long genetic map, or because enough time has passed that most individuals in the population are related to each other. For species such as ours, which have a long genetic map, the genome of any individual which leaves descendants (∼80% of the population for a Poisson offspring number with mean two) is likely to persist for an extremely long time, in the form of a few short blocks of genome.

Introduction

We now have a good theoretical understanding of the evolution of single genes subject to random genetic drift: the coalescent process describes the ancestry of samples of genes, whilst diffusion approximations can be used to describe the evolution of whole populations. In principle, these methods can be extended to several genetic loci, but in practice they become intractable beyond a few loci. This is because the number of genotypes that must be tracked increases exponentially with the number of loci, and because of the nonlinearities introduced by selection and recombination. These difficulties pose a serious problem for the analysis of DNA sequence data. Typically, the mutation and recombination rates in sexually reproducing organisms are of the same order of magnitude, and so it is usually not feasible to analyse short segments of genome as if they reproduced asexually. In any case, the patterns of association across the genome are fundamental to detecting the action of selection, and inferring the genetic basis of phenotypic variation.

A natural way to describe the evolution of the genome is to represent it in terms of blocks which have different ancestry; an infinite number of loci can then be represented by a finite number of variables. Fisher (1953) first introduced this approach, and observed that the junctions between blocks with different ancestry behaved like discrete Mendelian loci. The approach received little attention until the availability of DNA sequences made it more directly applicable to data. Early work concentrated on two areas: the distribution of identity by descent under inbreeding (Sved, 1971; Stam, 1980; Franklin, 1977), and the genetic contribution of an individual to some particular descendant (Donnelly, 1983; Bickeboller and Thompson, 1996; Stefanov, 2000). Here the ‘pedigree’ is the tree of all descendants of the individual, irrespective of whether they carry any of its genetic material. In particular, Donnelly (1983) has shown that the probability that an individual contributes no genes to this descendant is approximately exp(−tR/2^t), where R is the map length and t is the time.
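To give a feel for Donnelly's approximation, the short sketch below simply evaluates exp(−tR/2^t) for a few values of t. The function name is hypothetical, and the map length R=35.67 Morgans is borrowed from the human example discussed later in the paper purely for illustration.

```python
import math


def prob_no_contribution(t, R):
    """Donnelly's (1983) approximation exp(-t*R/2^t), as quoted in the text.

    t: number of generations separating ancestor and descendant.
    R: map length in Morgans.
    """
    return math.exp(-t * R / 2.0 ** t)


if __name__ == "__main__":
    R = 35.67  # illustrative value: human autosomal map length quoted later
    for t in (5, 10, 15, 20):
        print(t, round(prob_no_contribution(t, R), 4))
```

Because t/2^t shrinks geometrically, the probability of contributing nothing to one particular descendant moves from nearly 0 to nearly 1 within a small range of t.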

Recent work has focussed on the process backwards in time. Hudson (1990) extended the coalescent process to allow for recombination along continuous genomes; tracing the ancestry of several genomes back in time, the number of ancestral lineages may decrease through coalescence of different lineages, or may increase if one genome descends from two parental genomes via a recombination event which generates a new pair of junctions. This process of coalescence with recombination has been used extensively in simulations, and for fitting models to data using Monte Carlo methods. However, although the process is simple to define, analytical results have been hard to obtain (though see Wiuf and Hein, 1997, 1999; Derrida and Jung-Muller, 1999; Simonsen and Churchill, 1997; Griffiths and Tavare, 1996; Griffiths, 1999). One difficulty is that ancestral genomes typically carry several blocks of genome which are ancestral to the present-day sample; the joint location of multiple blocks must be followed.

Here, we consider a simpler, but closely related process: the descent of one particular genome forwards in time. We can divide the problem into two parts. First, we can follow descendants regardless of whether they actually carry any of the genetic information in question. These descendants form a branching tree, which we refer to as the pedigree. Second, we follow the descent of genetic material down each lineage in this pedigree. In a sexual population at equilibrium, the expected number of descendants doubles every generation. However, segregation is expected to halve the amount of ancestral material passed down each lineage, and so, on average, the genetic contribution of any one genome remains constant. We are concerned with the stochastic fluctuations caused by random reproduction, and random recombination.
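The abstract's figure of roughly 80% of individuals leaving descendants can be checked with a standard branching-process calculation. The sketch below assumes a Poisson offspring number with mean m, for which the extinction probability q of the pedigree is the smallest root of q = exp(−m(1−q)); the function name is hypothetical and the iteration is a generic fixed-point scheme, not a method taken from the paper.

```python
import math


def extinction_probability(mean_offspring, iterations=200):
    """Smallest root of q = exp(-m * (1 - q)) for Poisson(m) offspring.

    Iterating the offspring generating function from q = 0 converges
    monotonically to the extinction probability of the pedigree.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(-mean_offspring * (1.0 - q))
    return q


if __name__ == "__main__":
    q = extinction_probability(2.0)
    print("P(pedigree dies out) =", round(q, 4))
    print("P(leaves descendants) =", round(1.0 - q, 4))
```

With mean two, the survival probability 1−q comes out close to 0.8, in line with the value quoted in the abstract.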

The paper is laid out as follows. In Section 2 we present a preliminary mathematical analysis of our model. In particular, we find the probability that at least some part of a single genome will survive after t generations. We set out exact recursions for this probability, and in Section 2.1, in the case of a Poisson offspring distribution, find a simple and accurate approximation to it. In particular, we are able to show that if the population is not growing, a section of genome is certain to be lost, but that the rate of loss is very slow (logarithmic). In Section 2.2, we extend our analysis to consider the moments of the distribution of block sizes. This is achieved by superimposing the effects of recombination (which ‘erodes’ the blocks of ancestral genome) on a ‘pedigree’. The approach gives exact expressions for the moments, but these rapidly become very cumbersome as the order of the moments increases. In Section 3 we use simulations to illustrate the behaviour of the process. We return to determining the distribution of block sizes in Section 4. There, we show how to recover full information for the sizes of blocks of ancestral genome surviving at time t via the same recursion that gave us the loss probability, but starting from a different initial condition. In Appendix C this is extended further to retain also the positions of surviving ancestral blocks on the genome. In passing, in Section 4.1 we also note that a simple modification gives the distribution of blocks associated with some marker locus, for example with a new mutation. In principle, differentiation would give us complete information about the moments of the distribution of block sizes, but in practice this is not tractable and so in Section 4.2 we propose an alternative approach. This involves taking a continuous approximation to the process that encodes the numbers and sizes of blocks of ancestral genome alive at time t. (In the mathematical literature such an approximation is called a superprocess.)
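To see how slow logarithmic loss is, the sketch below evaluates the approximation y/log(yt/2) quoted in the abstract over several orders of magnitude of t. It assumes that log denotes the natural logarithm and that the formula is only used in its stated regime (log(yt) much greater than 1); the function name is hypothetical.

```python
import math


def survival_approx(y, t):
    """Approximate survival probability y / log(y*t/2) from the abstract.

    Assumes natural logarithms; only meaningful when log(y*t) >> 1,
    so arguments with y*t/2 <= 1 are rejected outright.
    """
    arg = y * t / 2.0
    if arg <= 1.0:
        raise ValueError("approximation requires y*t/2 > 1 (and log(y*t) >> 1)")
    return y / math.log(arg)


if __name__ == "__main__":
    for t in (10**2, 10**4, 10**6):
        print(t, round(survival_approx(1.0, t), 4))
```

Multiplying t by a factor of ten thousand lowers the approximate survival probability by only a factor of about three, which is the sense in which loss is certain but extremely slow.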

With these results in hand, we will be in a position to consider the robustness of the method to deviations from the two key assumptions: that there is a branching process within a large population, and that there is only a single crossover per generation. In Section 6 we show that the assumption of a single crossover is not really restrictive: even in a model with multiple crossovers, after only a few tens of generations individuals typically carry a single small block in which at most one crossover is likely. Of course, because in a sexual population at equilibrium on average each individual produces two offspring, related individuals do soon interbreed. However, in Section 5 we illustrate the surprising accuracy of the branching process assumption for populations of moderate size. In Sections 7 and 8 we discuss the implications of this work and, as an example, consider the probability of survival of a single human genome that lived in the distant past. The detailed mathematical proofs are presented in the appendices.

Section snippets

The model

Consider a genome of map length y, and with complete interference between crossovers. With probability y, there is one crossover. This ensures that offspring inherit at most a single block. Note that we do not assume that y is small, although necessarily y⩽1. Each genome produces offspring by mating with an unrelated individual. For most of what follows we assume that the number of offspring is Poisson distributed, with mean 2(1+S). (Results would be essentially the same for any distribution
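The model can be sketched as a small forward simulation. The transmission rule used here is an assumption read off the description above, not the authors' code: for a parent carrying a block of map length z, each offspring receives a crossover inside the block with probability z and then keeps a uniformly distributed piece of it; otherwise it inherits the whole block or nothing with probability 1/2 each. All function names are hypothetical, and `mean_offspring` stands for 2(1+S).

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def next_generation(blocks, rng, mean_offspring=2.0):
    """Pass each ancestral block to a Poisson number of offspring."""
    children = []
    for z in blocks:
        for _ in range(poisson(rng, mean_offspring)):
            if rng.random() < z:
                # Assumed rule: crossover inside the block, child keeps one side.
                children.append(z * rng.random())
            elif rng.random() < 0.5:
                # No crossover in the block: child inherits it whole ...
                children.append(z)
            # ... or inherits none of it (nothing appended).
    return children


def simulate(y, t, rng, mean_offspring=2.0):
    """Return the list of surviving block lengths after t generations."""
    blocks = [y]
    for _ in range(t):
        blocks = next_generation(blocks, rng, mean_offspring)
    return blocks


if __name__ == "__main__":
    rng = random.Random(1)
    runs = [simulate(1.0, 10, rng) for _ in range(2000)]
    print("mean number of blocks:", sum(len(b) for b in runs) / len(runs))
    print("mean total length:", sum(sum(b) for b in runs) / len(runs))
```

Under these assumptions with S=0, each block of length z leaves on average 1+z descendant blocks of mean total length z, so the expected total length stays at y and the expected number of blocks grows by y per generation, consistent with the 1+yt stated in the abstract.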

Simulations

Before returning to mathematical analysis, we pause to see, via simulations, what the process really looks like. Fig. 5 shows two random realisations of the process, for a genome of length y=1, after 50 generations. In the left panel, there are 2^50.77=3.43×10¹⁵ descendants in the pedigree (1.7 times the expected 2⁵⁰). Of this enormous number of descendants, only 131 carry any material from the ancestor (again, rather more than the expected 1+yt=51). The total amount of ancestral material,

Generating functions

The explicit results that we have given so far only allow us to calculate the mean number of blocks of a given size (or more precisely with sizes in a given range). In fact, we now recover the Laplace transform of the whole block-size distribution from a modification of recursion (2). This will encode enough information to give the full joint distribution of numbers of blocks of different sizes. We shall use it to write down the moment generating function for various quantities of interest.

Robustness of the branching process approximation

Simulations of finite populations (Baird, 1995a, 1995b) suggest that the branching process approximation for the extinction probability remains good even up to the point at which individual genes are likely to be fixed (t∼N; Fig. 8). By this time, the individuals carrying some of the original block form a significant proportion of the population. At first sight this is surprising, because long before this time (by ∼log₂(N) generations), almost all individuals are pedigree

Long genetic maps

The analysis thus far has depended on the assumption that there is only a single crossover per generation, so that individuals carry at most one block. This is reasonable for small segments, but makes it impossible to describe long genomes. We now examine how the results here can be extended to the case when we cannot ignore the possibility of multiple crossovers.

Our approach will once again be to superimpose the effects of recombination on a pedigree. We shall see that the ancestral block is

An example

In this section, we consider the fate of a long genome. As an example, we choose a genetic map of the same length as the human genome: R=35.67 Morgans for autosomes (averaged over the sexes and ignoring sex linkage; Broman et al., 1998; Kong et al., 2002). Our model of uniformly distributed crossovers does not capture the actual distribution of the genome across chromosomes with interference between crossovers. However, we will see that the qualitative outcome is not sensitive to such details.

Discussion

We have considered the fate of a single genome which is passed down through a very large sexual population. If there is at most one crossover per generation, then each descendant will carry only a single block of ancestral genome. For this case, we derive an extremely accurate approximation for the probability that at least some of the original genome will survive. In a population of constant size, and with no selection, eventual loss is inevitable, and yet extremely slow. For example, a block

Acknowledgements

Much of this research was carried out while A.M.E. was visiting the University of Edinburgh. She thanks everyone there for their hospitality. N.H.B. thanks the University of California, Davis, and N.H.B. and A.M.E. thank EPSRC and BBSRC for financial support (BBSRC MM109726 and EPSRC Advanced Research Fellowship GR/A90923/01). Finally, we thank Toby Johnson for his help with the artwork.

References (38)

K.W. Broman et al. Comprehensive human genetic map: individual and sex-specific variation in recombination. Am. J. Hum. Genet. (1998)

K.P. Donnelly. The probability that related individuals share some section of genome identical by descent. Theor. Pop. Biol. (1983)

I.R. Franklin. The distribution of the proportion of the genome which is homozygous by descent in inbred individuals. Theor. Pop. Biol. (1977)

R.C. Griffiths. The time to the ancestor along sequences with recombination. Theor. Pop. Biol. (1999)

R.C. Griffiths et al. Monte Carlo inference methods in population genetics. Math. Comput. Modelling (1996)

S.M. Krone et al. Ancestral processes with selection. Theor. Pop. Biol. (1997)

K.L. Simonsen et al. A Markov chain model of coalescence with recombination. Theor. Pop. Biol. (1997)

J.A. Sved. Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Pop. Biol. (1971)

C. Wiuf et al. Recombination as a point process along sequences. Theor. Pop. Biol. (1999)

K. Athreya et al. Branching Processes (1972)

Baird, S.J.E., 1995a. Applications of junctions theory. Ph.D. Thesis, University of...

S.J.E. Baird. The mixing of genotypes in hybrid zones: a simulation study of multilocus clines. Evolution (1995)

N.H. Barton. Genetic hitch-hiking. Philos. Trans. R. Soc. London (B) (2000)

R. Bellman et al. Recurrence times for the Ehrenfest model. Pacific J. Math. (1951)

H. Bickeboller et al. The probability distribution of the amount of an individual's genome surviving to the following generation. Genetics (1996)

N.H. Chapman et al. The effect of population history on the lengths of ancestral chromosome segments. Genetics (2002)

B. Derrida et al. The genealogical tree of a chromosome. J. Stat. Phys. (1999)

P. Ehrenfest et al. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Phys. Z. (1907)

Etheridge, A.M., 2000. An Introduction to Superprocesses, University Lecture Notes, Vol. 20. Amer. Math. Soc.,...

¹: Supported in part by BBSRC MM109726.

²: Supported by an EPSRC Advanced Fellowship.
