The advantages of dense marker sets for linkage analysis with very large families

  • Original Investigation
  • Human Genetics

Abstract

Dense sets of hundreds of thousands of markers have been developed for genome-wide association studies. These marker sets are also beneficial for linkage analysis of large, deep pedigrees containing distantly related cases. It is impossible to analyse all genotypes in large pedigrees jointly using the Lander–Green algorithm; however, as marker density increases, it becomes less crucial to analyse all individuals’ genotypes simultaneously. In this report, an approximate multipoint non-parametric technique is described, in which large pedigrees are split into many small pedigrees, each containing just two cases. The technique is demonstrated using phased data from the International HapMap Project to simulate sets of 10,000, 50,000 and 250,000 markers, showing that it becomes increasingly accurate as more markers are genotyped. This method allows routine linkage analysis of large families with dense marker sets and represents a more easily applied alternative to Markov chain Monte Carlo methods.

References

  • Abecasis GR, Wigginton JE (2005) Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 77:754–767


  • Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97–101


  • Abney M, Ober C, McPeek MS (2002) Quantitative-trait homozygosity and association mapping and empirical genomewide significance in large, complex pedigrees: fasting serum-insulin level in the Hutterites. Am J Hum Genet 70:920–934


  • Almasy L, Blangero J (1998) Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 62:1198–1211


  • Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P, Consortium IH (2005) A haplotype map of the human genome. Nature 437:1299–1320


  • Barrett JC, Fry B, Maller J, Daly MJ (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21:263–265


  • Bureau A, Speed TP, Baird PN (2000) Recovering inheritance information for linkage analysis in large pedigrees by Markov chain Monte Carlo multipoint computations. Am J Hum Genet 67:306


  • Chen W-M, Abecasis GR (2006) Estimating the power of variance component linkage analysis in large pedigrees. Genet Epidemiol 30:471–484


  • Chiang AP, Beck JS, Yen HJ, Tayeh MK, Scheetz TE, Swiderski RE, Nishimura DY, Braun TA, Kim KY, Huang J, Elbedour K, Carmi R, Slusarski DC, Casavant TL, Stone EM, Sheffield VC (2006) Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet–Biedl syndrome gene (BBS11). Proc Natl Acad Sci USA 103:6287–6292


  • de Andrade M, Olswold CL, Slusser JP, Tordsen LA, Atkinson EJ, Rabe KG, Slager SL (2005) Identification of genes involved in alcohol consumption and cigarettes smoking. BMC Genet 6 (Suppl 1):S112


  • Elston RC, Stewart J (1971) A general model for the genetic analysis of pedigree data. Hum Hered 21:523–542


  • Evans DM, Cardon LR (2004) Guidelines for genotyping in genomewide linkage studies: single nucleotide polymorphism maps versus microsatellite maps. Am J Hum Genet 75:687–692


  • George AW, Thompson EA (2003) Discovering disease genes: multipoint linkage analysis via a new Markov chain Monte Carlo approach. Stat Sci 18:515–531


  • Gudbjartsson DF, Thorvaldsson T, Kong A, Gunnarsson G, Ingolfsdottir A (2005) Allegro version 2. Nat Genet 37:1015–1016


  • Heath SC (1997) Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet 61:748–760


  • Hicks AA, Petursson H, Jonsson T, Stefansson H, Johannsdottir HS, Sainz J, Frigge ML, Kong A, Gulcher JR, Stefansson K, Sveinbjornsdottir S (2002) A susceptibility gene for late-onset idiopathic Parkinson’s disease. Ann Neurol 52:549–555


  • Hinrichs AL, Bertelsen S, Bierut LJ, Dunn G, Jin CH, Kauwe JS, Suarez BK (2005) Multipoint identity-by-descent computations for single-point polymorphism and microsatellite maps. BMC Genet 6:S34


  • Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108


  • Huang Q, Shete S, Amos CI (2004) Ignoring linkage disequilibrium among tightly linked markers induces false-positive evidence of linkage for affected sib pair analysis. Am J Hum Genet 75:1106–1112


  • Hwu WL, Yang CF, Fann CS, Chen CL, Tsai TF, Chien YH, Chiang SC, Chen CH, Hung SI, Wu JY, Chen YT (2005) Mapping of psoriasis to 17q terminus. J Med Genet 42:152–158


  • Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet 58:1347–1363


  • Lindholm E, Hodge SE, Greenberg DA (2004) Comparative informativeness for linkage of multiple SNPs and single microsatellites. Hum Hered 58:164–170


  • McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The fine-scale structure of recombination rate variation in the human genome. Science 304:581–584


  • Risch N (2001) The genetic epidemiology of cancer: interpreting family and twin studies and their implications for molecular genetic approaches. Cancer Epidemiol Biomarkers Prev 10:733–741


  • Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517


  • Schaid DJ, McDonnell SK, Wang L, Cunningham JM, Thibodeau SN (2002) Caution on pedigree haplotype inference with software that assumes linkage equilibrium. Am J Hum Genet 71:992–995


  • Service S, Molina J, Deyoung J, Jawaheer D, Aldana I, Vu T, Bejarano J, Fournier E, Ramirez M, Mathews CA, Davanzo P, Macaya G, Sandkuijl L, Sabatti C, Reus V, Freimer N (2006) Results of a SNP genome screen in a large Costa Rican pedigree segregating for severe bipolar disorder. Am J Med Genet B Neuropsychiatr Genet 141:367–373


  • Sieh W, Basu S, Fu AQ, Rothstein JH, Scheet PA, Stewart WC, Sung YJ, Thompson EA, Wijsman EM (2005) Comparison of marker types and map assumptions using Markov chain Monte Carlo-based linkage analysis of COGA data. BMC Genet 6 (Suppl 1):S11


  • Sobel E, Lange K (1996) Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am J Hum Genet 58:1323–1337


  • Sveinbjornsdottir S, Hicks AA, Jonsson T, Petursson H, Gudmundsson G, Frigge ML, Kong A, Gulcher JR, Stefansson K (2000) Familial aggregation of Parkinson’s disease in Iceland. N Engl J Med 343:1765–1770


  • Vierimaa O, Georgitsi M, Lehtonen R, Vahteristo P, Kokko A, Raitila A, Tuppurainen K, Ebeling TM, Salmela PI, Paschke R, Gundogdu S, De Menis E, Makinen MJ, Launonen V, Karhu A, Aaltonen LA (2006) Pituitary adenoma predisposition caused by germline mutations in the AIP gene. Science 312:1228–1230


  • Whittemore AS, Halpern J (1994) A class of tests for linkage using affected pedigree members. Biometrics 50:118–127


  • Wijsman EM, Rothstein JH, Thompson EA (2006) Multipoint linkage analysis with many multiallelic or dense diallelic markers: Markov chain-Monte Carlo provides practical approaches for genome scans on general pedigrees. Am J Hum Genet 79:846–858


  • Wilcox MA, Pugh EW, Zhang H, Zhong X, Levinson DF, Kennedy GC, Wijsman EM (2005) Comparison of single-nucleotide polymorphisms and microsatellite markers for linkage analysis in the COGA and simulated data sets for Genetic Analysis Workshop 14: Presentation Groups 1, 2, and 3. Genet Epidemiol 29 (Suppl 1):S7–S28


  • Xu J, Zheng SL, Komiya A, Mychaleckyj JC, Isaacs SD, Hu JJ, Sterling D, Lange EM, Hawkins GA, Turner A, Ewing CM, Faith DA, Johnson JR, Suzuki H, Bujnovszky P, Wiley KE, DeMarzo AM, Bova GS, Chang B, Hall MC, McCullough DL, Partin AW, Kassabian VS, Carpten JD, Bailey-Wilson JE, Trent JM, Ohar J, Bleecker ER, Walsh PC, Isaacs WB, Meyers DA (2002) Germline mutations and sequence variants of the macrophage scavenger receptor 1 gene are associated with prostate cancer risk. Nat Genet 32:321–325


  • Yang XR, Beerman M, Bergen AW, Parry DM, Sheridan E, Liebsch NJ, Kelley MJ, Chanock S, Goldstein AM (2005) Corroboration of a familial chordoma locus on chromosome 7q and evidence of genetic heterogeneity using single nucleotide polymorphisms (SNPs). Int J Cancer 116:487–491


Acknowledgments

The authors would like to thank Terry Speed for his suggestions during the genesis of this project. RT, SQ, JD and JS are supported by an NHMRC Capacity-Building grant, and JS is also supported by an NHMRC Transitional Institute Grant. JM is an NHMRC CJ Martin Fellow.

Author information

Corresponding author

Correspondence to Russell Thomson.

Appendix

Further details on the simulation process

Four sets of simulations were performed (Figs. 2, 3, 4 and 8) to examine the accuracy of this splitting method at various marker densities. The simulation process involved randomly choosing a position on the autosomal genome for a disease locus. For the bulk of the simulations, the inheritance vector at this locus was specified. Traditionally, simulations to estimate power are created by fixing the disease model and allowing the inheritance vector to vary according to the model chosen. With our simulation scheme we were not able to determine power for a given disease model; however, the scheme facilitated comparisons between linkage methods and marker sets. For the set of simulations that assumed no disease locus was present, the inheritance vector was allowed to vary.

To simulate genotypic data, for each founder, two haplotypes extending 20 cM either side of the disease locus (or to the end of the chromosome) were sampled without replacement from the phased haplotypes in phase I of the International HapMap project (Utah CEPH population). Recombinations between the markers in the 40 cM region were then simulated for every meiosis in the pedigree, using the fine-scale genetic map estimated from HapMap data available for download at http://www.hapmap.org/downloads/recombination/latest/ (McVean et al. 2004). These recombinations, in conjunction with the chosen inheritance vector, were used to generate genotypes for non-founders.
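The transmission step can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function names, the input layout (marker positions in cM, sorted in ascending order) and the use of the Haldane map function, which assumes no crossover interference, are all assumptions made for this example. The parental origin at the marker nearest the disease locus is fixed, mimicking a specified inheritance vector, and the origin at the other markers is simulated outwards from that point.

```python
import math
import random


def recombination_fraction(distance_cm):
    """Haldane map function (assumed here): probability of an odd number of crossovers."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def simulate_gamete(hap_a, hap_b, positions_cm, locus_index, origin_at_locus, rng):
    """Build one transmitted haplotype from a parent's two haplotypes.

    origin_at_locus (0 or 1) fixes which parental haplotype is transmitted at
    the marker nearest the disease locus; the origin at every other marker is
    simulated by walking outwards from that marker in both directions.
    """
    n = len(positions_cm)
    origin = [origin_at_locus] * n
    for i in range(locus_index + 1, n):           # walk towards the right end
        theta = recombination_fraction(positions_cm[i] - positions_cm[i - 1])
        origin[i] = origin[i - 1] ^ int(rng.random() < theta)
    for i in range(locus_index - 1, -1, -1):      # walk towards the left end
        theta = recombination_fraction(positions_cm[i + 1] - positions_cm[i])
        origin[i] = origin[i + 1] ^ int(rng.random() < theta)
    parents = (hap_a, hap_b)
    return [parents[origin[i]][i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    positions = [0.0, 5.0, 10.0, 20.0, 40.0]      # hypothetical map positions in cM
    print(simulate_gamete(list("AAAAA"), list("BBBBB"), positions, 2, 0, rng))
```

Applying such a function once per meiosis, from the founders downwards, yields haplotypes for every non-founder; a fine-scale map would enter only through the marker positions.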

The \( S_{\text{pairs}} \) statistic was estimated for the simulated genotypic data, using the pedigree-splitting method described above. IBD-sharing probabilities were calculated using Merlin 1.0 (Abecasis et al. 2002; Abecasis and Wigginton 2005). LD between the markers was taken into account by defining haplotype blocks using HaploView 3.2 (Barrett et al. 2005) (spine-of-LD rule with D′ = 0.8). HaploView was also used to estimate haplotype frequencies. These were provided as inputs to Merlin 1.0, and the SNPs within haplotype blocks were then treated as single multi-allelic markers. The \( \text{NPL}_{\text{pairs}} \) statistic was obtained from the \( S_{\text{pairs}} \) statistic by subtracting the mean and dividing by the standard deviation of the \( S_{\text{pairs}} \) statistic under the null hypothesis of no linkage for a given pedigree. The mean and standard deviation were estimated for a fully informative marker by \( 10^{7} \) gene-dropping simulations (Sobel and Lange 1996). In each simulation, founder alleles are transmitted randomly down the complete, unsplit pedigree, and \( S_{\text{pairs}} \) is equal to the number of pairs of alleles from distinct cases that are copies of the same founder allele.
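The gene-dropping step and the standardization can be sketched in Python as follows. This is an illustrative toy, not the authors' code: the pedigree (two first cousins), the dictionary encoding, the function names and the number of replicates are assumptions chosen for the example.

```python
import random
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy pedigree: id -> (father, mother); founders have (None, None).
# Parents must be listed before their children.
PEDIGREE = {
    "gf": (None, None), "gm": (None, None),   # grandparents
    "s1": (None, None), "s2": (None, None),   # married-in spouses
    "p1": ("gf", "gm"), "p2": ("gf", "gm"),   # full siblings
    "c1": ("p1", "s1"), "c2": ("p2", "s2"),   # first cousins (the cases)
}
CASES = ["c1", "c2"]


def drop_genes(pedigree, rng):
    """Transmit uniquely labelled founder alleles down the pedigree at one locus."""
    alleles = {}
    next_label = 0
    for person, (father, mother) in pedigree.items():
        if father is None:   # founder: two new, distinct allele labels
            alleles[person] = (next_label, next_label + 1)
            next_label += 2
        else:                # non-founder: one random allele from each parent
            alleles[person] = (rng.choice(alleles[father]),
                               rng.choice(alleles[mother]))
    return alleles


def s_pairs(alleles, cases):
    """Count pairs of alleles from distinct cases that copy the same founder allele."""
    return sum(x == y
               for a, b in combinations(cases, 2)
               for x in alleles[a] for y in alleles[b])


def null_moments(pedigree, cases, n_sims, seed=0):
    """Estimate the null mean and standard deviation of S_pairs by gene dropping."""
    rng = random.Random(seed)
    values = [s_pairs(drop_genes(pedigree, rng), cases) for _ in range(n_sims)]
    return mean(values), pstdev(values)


def npl_pairs(observed, mu, sigma):
    """Standardize an observed S_pairs value using its null mean and SD."""
    return (observed - mu) / sigma


if __name__ == "__main__":
    mu, sigma = null_moments(PEDIGREE, CASES, n_sims=20000)
    print(mu, sigma, npl_pairs(1, mu, sigma))
```

For first cousins the two cases share a founder allele with probability 1/4, so the estimated mean should settle near 0.25; larger pedigrees only require more entries in the dictionary.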

Estimation of the number of markers required to provide sufficient power to detect linkage

Figure 6a gives the cumulative distribution function F(x) of the number of markers, x, inherited IBD by a pair of related cases on either side of a disease locus (i.e. in the region of no recombination). To obtain this, it was assumed that markers and recombinations are both distributed randomly across the genome, following Poisson processes in genetic distance. Then, for a pair of cases who have only one line of common ancestry through a pair of full siblings, the expected distance in morgans to the first recombination on either side of the disease locus is \( 1/\lambda_{1} = 1/(k + 1) \), where k is the degree of relationship between the cases (= 3 for first cousins, = 5 for second cousins, etc.). Letting n denote the number of markers genotyped along the genome (of approximate length 35 morgans), the expected distance to the first marker on either side of the disease locus is \( 1/\lambda_{2} = 35/n \).

Consider the combined Poisson process of recombinations and markers with rate \( \lambda_{1} + \lambda_{2} \), where each event has probability \( p_{m} = \lambda_{2}/(\lambda_{1} + \lambda_{2}) = (n/35)/(k + 1 + n/35) \) of being a marker. Due to the reversibility of Poisson processes, the distribution of x is the same as the distribution of the number of markers occurring before two recombination events occur in the (k + 1) meioses connecting the cases. This is given by

$$ \begin{aligned}
f(x) & = P(x \text{ markers before 2nd recomb.}) \\
& = \sum_{i = 0}^{x} P(i \text{ markers})\, P(\text{recomb.})\, P(x - i \text{ markers})\, P(\text{recomb.}) \\
& = (x + 1)\, p_{m}^{x} (1 - p_{m})^{2}
\end{aligned} $$
(1)
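Equation 1 can be checked by simulation. The Python sketch below (function names, seed and replicate count are illustrative assumptions) draws events from the combined process, each being a marker with probability \( p_{m} \), counts the markers seen before the second recombination, and compares the empirical frequency with the closed form.

```python
import random


def count_markers_before_second_recombination(p_m, rng):
    """Draw events until two recombinations have occurred; return the marker count."""
    markers = 0
    recombinations = 0
    while recombinations < 2:
        if rng.random() < p_m:
            markers += 1
        else:
            recombinations += 1
    return markers


def empirical_pmf(p_m, x, n_sims=100_000, seed=0):
    """Fraction of simulated runs with exactly x markers before the 2nd recombination."""
    rng = random.Random(seed)
    hits = sum(count_markers_before_second_recombination(p_m, rng) == x
               for _ in range(n_sims))
    return hits / n_sims


def theoretical_pmf(p_m, x):
    """Equation 1: f(x) = (x + 1) * p_m**x * (1 - p_m)**2."""
    return (x + 1) * p_m ** x * (1 - p_m) ** 2


if __name__ == "__main__":
    p_m = 0.6   # arbitrary value chosen only to keep the simulation fast
    for x in range(5):
        print(x, empirical_pmf(p_m, x), theoretical_pmf(p_m, x))
```

The two columns should agree to within Monte Carlo error; the factor (x + 1) reflects the number of ways the first recombination can be placed among the x markers.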

The cumulative distribution function is then given by

$$ \begin{aligned}
F(x) & = P(X \le x) = \sum_{j = 0}^{x} (j + 1)\, p_{m}^{j} (1 - p_{m})^{2} \\
& = (1 - p_{m})^{2} \sum_{j = 0}^{x} \frac{d}{dp_{m}} \left[ p_{m}^{j + 1} \right] \\
& = (1 - p_{m})^{2} \frac{d}{dp_{m}} \left[ \sum_{j = 0}^{x} p_{m}^{j + 1} \right] \\
& = (1 - p_{m})^{2} \frac{d}{dp_{m}} \left[ \frac{p_{m} - p_{m}^{x + 2}}{1 - p_{m}} \right] \\
& = 1 - \left[ \frac{n/35}{k + 1 + n/35} \right]^{x + 1} \left( 1 + (x + 1) \left[ \frac{k + 1}{k + 1 + n/35} \right] \right)
\end{aligned} $$
(2)
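As a numerical check on Equation 2, the closed form can be compared with direct summation of Equation 1. The following Python sketch uses hypothetical function names; the genome length of 35 morgans and the rates \( \lambda_{1} = k + 1 \) and \( \lambda_{2} = n/35 \) are taken from the text above.

```python
def marker_probability(n_markers, k, genome_morgans=35.0):
    """p_m: probability that an event in the combined process is a marker."""
    lam1 = k + 1                          # recombination rate per morgan
    lam2 = n_markers / genome_morgans     # marker rate per morgan
    return lam2 / (lam1 + lam2)


def pmf(x, p_m):
    """Equation 1: probability of exactly x markers before the 2nd recombination."""
    return (x + 1) * p_m ** x * (1 - p_m) ** 2


def cdf_closed_form(x, p_m):
    """Equation 2, last line, written in terms of p_m (note 1 - p_m = (k+1)/(k+1+n/35))."""
    return 1 - p_m ** (x + 1) * (1 + (x + 1) * (1 - p_m))


def cdf_by_summation(x, p_m):
    """Equation 2, first line: sum of the pmf from 0 to x."""
    return sum(pmf(j, p_m) for j in range(x + 1))


if __name__ == "__main__":
    for n in (10_000, 50_000, 250_000):
        p_m = marker_probability(n, k=5)  # k = 5: second cousins
        for x in (10, 100, 1000):
            print(n, x, cdf_closed_form(x, p_m), cdf_by_summation(x, p_m))
```

The two functions should agree to floating-point precision for any x, which confirms the differentiation step used to collapse the sum.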

Figure 6b presents an estimate of the minimum number, y, of identical-by-state (IBS) markers required to be reasonably confident that a haplotype has been inherited identical-by-descent (IBD) by two individuals. An idealized situation is assumed, where there is no LD between the markers, but they are close enough to ensure that the relatives have either inherited a haplotype IBD over the entire region, or have not inherited a haplotype IBD over any of the region (that is, there is no recombination in the region and thus complete separation between the scales on which LD and linkage operate). If two individuals have inherited an allele identical-by-state at y consecutive markers, the probability that these alleles have been inherited IBD is

$$ \begin{aligned}
P(\text{IBD} \mid y \text{ alleles IBS}, k) & = \frac{P(\text{IBD} \mid k)}{P(\text{IBD} \mid k) + P(y \text{ alleles IBS} \mid \text{not IBD}, k)\, P(\text{not IBD} \mid k)} \\
& = \frac{\frac{1}{2^{k-1}}}{\frac{1}{2^{k-1}} + \left[ 1 - 2p^{2}(1 - p)^{2} \right]^{y} \left( 1 - \frac{1}{2^{k-1}} \right)}
\end{aligned} $$
(3)

where k is the degree of relationship and p is the minor allele frequency. For this probability to be greater than P, y must satisfy

$$ y > \frac{{\log {\left[ {\frac{{1/P - 1}} {{2^{k-1} - 1}}} \right]}}} {{\log {\left( {1 - 2p^{2} (1 - p)^{2} } \right)}}}. $$
(4)

Figure 6b was generated from the above formula, with P = 0.8.
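Equations 3 and 4 translate directly into code. The Python sketch below is an illustration with hypothetical function names, not the script used for the figure; it returns the smallest integer y satisfying Equation 4 and evaluates Equation 3 so that the threshold can be verified.

```python
import math


def posterior_ibd(y, k, maf):
    """Equation 3: P(IBD | y consecutive markers IBS, degree of relationship k)."""
    prior = 1.0 / 2 ** (k - 1)
    ibs_given_not_ibd = (1.0 - 2.0 * maf ** 2 * (1.0 - maf) ** 2) ** y
    return prior / (prior + ibs_given_not_ibd * (1.0 - prior))


def min_ibs_markers(k, maf, target=0.8):
    """Smallest integer y that strictly exceeds the bound in Equation 4.

    Requires k >= 2 so that 2**(k - 1) - 1 is positive.
    """
    numerator = math.log((1.0 / target - 1.0) / (2 ** (k - 1) - 1))
    denominator = math.log(1.0 - 2.0 * maf ** 2 * (1.0 - maf) ** 2)
    bound = numerator / denominator
    return max(0, math.floor(bound) + 1)


if __name__ == "__main__":
    for k in (3, 5, 7):
        for maf in (0.1, 0.3, 0.5):
            y = min_ibs_markers(k, maf)
            print(k, maf, y, round(posterior_ibd(y, k, maf), 3))
```

Because the per-marker IBS probability for non-IBD pairs is close to 1 when the minor allele frequency is low, the required y grows quickly as p decreases and as the relationship becomes more distant.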

Cite this article

Thomson, R., Quinn, S., McKay, J. et al. The advantages of dense marker sets for linkage analysis with very large families. Hum Genet 121, 459–468 (2007). https://doi.org/10.1007/s00439-007-0323-5

  • DOI: https://doi.org/10.1007/s00439-007-0323-5
