  • Review Article

The complex interplay among factors that influence allelic association

An Erratum to this article was published on 01 March 2004

Key Points

  • Although complex gene studies are typically designed by focusing on such factors as linkage disequilibrium (LD) or the frequency of the susceptibility loci, the interplay between such factors is much more important than their isolated effects. The joint effects of marker allele frequency, linkage disequilibrium and allelic effect size are crucial in determining the statistical power to detect associations between complex traits and candidate genes in well-designed, population-based case–control studies.

  • The relationship between these parameters can be quantitatively described and various situations of interplay should be considered in the design stage of case–control studies.

  • Provided large sample sizes are used, common (>0.1 in frequency) markers in high LD with either common disease alleles of small effect or rare disease alleles of large effect should generally be suitable to detect association with loci of moderate effect sizes, as long as multiple rare alleles of equal effect size do not cancel each other out by being associated with opposite marker alleles.
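The interplay described in these key points can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' own calculation: a biallelic disease locus and a biallelic marker, a rare disease (so control allele frequencies approximate population frequencies), a multiplicative allelic odds ratio, a marker that affects risk only through LD with the disease allele, and a two-sided two-proportion z-test on allele counts. All function names and parameter values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_TOL = 1e-9  # tolerance for floating-point rounding at the LD boundary


def marker_case_freq(p_disease, p_marker, r, odds_ratio):
    """Marker allele frequency in cases implied by LD with the disease allele."""
    # The signed correlation r between the loci gives the LD coefficient D.
    d = r * sqrt(p_disease * (1 - p_disease) * p_marker * (1 - p_marker))
    # Probability of the marker allele on disease-allele and normal-allele haplotypes.
    m_given_d = (p_disease * p_marker + d) / p_disease
    m_given_n = ((1 - p_disease) * p_marker - d) / (1 - p_disease)
    for prob in (m_given_d, m_given_n):
        if prob < -_TOL or prob > 1 + _TOL:
            raise ValueError("r is not attainable for these allele frequencies")
    m_given_d = min(1.0, max(0.0, m_given_d))
    m_given_n = min(1.0, max(0.0, m_given_n))
    # Disease allele frequency in cases under a multiplicative allelic odds ratio.
    q_case = odds_ratio * p_disease / (odds_ratio * p_disease + 1 - p_disease)
    return q_case * m_given_d + (1 - q_case) * m_given_n


def marker_odds_ratio(p_disease, p_marker, r, odds_ratio):
    """Allelic odds ratio observed at the marker rather than at the disease locus."""
    m_case = marker_case_freq(p_disease, p_marker, r, odds_ratio)
    return (m_case / (1 - m_case)) / (p_marker / (1 - p_marker))


def allelic_power(p_disease, p_marker, r, odds_ratio,
                  n_cases, n_controls, alpha=0.05):
    """Approximate power of an allele-count z-test (the far tail is ignored)."""
    p1 = marker_case_freq(p_disease, p_marker, r, odds_ratio)
    p2 = p_marker
    n1, n2 = 2 * n_cases, 2 * n_controls  # two alleles per person
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    for r in (1.0, 0.5, 0.2):
        print(r,
              round(marker_odds_ratio(0.2, 0.2, r, 1.5), 3),
              round(allelic_power(0.2, 0.2, r, 1.5, 1000, 1000), 3))
```

In this sketch, a marker with the same frequency as the disease allele and r = 1 recovers the full disease odds ratio, and both the marker odds ratio and the power fall as r decreases. A frequency mismatch also caps r: with a disease allele frequency of 0.05 and a marker allele frequency of 0.3, r cannot exceed about 0.35, so the function raises an error for larger values.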

Abstract

Small effect sizes, common-disease/common-variant versus rare variant influences, biased single nucleotide polymorphism ascertainment and low linkage disequilibrium have recently been discussed as impediments to association studies. Such a focus on the individual factors that highlight their maximum potential effect (whether positive or deleterious) is often optimistic as, in practice, they do not operate in isolation. Instead, they work jointly to generate the disease gene architecture and to determine the ability of a study to discover it. Here, we consider how the effect size of the susceptibility locus, the frequency of the disease allele(s), the frequency of the marker allele(s) that are correlated with the disease allele(s) and the extent of linkage disequilibrium together influence genetic association studies.

Figure 1: Reduction in effect size owing to linkage disequilibrium and mismatches between marker and disease allele frequencies.
Figure 2: The joint influence of linkage disequilibrium, effect size and marker–disease allele frequencies on the power to detect disease association.

Acknowledgements

We thank A. Morris for his very helpful comments on several of the ideas underlying this paper, and J. Marchini for help with some of the data representations. This work was supported by a Wellcome Trust Principal Research Fellowship to L.R.C. and a Medical Research Council Fellowship in Bioinformatics to K.T.Z.

Author information

Corresponding author

Correspondence to Lon R. Cardon.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

Entrez

APOE (*4)

factor V Leiden

GSTM1

NOD2/CARD15

PPARγ

OMIM

Alzheimer disease

bladder cancer

Crohn disease

deep vein thrombosis

type II diabetes

FURTHER INFORMATION

International HapMap Project

Glossary

COMPLEX TRAIT

A measured phenotype, such as disease status or a quantitative character, which is influenced by many environmental and genetic factors, and potentially by interactions in and between them.

EFFECT SIZE

The extent to which a factor influences the risk of the condition under study, rather than simply an indication of whether a factor is significantly related to the condition.

LINKAGE DISEQUILIBRIUM

Alleles at two loci that are in linkage disequilibrium occur together on the same haplotype more often than would be expected by chance.
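As an illustration of how this departure from chance is usually quantified, the sketch below computes the coefficient D and two standard normalizations of it, D′ and r², for two biallelic loci. The function name and the example frequencies are hypothetical; the formulas are the standard textbook definitions.

```python
def ld_measures(h_ma, p_m, p_a):
    """Return (D, D', r^2) for alleles M and A at two biallelic loci.

    h_ma is the frequency of the MA haplotype; p_m and p_a are the
    population frequencies of alleles M and A.
    """
    if not (0 < p_m < 1 and 0 < p_a < 1):
        raise ValueError("both loci must be polymorphic")
    # D: observed haplotype frequency minus the frequency expected by chance.
    d = h_ma - p_m * p_a
    # D_max: the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_m * (1 - p_a), (1 - p_m) * p_a)
    else:
        d_max = min(p_m * p_a, (1 - p_m) * (1 - p_a))
    d_prime = d / d_max  # signed; many reports quote the absolute value
    r_squared = d * d / (p_m * (1 - p_m) * p_a * (1 - p_a))
    return d, d_prime, r_squared


if __name__ == "__main__":
    # A rare allele (0.05) that always occurs with a common allele (0.3).
    print(ld_measures(0.05, 0.3, 0.05))
```

In the example, D′ is 1 because the rare allele is found only on haplotypes that carry the common allele, yet r² is only about 0.12 because the allele frequencies differ. This is the kind of mismatch summarized in the caption of Figure 1.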

MULTIFACTORIAL DISEASE

A disease that is influenced by many environmental and genetic susceptibility factors (see also complex trait).

HAPLOTYPE TAGGING

The concept that most of the haplotype structure (allele combination) in a particular chromosomal region can be captured by genotyping a smaller number of markers than all of those that constitute the haplotypes. The crucial markers to type would be those that distinguish one haplotype from another.

CASE–CONTROL STUDY

An epidemiological study design in which cases with a defined condition and controls without this condition are sampled from the same population. Risk-factor information is compared between the two groups to investigate the potential role of these in the aetiology of the condition.
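The comparison between the two groups is commonly summarized as an odds ratio. The sketch below computes one from a 2×2 table, with a confidence interval from the standard log-odds (Woolf) approximation. The function name and the counts are hypothetical and serve only as illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(case_exposed, case_unexposed,
                  control_exposed, control_unexposed, level=0.95):
    """Odds ratio and confidence interval from a 2x2 case-control table."""
    a, b = case_exposed, case_unexposed
    c, d = control_exposed, control_unexposed
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 60 of 100 cases and 40 of 100 controls carry the allele.
    print(odds_ratio_ci(60, 40, 40, 60))
```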

EPIDEMIOLOGY

A discipline that seeks to explain the extent to which factors to which people are exposed (environmental or genetic) influence their risk of disease, by means of population-based investigations. Epidemiological studies are designed to minimize bias in obtaining results for the population under study.

POWER

The probability that a study obtains a significant result when the effect truly exists in the underlying population from which the study subjects were sampled.

PROSPECTIVE COHORT STUDY

Longitudinal analysis in which individuals selected for certain exposure characteristics are followed up over time to assess who develops a certain outcome (often disease).

GENOMIC CONTROL

Statistical/genetic procedure in which the apparent association between a particular polymorphism and a certain condition is adjusted for population stratification in the study sample using a set of randomly selected, unlinked markers. Population stratification can occur if the study sample consists of two or more sub-populations with distinct differences in allele frequencies.
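One common formulation of this adjustment estimates an inflation factor from the median association statistic of the unlinked markers and divides the candidate statistics by it. The sketch below assumes 1-degree-of-freedom chi-square statistics; the function name and the input values are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(candidate_stats, null_marker_stats):
    """Deflate chi-square (1 df) statistics by the inflation factor lambda.

    null_marker_stats are statistics from randomly selected, unlinked
    markers that are assumed to have no true effect on the condition.
    """
    # Lambda is floored at 1 so that statistics are never inflated.
    lam = max(1.0, median(null_marker_stats) / CHI2_1DF_MEDIAN)
    return lam, [stat / lam for stat in candidate_stats]


if __name__ == "__main__":
    lam, adjusted = genomic_control([10.0, 4.0], [0.2, 0.9098728, 3.0])
    print(lam, adjusted)
```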

INCOMPLETE PENETRANCE

A situation in which the probability of having the disease, given that one has the disease mutation(s), is less than 1.0.

HAPLOTYPE PHASE

The arrangement of alleles at two loci on homologous chromosomes. For example, in a diploid individual with genotype Mm at a marker locus and genotype Aa at the other locus, possible linkage phases are MA/ma and Ma/mA, for which '/' separates the two homologous chromosomes.

MINOR ALLELE FREQUENCY

The lowest allele frequency at a locus that is observed in a particular population. For single nucleotide polymorphisms, this is simply the lesser of the two allele frequencies.
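For a biallelic marker, this can be computed directly from genotype counts, as in the following minimal sketch. The function name and the counts are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency of a biallelic marker from genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each AA homozygote carries two A alleles and each heterozygote carries one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


if __name__ == "__main__":
    print(minor_allele_frequency(10, 40, 50))
```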

MICROSATELLITE

A class of repetitive DNA sequences that are made up of tandemly organized repeats that are 2–8 nucleotides in length. They can be highly polymorphic and are frequently used as molecular markers in population genetics studies.

GENETIC DRIFT

The random fluctuation in population allele frequencies as genes are transmitted from one generation to the next.

TYPE I ERROR

The probability of rejecting the null hypothesis when it is true. For genetic association studies, type I errors reflect false-positive findings of associations between allele/genotype and disease.

About this article

Cite this article

Zondervan, K., Cardon, L. The complex interplay among factors that influence allelic association. Nat Rev Genet 5, 89–100 (2004). https://doi.org/10.1038/nrg1270

  • DOI: https://doi.org/10.1038/nrg1270
