cnvHap: an integrative population and haplotype–based multiplatform model of SNPs and CNVs

Abstract

Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for a small portion of heritability. Some contribution to this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs; but assessment of this contribution remains challenging because of the difficulty in accurately genotyping CNVs, particularly small variants. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with an 82% and 115% improvement for aberrations containing <10 probes. Combining data from multiple platforms additionally improved sensitivity.
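To make the idea of calling copy number from probe intensities concrete, the following is a minimal sketch of a generic single-sample hidden Markov model decoded with the Viterbi algorithm, the general family of approach used by tools such as PennCNV and QuantiSNP. It is not the cnvHap algorithm: it has no haplotype model, no multi-sample learning and no per-probe cluster fitting. All state means, the standard deviation, the prior and the transition probability are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Hypothetical copy-number states with assumed mean log R ratio (LRR) per state.
STATES = [1, 2, 3]                   # 1 = deletion, 2 = normal, 3 = amplification
MEANS = {1: -0.6, 2: 0.0, 3: 0.4}    # illustrative values, not fitted to any data
SIGMA = 0.2                          # assumed common standard deviation
STAY = 0.98                          # assumed probability of keeping the same state
PRIOR = {1: 0.01, 2: 0.98, 3: 0.01}  # assumed prior favouring the normal state


def log_emission(x, state):
    """Log density of a Gaussian emission for one probe intensity."""
    z = (x - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, cur):
    """Log probability of moving between states at adjacent probes."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(lrr):
    """Return the most likely copy-number state at each probe."""
    if not lrr:
        return []
    score = {s: math.log(PRIOR[s]) + log_emission(lrr[0], s) for s in STATES}
    back = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            # Best predecessor state for reaching `cur` at this probe.
            best = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best] + log_transition(best, cur)
                              + log_emission(x, cur))
            pointers[cur] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Seven probes with a three-probe dip in intensity in the middle.
    print(viterbi([0.0, 0.05, -0.6, -0.65, -0.55, 0.0, 0.02]))
```

With so few probes supporting the dip, the call depends heavily on the assumed transition probability, which illustrates why aberrations spanning fewer than 10 probes are hard to genotype from a single sample and platform.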

Figure 1: Schematic flow chart showing cnvHap operation.
Figure 2: Visualization of cnvHap population model on chromosome 1 for integrated Illumina 1M and Agilent 244k datasets.
Figure 3: CNV predictions for chromosome 1, 25.452–25.536 Mb.
Figure 4: Cumulative frequency of squared Pearson's correlation coefficient between predicted and benchmark copy number calls.
Figure 5: ROC curves for detecting CNVs using Illumina 1M data.
Figure 6: Combining datasets improved sensitivity.

Acknowledgements

We thank D. Serre, A. Montpetit and D. Vincent for advice concerning Illumina arrays and D. Peiffer (Illumina) for providing genotype data on HapMap samples. Genome Canada and Genome Quebec funded genotyping on the Illumina Human1M platform. L.J.M.C. is funded by a Research Council UK fellowship. J.E.A. is supported by the Medical Research Council. R.G.W. is supported by Johnson & Johnson and the South East England Development Agency. J.S.E.-S.M. is supported by an Imperial College Division of Medicine PhD studentship.

Author information

Contributions

L.J.M.C. designed the project with A.I.F.B., developed the cnvHap algorithm and software, analyzed data and wrote the paper. J.E.A. ran cnvPartition, PennCNV and QuantiSNP on the data and helped write the paper. R.G.W. and J.S.E.-S.M. provided critical comments and helped to write the paper. D.J.B. provided statistical advice. R.S. provided SNP genotype data, advised on its interpretation and edited the paper. A.J.d.S. provided aCGH data and advised on its interpretation. P.F. provided the DNA samples and coordinated the SNP genotyping. A.I.F.B. designed the project with L.J.M.C., coordinated the aCGH analysis, contributed to writing the paper and oversaw the project.

Corresponding author

Correspondence to Lachlan J M Coin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–3 and Supplementary Note 1 (PDF 1513 kb)

Supplementary Software

Software, documentation and an example. (ZIP 9805 kb)

About this article

Cite this article

Coin, L., Asher, J., Walters, R. et al. cnvHap: an integrative population and haplotype–based multiplatform model of SNPs and CNVs. Nat Methods 7, 541–546 (2010). https://doi.org/10.1038/nmeth.1466

  • DOI: https://doi.org/10.1038/nmeth.1466
