Abstract
With recent advances in genome-wide association studies (GWAS), disease-associated single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) have been extensively reported. Accordingly, the incorrect identification of recombination events, which can distort multi-allelic or hemizygous variants, has received more attention. However, the calculation bias, or distorted significance, of a detected GWAS association caused by the coexistence of CNVs and SNPs in the same genomic region may remain under-recognized. Here we performed an association study within a congenital scoliosis (CS) cohort whose genetic etiology was recently elucidated as a compound inheritance model, consisting mostly of one rare-variant deletion CNV null allele and one common-variant non-coding hypomorphic haplotype of the TBX6 gene. We demonstrated that the presence of a deletion in TBX6 led to an overestimation of the contribution of the SNPs on the hypomorphic allele. Furthermore, we generalized a model to explain the calculation bias, or distorted significance calculation, that CNVs at a locus can induce in an association study. Meanwhile, the overlap between disease-associated SNPs from published GWAS and common CNVs (overlap 10%) and pathogenic/likely pathogenic CNVs (overlap 99.69%) was significantly higher than expected under a random distribution (p < 1 × 10⁻⁶ and p = 0.034, respectively), indicating that such coexistence of CNV and SNV alleles might generally influence the data interpretation and potential outcomes of a GWAS. We also verified and assessed the influence of colocalizing CNVs on the detection sensitivity of disease-associated SNP variant alleles in a separate adolescent idiopathic scoliosis (AIS) genome-wide association study. We propose that detecting coexistent CNVs when evaluating the association signals between SNPs and disease traits could improve genetic model analyses and better integrate GWAS with robust Mendelian principles.
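The overestimation described in the abstract can be illustrated with a minimal sketch. It is not the generalized model from the study. It assumes a CNV-unaware genotype caller that reports a deletion carrier with a single remaining risk allele as a homozygote, and it compares the resulting allelic odds ratio with one computed from the alleles actually present. The allele labels `T` (risk) and `C` (non-risk), the function names, and all cohort counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def allele_counts(genotypes, cnv_aware):
    """Count risk ('T') and non-risk ('C') alleles in a group.

    Each genotype is a tuple of the alleles actually present, so a
    deletion carrier has a single allele, e.g. ('T',).
    """
    counts = Counter()
    for alleles in genotypes:
        if len(alleles) == 1 and not cnv_aware:
            # Assumption: a CNV-unaware caller reports a hemizygote
            # as a homozygote, which doubles the observed allele.
            alleles = alleles * 2
        counts.update(alleles)
    return counts["T"], counts["C"]


def allelic_odds_ratio(cases, controls, cnv_aware):
    """Allelic odds ratio of the risk allele, cases versus controls."""
    case_t, case_c = allele_counts(cases, cnv_aware)
    ctrl_t, ctrl_c = allele_counts(controls, cnv_aware)
    return (case_t * ctrl_c) / (case_c * ctrl_t)


# Hypothetical toy cohort (made-up counts, not data from the study):
# 10 cases carry a deletion in trans with the risk allele.
cases = [("T",)] * 10 + [("T", "C")] * 40 + [("C", "C")] * 50
controls = [("T", "C")] * 40 + [("C", "C")] * 60

naive_or = allelic_odds_ratio(cases, controls, cnv_aware=False)
aware_or = allelic_odds_ratio(cases, controls, cnv_aware=True)

print(f"CNV-unaware odds ratio: {naive_or:.3f}")
print(f"CNV-aware odds ratio:   {aware_or:.3f}")
```

In this toy setting the CNV-unaware odds ratio is larger because each hemizygous case contributes two risk alleles instead of one. How large the distortion is in real data depends on the deletion frequency and on how the genotyping platform handles hemizygous calls.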
Abbreviations
- CNV: Copy number variation
- SNV: Single nucleotide variant
- SNP: Single nucleotide polymorphism
- DISCO study: Deciphering Disorders Involving Scoliosis and Comorbidities study
- GWAS: Genome-wide association studies
- CS: Congenital scoliosis
- NGS: Next-generation sequencing
- qPCR: Quantitative polymerase chain reaction
- aCGH: Array-based comparative genomic hybridization microarray
- MLPA: Multiplex ligation-dependent probe amplification
- HWE: Hardy–Weinberg equilibrium
- AF: Allele frequency
- CGR: Complex genomic rearrangements
- CMT1A-REPs: Charcot–Marie–Tooth disease type 1A-repeats
- ExAC: Exome Aggregation Consortium
- NHGRI: National Human Genome Research Institute
- DGV: Database of Genomic Variants
- PSV: Paralogous sequence variant
- OR: Odds ratio
- CI: Confidence interval
Acknowledgements
We would like to thank all the individuals involved in the study for their participation.
Funding
This research was funded in part by the National Natural Science Foundation of China (81501852 to N.W., 81472046 and 81772299 to Z.W., 81472045 and 81772301 to G.Q.), Beijing Natural Science Foundation (7172175 to N.W.), Beijing Nova Program (Z161100004916123 to N.W.), Beijing Nova Program Interdisciplinary Collaborative Project (xxjc201717 to N.W.), 2016 Milstein Medical Asian American Partnership Foundation Fellowship Award in Translational Medicine (to N.W.), The Central Level Public Interest Program for Scientific Research Institute (2016ZX310177 to N.W.), PUMC Youth Fund and the Fundamental Research Funds for the Central Universities (3332016006 to N.W.), CAMS Initiative Fund for Medical Sciences (2016-I2M-3-003 to G.Q. and N.W., 2016-I2M-2-006 and 2017-I2M-2-001 to Z.W.), the Distinguished Youth Foundation of Peking Union Medical College Hospital (JQ201506 to N.W.), the 2016 PUMCH Science Fund for Junior Faculty (PUMCH-2016-1.1 to N.W.), the US National Institutes of Health, National Institute of Neurological Disorders and Stroke (NINDS R01NS058529 and R35NS105078 to J.R.L.), National Human Genome Research Institute/National Heart, Lung, and Blood Institute (NHGRI/NHLBI UM1 HG006542 to J.R.L.), and the National Human Genome Research Institute (NHGRI K08 HG008986 to J.E.P.).
Ethics declarations
Conflict of interest
J.R.L. has stock ownership in 23andMe and Lasergen, is a paid consultant for Regeneron Pharmaceuticals, and is a coinventor on multiple United States and European patents related to molecular diagnostics for inherited neuropathies, eye diseases and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis and clinical exome sequencing offered in the Baylor Genetics Laboratory (http://bmgl.com).
Electronic supplementary material
Below is the link to the electronic supplementary material.
439_2018_1910_MOESM1_ESM.xlsx
Table S1. The significant SNPs related to clinical conditions in the National Human Genome Research Institute (NHGRI) Catalog of Published GWAS (XLSX 561 KB)
439_2018_1910_MOESM3_ESM.xlsx
Table S3. The total of 27,096 SNPs with significant difference between 196 AIS cases and 303 control subjects in the AIS genome-wide association study (XLSX 1651 KB)
439_2018_1910_MOESM4_ESM.xlsx
Table S4. The overlapping CNVs and disease-associated SNPs in the same region either in the AIS patients or controls in the genome-wide association study (XLSX 197 KB)
Cite this article
Liu, J., Zhou, Y., Liu, S. et al. The coexistence of copy number variations (CNVs) and single nucleotide polymorphisms (SNPs) at a locus can result in distorted calculations of the significance in associating SNPs to disease. Hum Genet 137, 553–567 (2018). https://doi.org/10.1007/s00439-018-1910-3
DOI: https://doi.org/10.1007/s00439-018-1910-3