Abstract
Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
All data used in the study are available from the Sequence Read Archive, dbGaP or the European Nucleotide Archive; the full data availability is provided in Supplementary Table 1. Supplementary Table 2 provides a description of the samples used in the study. Full summary statistics on significant eQTLs can be obtained from Zenodo (https://doi.org/10.5281/zenodo.4005576). The colocalization results are accessible from the LocusCompare portal (http://locuscompare.com/). We used eQTL summary statistics from GTEx (v7; https://gtexportal.org/home/datasets/) and the BIOS cohort (https://genenetwork.nl/biosqtlbrowser/). We further used GWAS summary statistics from the NHGRI-EBI GWAS catalog (Ensembl v92; https://www.ebi.ac.uk/gwas/), Phenome scanner (v2; http://www.phenoscanner.medschl.cam.ac.uk/) and GWAS studies aggregated in the LocusCompare study. We downloaded the GTEx colocalization results via LocusCompare. Other external data sources are referenced in the Methods and the main text. Source data are provided with this paper.
Code availability
Code produced for the following analyses is available on GitHub: eQTL mapping (https://github.com/PMBio/hipsci_pipeline/tree/master/limix_QTL_pipeline/), gene expression outlier analysis (https://github.com/csmail/i2qtl_outlier/) and colocalization analysis (https://github.com/mikegloudemans/ipsc-coloc).
References
Westra, H.-J. et al. Systematic identification of trans-eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).
Schwartzentruber, J. et al. Molecular and functional variation in iPSC-derived sensory neurons. Nat. Genet. 50, 54–61 (2018).
Cuomo, A. S. E. et al. Single-cell RNA sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11, 810 (2020).
Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. https://doi.org/10.1038/s41588-021-00801-6 (2021).
Sun, N. et al. Patient-specific induced pluripotent stem cells as a model for familial dilated cardiomyopathy. Sci. Transl. Med. 4, 130ra47 (2012).
Lan, F. et al. Abnormal calcium handling properties underlie familial hypertrophic cardiomyopathy pathology in patient-specific induced pluripotent stem cells. Cell Stem Cell 12, 101–113 (2013).
Lee, J. et al. Activation of PDGF pathway links LMNA mutation to dilated cardiomyopathy. Nature 572, 335–340 (2019).
Kodo, K. et al. iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nat. Cell Biol. 18, 1031–1042 (2016).
Wu, H. et al. Modelling diastolic dysfunction in induced pluripotent stem cell-derived cardiomyocytes from hypertrophic cardiomyopathy patients. Eur. Heart J. 40, 3685–3695 (2019).
Dubois, N. C. et al. SIRPA is a specific cell-surface marker for isolating cardiomyocytes derived from human pluripotent stem cells. Nat. Biotechnol. 29, 1011–1018 (2011).
Sterneckert, J. L., Reinhardt, P. & Schöler, H. R. Investigating human disease using stem cell models. Nat. Rev. Genet. 15, 625–639 (2014).
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Panopoulos, A. D. et al. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem Cell Rep. 8, 1086–1100 (2017).
Pashos, E. E. et al. Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci. Cell Stem Cell 20, 558–570 (2017).
Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).
Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532.e9 (2017).
Frésard, L. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 25, 911–919 (2019).
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
Choi, J. et al. A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs. Nat. Biotechnol. 33, 1173–1181 (2015).
Thomas, S. M. et al. Reprogramming LCLs to iPSCs results in recovery of donor-specific gene expression signature. PLoS Genet. 11, e1005216 (2015).
Donovan, M. K. R., D’Antonio-Chronowska, A., D’Antonio, M. & Frazer, K. A. Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants. Nat. Commun. 11, 955 (2020).
Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 39, D945–D950 (2010).
Gerrard, D. T. et al. An integrative transcriptomic atlas of organogenesis in human embryos. eLife 5, e15657 (2016).
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Jakubosky, D. et al. Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nat. Commun. 11, 2928 (2020).
Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).
Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Ferraro, N. M. et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 369, eaaz5900 (2020).
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).
Kernohan, K. D. et al. Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum. Mutat. 38, 611–614 (2017).
McKusick, V. A. Mendelian Inheritance in Man: a Catalog of Human Genes and Genetic Disorders (JHU Press, 1998).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics https://doi.org/10.1093/bioinformatics/btz469 (2019).
Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
Liu, J. Z. et al. Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 44, 1137–1141 (2012).
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Pongor, L. et al. A genome-wide approach to link genotype to clinical outcome by utilizing next-generation sequencing and gene chip data of 6,697 breast cancer patients. Genome Med. 7, 104 (2015).
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Sanchez, E. et al. POLR1B and neural crest cell anomalies in Treacher Collins syndrome type 4. Genet. Med. 22, 547–556 (2020).
Howson, J. M. M. et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat. Genet. 49, 1113–1119 (2017).
Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).
Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
Scott, R. A. et al. An expanded genome-wide association study of type 2. Diabetes 66, 2888–2902 (2017).
Webb, G. J., Siminovitch, K. A. & Hirschfield, G. M. The immunogenetics of primary biliary cirrhosis: a comprehensive review. J. Autoimmun. 64, 42–52 (2015).
1000 Genomes Project Consortiumet al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Streeter, I. et al. The human-induced pluripotent stem cell initiative-data resources for cellular genetics. Nucleic Acids Res. 45, D691–D697 (2017).
D’Antonio, M. et al. Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell Rep. 24, 883–894 (2018).
DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546.e7 (2017).
Knowles, J. W., Hao, K., Xie, W., Weedon, M. N. & Zhang, Z. Genetic and functional analyses identify NAT2 as a human insulin sensitivity gene. Circulation 128, A10906 (2013).
Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Saha, A. & Battle, A. False positives in trans-eQTL and coexpression analyses arising from RNA-sequencing alignment errors. F1000Res. 7, 1860 (2018).
Pedersen, B. S., Layer, R. M. & Quinlan, A. R. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol. 17, 118 (2016).
Hall, C. L. et al. Frequency of genetic variants associated with arrhythmogenic right ventricular cardiomyopathy in the genome aggregation database. Eur. J. Hum. Genet. 26, 1312–1318 (2018).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 1000, 2078–2079 (2009).
Churchhouse, C & Neale, B. Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. https://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank/ (2017).
Acknowledgements
The authors thank the HipSci, iPSCORE, GENSIPS and PhLiPS cohorts for their input into the study and (early) access to the data. This work was supported by: a strategic award from the Wellcome Trust and UK Medical Research Council (WT098503), the EMBL Interdisciplinary Postdoc (EI3POD) program under Marie Skłodowska-Curie Actions COFUND (grant no. 664726 to M.J.B. and D.S.); National Institutes of Health (NIH; T32 LM012409 to C.S.; T15 LM01127 to D.J.; U01 HL107388-01, P30DK116074 and SPO 130829 to I.C.-O.; HL107442, DK105541, DK112155 to K.A.F.); Stanford Graduate Fellowship (to M.J.G.); National Natural Science Foundation of China (31970554 to X.L.); National Key R&D Program of China (2019YFC1315804 to X.L.); Shanghai Municipal Science and Technology Major Project (2017SHZDZX01 to X.L.); EBI–Sanger Postdoctoral Fellowship (to N.C.); National Science Foundation Graduate Research Fellowship (to N.M.F.); and California Institute for Regenerative Medicine (GC1R-06673 to K.A.F.). Research in the Stegle laboratory is supported by the BMBF, the Volkswagen Foundation and the European Union (ERC project DECODE). S.B.M. is supported by NIH grants U01HG009431, R01HL142015, R01HG008150, R01AG066490 and U01HG009080. Research in the Knowles laboratory is supported by NIH grants DK116750, DK120565, DK106236, DK107437 and HL107388. We thank the staff in the Cellular Genetics and Phenotyping and Sequencing core facilities at the Wellcome Trust Sanger Institute. Work at the Wellcome Trust Sanger Institute was further supported by Wellcome Trust grant WT090851. Several datasets used in this study were sourced from the NextGen Consortium, available from the NCBI database of Genotypes and Phenotypes (dbGaP): iPSCORE (phs000924.v4.p1 and phs001325.v3.p1); PhLiPS (phs001341.v1.p1); and GENESiPS (phs001139.v1.p1). This work, in part, used supercomputing resources provided by the Stanford Genetics Bioinformatics Service Center, supported by NIH S10 instrumentation grant S10OD023452. Research reported in this manuscript was in part supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under award no. U01HG007708. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Author information
Authors and Affiliations
Consortia
Contributions
The main analyses and data preparations were performed by M.J.B. and C.S.; D.J. and M.D. performed SV calling and analysis; N.M.F. performed GTEx (v7) data processing; L.F. and X.L. performed rare-variant annotation and interpretation; H.K. and D.V. annotated and validated the rare-disease variants and genes for the HipSci rare-disease samples; M.J.G. performed the colocalization analysis; M.J.B., D.S., B.M. and D.H. developed the eQTL software; C.Z. generated iPSC lines for Undiagnosed Diseases Network rare-disease samples; N.C., I.C.-O. and Y.P. assisted with data processing and analysis; M.J.B., C.S., M.J.G., S.B.M. and O.S. wrote the manuscript; I.C.-O., N.C., N.M.F., K.A.F., L.F., M.J.G., D.J., M.T.W., D.B.Z. and B.M. assisted in editing the manuscript; M.J.B., C.S., E.S., K.A.F., O.S. and S.B.M. conceived and oversaw the study.
Corresponding authors
Ethics declarations
Competing interests
S.B.M. is a member of the Scientific Advisory Board of Myome. All other authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Gabriel Hoffman and Vijay Sankaran for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Replication and consistency of effect sizes of eQTL discovered in the the original iPSC studies and replicated in i2QTL.
Shown are scatter plots between eQTL effect size estimates in this study (i2QTL, x-axis) versus effect size estimates from previous studies (y-axis). Dots correspond to eQTL discovered in the respective study. Black: eQTL with consistent effect direction. Red: eQTL with discordant effect direction. Replication defined at nominal P < 0.05 in i2QTL. a, Replication of HipSci in i2QTL. 70.4% of the effects are replicated; 98% of the eQTL have concordant effect direction. Differences in the approach for estimating effect sizes result in the observed variation. b, Replication of iPSCORE in i2QTL. 81% of the effects are replicated; 98.7% of the eQTL have concordant effect direction. Notably only SNP eQTL were considered whereas SVs were not considered for replication. c, Replication of GENESiPS in i2QTL; 76.7% of the effects are replicated; 99.5% of the effects have concordant effect direction. d, Replication of PhiLiPS in i2QTL. 76.8% of the effects are replicated; 97.5% of the effects have concordant effect direction.
Extended Data Fig. 2 Identification of multiple independent eQTL for the same gene using stepwise regression.
a, Histogram of the number of independent eQTL effects identified for individual eGenes. Up to 12 independent effects were identified. b, Zoom-in view displaying the number of eGenes with 5 to 8 independent effects. c, Zoom-in view displaying the number eGenes with 8 to 12 independent effects.
Extended Data Fig. 3 Sharing of lead eQTL signals between cell types and studies, considering i2QTL, GTEX and the iPS differentiation study from Cuomo et al.
a, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues. Shown is the fraction of shared eQTL signals relative to the total number of common genes and lead eQTL variants in the two respective maps. Shared eQTL signals are defined as eQTL with concordant effect direction and absolute effect size within a factor of two (Methods). b, Distribution of pairwise sharing as in A of iPSCs versus GTEx tissues (blue, N = 48 comparisons) versus pairwise sharing between GTEx tissues (red, N = 2,401 comparisons).c, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues as in A, however additionally including single-cell eQTL form Cuomo et al. in iPSCs (iPSCsc), differentiated cell types (mesendo, endoderm). d, Distribution of pairwise sharing as in C, considering iPSCs and differentiated cell types (bulk left, followed by iPSC single cell), mesendoderm (cyan), and endoderm (green) versus GTex tissues (N = 48 comparisons). During differentiation genetic signals in iPSC become more similar to those in GTEx tissues. Data in panels b and d are displayed as violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles and the whiskers corresponding to the IQR ×1.5.
Extended Data Fig. 4 Properties of distal (trans) gene-level eQTL.
a, Dot plot of N = 862 trans-eQTL detected in iPSC (FDR < 10%). Dots correspond to individual trans-eQTL, with color denoting the variant category (blue: cis-eQTL, red: GWAS variant, purple: cis-eQTL and GWAS variant). b, Breakdown of unique trans-eQTL variants (N = 193) across different variant annotations. Darker shaded colors denote trans-eQTL linked to variants that have more than one annotation; lighter shades correspond variants that can be unique assigned to a given variant category. c, Mediation analysis of trans-eQTL, considering variants that are linked to a cis eQTL. The outer pie denotes the annotation of the underlying trans variant: GWAS only variants (n = 7, white), cis eQTL variants (n = 186, dark-blue). The inner pie displays results from the mediation analysis for individual trans variants: n = 7 trans eQTL are exclusively linked to GWAS variants and hence not mediated (white, ‘GWAS only trans-eQTL’); n = 58 trans eQTL variants are not significantly linked to mediation with any RNA trait (‘Non-significant link to any cis-eQTL’); 86 are exclusively linked to gene-level abundance (‘Only linked to gene-level cis-eQTL’), 7 are exclusively linked a RNA trait other than gene-level abundance (‘Linked only to non gene-level cis-eQTL’); 35 are linked to gene-level abundance and at least one additional RNA trait(‘Linked both to gene-level and other level cis-eQTL’).
Extended Data Fig. 5 Analysis of the trans-eQTL hotspot at ELF2.
a, Schematic representation of the genetic loci around ELF2 (left) and NAA15 (right). SNPs are annotated by evidence of cis-eQTL regulation on different traits (blue: transcript-ratio eQTL on ELF2, purple: gene-level eQTL on EFL2, green: splice eQTL on ELF2, brown: the APA eQTL on NAA15). b, LD structure between cis-eQTL variants implicated in the hotspot, annotated by cis-eQTL type as in a. c-f. Lead cis-eQTL effects by RNA trait for ELF2 and NAA15 across SNPs linked to downstream trans-eQTLs(n = 682 samples). Data are represented as a violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers represent the interquartile range ×1.5. C. Splicing cis-eQTL on ELF2. d. Gene level cis-eQTL on ELF2. E. Transcript-ratio cis-eQTL on ELF2. F. Alternative polyadenilation cis-eQTL on NAA15. g, Co-expression network of genes that are controlled by the trans-eQTL linked to the hotspot at ELF2 (N = 37 genes), including ELF2 itself (center). Color of the bounding box around Genes denotes the cis-eQTL variant that drives the corresponding trans effect (colors as in d-g). Genes with multiple trans regulators are depicted with multiple colored rings. h, Replication of trans effects at ELF2 in a single cell differentiation study (Cuomo et al.). Shown are trans-eQTL effect sizes (beta’s) for the 12 ELF2-linked trans targets that show significant replication (defined as P < 0.05 and consistent effect direction) in any of the Cuomo et al tissues. From left to right: eQTL effect size in; the i2QTL study (discovery); in undifferentiated iPSC profiled using scRNA-seq; in mesendoderm profiled using scRNA-seq and in definitive endoderm profiled using scRNA-seq. Significant replication are indicated with a red asterisk.
Extended Data Fig. 6 Enrichment for rare, deleterious SNPs in iPSC and GTEx tissues.
Comparison of enrichments for singleton, highly-deleterious (CADD > 25) SNPs in iPSC versus GTEx v7 tissues analogous to Fig. 2b, however with an additional adjustment for the number of expressed genes. Displayed are enrichments for 10,000 random draws of 50 samples, controlling for the number of expressed genes (genes subset at a fixed number across tissues, N = 500 genes). Strongest enrichment is observed in iPSC. The data are represented as a boxplot where the middle line corresponds to the median, the lower and upper edges of the box corresponding to the first and third quartiles, the whiskers represent the interquartile range (IQR) ×1.5 and beyond the whiskers are outlier points.
Extended Data Fig. 7 Expression level of rare disease genes in iPSC versus GTEx tissues.
a, Distribution of gene expression level of XX rare disease genes (log10(FPKM + 1)) in i2QTL iPSC and 17 GTEx tissues with a median expression level of at least 1 FPKM (red dashed line). Expression in i2QTL highlighted in teal, GTEx tissues in yellow. Disease genes are expressed in iPSCs and only difficult to biopsy tissues in GTEx display higher expression levels. n = 2,952 biologically independent samples. Data are represented as boxplots with the middle line corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers to the interquartile range (IQR) ×1.5. b, Expression level of genes in different curated gene lists, comparing i2QTL iPSC (left), all GTEx tissues (middle) and GTEx whole blod (right). Shown is the fraction of genes in each category for two expression bins: [0,1) FPKM expression absent or lowly expressed; [1,1e + 12) FPKM gene expressed.
Extended Data Fig. 8 Additional results from the colocalization analysis of eQTL and GWAS traits.
a, Summary of colocalization results for the Coronary artery Disease GWAS (van der Harst et al 2018). 36 out of 93 GWAS loci were identified as colocalized with an eQTL of at least one RNA trait (38.7%). Shown are the number of colocalization events for eQTL of different RNA traits. From left to right: any eQTL type (All), gene-level eQTL (Gene), exon eQTL (Exon), transcript eQTL (Transcript), splicing eQTL (Splicing), APA eQTL (APA). The number of GWAS colocalization that are uniquely linked to a given RNA trait eQTL are displayed using a triangle and the total number of colocalizations per trait is depicted as a circle. b, Colocalization between a gene-level eQTL for POLR1B and a GWAS hit (rs7586668:C > T) for Height. Left: Manhatten plots displaying the local association signal for the eQTL on POLR1B (bottom) and the GWAS signal on Height (top). Right: Scatter plot of negative log P-values for the GWAS signal (x-axis) for BMI versus the POLR1B eQTL signal (y-axis) for the corresponding region.
Extended Data Fig. 9 Enrichment for large-effect outlier-associated rare variants in colocalized genes.
Absolute effect size of GWAS trait associations for iPSC outlier- and non-outlier-associated variants, considering genes with varying degree of evidence for colocalization with common eQTL variants.
Extended Data Fig. 10 Optimization of the number of PEER factors to adjust for confounding expression heterogeneity.
Shown is the number of genes with at least one gene-level eQTL (eGenes) for the top 3,000 highest expressed genes in iPSC for increasing number of PEER factors. PEER factors are adjusted for as additional fixed effect covariates. The vertical red line denotes the number of PEER factors considered in all i2QTL analyses.
Supplementary information
Supplementary Information
Supplementary Results, Methods and Note
Supplementary Tables
Supplementary Tables 1–18
Source data
Source Data Fig. 1
Figure data.
Source Data Fig. 2
Figure data.
Source Data Fig. 3
Figure data.
Source Data Fig. 4
Figure data.
Rights and permissions
About this article
Cite this article
Bonder, M.J., Smail, C., Gloudemans, M.J. et al. Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nat Genet 53, 313–321 (2021). https://doi.org/10.1038/s41588-021-00800-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41588-021-00800-7
This article is cited by
-
PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs
Genome Biology (2024)
-
Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses
Genome Biology (2023)
-
DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits
Nature Genetics (2023)
-
Fine mapping spatiotemporal mechanisms of genetic variants underlying cardiac traits and disease
Nature Communications (2023)
-
Genetic effects on the timing of parturition and links to fetal birth weight
Nature Genetics (2023)