Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics

Bonder, Marc Jan; Smail, Craig; Gloudemans, Michael J.; Frésard, Laure; Jakubosky, David; D’Antonio, Matteo; Li, Xin; Ferraro, Nicole M.; Carcamo-Orive, Ivan; Mirauta, Bogdan; Seaton, Daniel D.; Cai, Na; Vakili, Dara; Horta, Danilo; Zhao, Chunli; Zastrow, Diane B.; Bonner, Devon E.; Wheeler, Matthew T.; Kilpinen, Helena; Knowles, Joshua W.; Smith, Erin N.; Frazer, Kelly A.; Montgomery, Stephen B.; Stegle, Oliver

doi:10.1038/s41588-021-00800-7

Article
Published: 04 March 2021

Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics

Nature Genetics volume 53, pages 313–321 (2021)Cite this article

11k Accesses
31 Citations
116 Altmetric
Metrics details

Subjects

Abstract

Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Map of *cis* genetic regulation in human induced pluripotent cells.**

**Fig. 2: Linking rare variants to gene expression outliers.**

**Fig. 3: Colocalization of disease and trait variants with iPSC eQTLs.**

**Fig. 4: Integration of common-variant colocalization analyses with outlier-associated rare variants.**

Single-cell allele-specific expression analysis reveals dynamic and cell-type-specific regulatory effects

Article Open access 09 October 2023

Guanghao Qi, Benjamin J. Strober, … Alexis Battle

Genetic mapping of cell type specificity for complex traits

Article Open access 19 July 2019

Kyoko Watanabe, Maša Umićević Mirkov, … Danielle Posthuma

Direct inference and control of genetic population structure from RNA sequencing data

Article Open access 02 August 2023

Muhamad Fachrul, Abhilasha Karkey, … Michael Inouye

Data availability

All data used in the study are available from the Sequence Read Archive, dbGaP or the European Nucleotide Archive; the full data availability is provided in Supplementary Table 1. Supplementary Table 2 provides a description of the samples used in the study. Full summary statistics on significant eQTLs can be obtained from Zenodo (https://doi.org/10.5281/zenodo.4005576). The colocalization results are accessible from the LocusCompare portal (http://locuscompare.com/). We used eQTL summary statistics from GTEx (v7; https://gtexportal.org/home/datasets/) and the BIOS cohort (https://genenetwork.nl/biosqtlbrowser/). We further used GWAS summary statistics from the NHGRI-EBI GWAS catalog (Ensembl v92; https://www.ebi.ac.uk/gwas/), Phenome scanner (v2; http://www.phenoscanner.medschl.cam.ac.uk/) and GWAS studies aggregated in the LocusCompare study. We downloaded the GTEx colocalization results via LocusCompare. Other external data sources are referenced in the Methods and the main text. Source data are provided with this paper.

Code availability

Code produced for the following analyses is available on GitHub: eQTL mapping (https://github.com/PMBio/hipsci_pipeline/tree/master/limix_QTL_pipeline/), gene expression outlier analysis (https://github.com/csmail/i2qtl_outlier/) and colocalization analysis (https://github.com/mikegloudemans/ipsc-coloc).

References

Westra, H.-J. et al. Systematic identification of trans-eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Article CAS PubMed PubMed Central Google Scholar
Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).
Article CAS PubMed Google Scholar
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).
Article CAS PubMed Google Scholar
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Article PubMed Central Google Scholar
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Article CAS PubMed PubMed Central Google Scholar
Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).
Article CAS PubMed PubMed Central Google Scholar
Schwartzentruber, J. et al. Molecular and functional variation in iPSC-derived sensory neurons. Nat. Genet. 50, 54–61 (2018).
Article CAS PubMed Google Scholar
Cuomo, A. S. E. et al. Single-cell RNA sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11, 810 (2020).
Article CAS PubMed PubMed Central Google Scholar
Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. https://doi.org/10.1038/s41588-021-00801-6 (2021).
Sun, N. et al. Patient-specific induced pluripotent stem cells as a model for familial dilated cardiomyopathy. Sci. Transl. Med. 4, 130ra47 (2012).
Article PubMed PubMed Central Google Scholar
Lan, F. et al. Abnormal calcium handling properties underlie familial hypertrophic cardiomyopathy pathology in patient-specific induced pluripotent stem cells. Cell Stem Cell 12, 101–113 (2013).
Article CAS PubMed PubMed Central Google Scholar
Lee, J. et al. Activation of PDGF pathway links LMNA mutation to dilated cardiomyopathy. Nature 572, 335–340 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kodo, K. et al. iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nat. Cell Biol. 18, 1031–1042 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wu, H. et al. Modelling diastolic dysfunction in induced pluripotent stem cell-derived cardiomyocytes from hypertrophic cardiomyopathy patients. Eur. Heart J. 40, 3685–3695 (2019).
Article CAS PubMed PubMed Central Google Scholar
Dubois, N. C. et al. SIRPA is a specific cell-surface marker for isolating cardiomyocytes derived from human pluripotent stem cells. Nat. Biotechnol. 29, 1011–1018 (2011).
Article CAS PubMed PubMed Central Google Scholar
Sterneckert, J. L., Reinhardt, P. & Schöler, H. R. Investigating human disease using stem cell models. Nat. Rev. Genet. 15, 625–639 (2014).
Article CAS PubMed Google Scholar
Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
Article CAS PubMed PubMed Central Google Scholar
Panopoulos, A. D. et al. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem Cell Rep. 8, 1086–1100 (2017).
Article CAS Google Scholar
Pashos, E. E. et al. Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci. Cell Stem Cell 20, 558–570 (2017).
Article CAS PubMed PubMed Central Google Scholar
Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).
Article CAS PubMed PubMed Central Google Scholar
Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532.e9 (2017).
Article CAS PubMed Google Scholar
Frésard, L. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 25, 911–919 (2019).
Article PubMed PubMed Central CAS Google Scholar
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
Article PubMed PubMed Central Google Scholar
Choi, J. et al. A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs. Nat. Biotechnol. 33, 1173–1181 (2015).
Article CAS PubMed PubMed Central Google Scholar
Thomas, S. M. et al. Reprogramming LCLs to iPSCs results in recovery of donor-specific gene expression signature. PLoS Genet. 11, e1005216 (2015).
Article PubMed PubMed Central CAS Google Scholar
Donovan, M. K. R., D’Antonio-Chronowska, A., D’Antonio, M. & Frazer, K. A. Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants. Nat. Commun. 11, 955 (2020).
Article CAS PubMed PubMed Central Google Scholar
Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 39, D945–D950 (2010).
Article PubMed PubMed Central CAS Google Scholar
Gerrard, D. T. et al. An integrative transcriptomic atlas of organogenesis in human embryos. eLife 5, e15657 (2016).
Article PubMed PubMed Central Google Scholar
Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).
Article CAS PubMed Google Scholar
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Article CAS PubMed Google Scholar
Jakubosky, D. et al. Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nat. Commun. 11, 2928 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).
Article CAS PubMed PubMed Central Google Scholar
Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014).
Article CAS PubMed PubMed Central Google Scholar
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ferraro, N. M. et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 369, eaaz5900 (2020).
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).
Article CAS PubMed PubMed Central Google Scholar
Kernohan, K. D. et al. Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum. Mutat. 38, 611–614 (2017).
Article CAS PubMed PubMed Central Google Scholar
McKusick, V. A. Mendelian Inheritance in Man: a Catalog of Human Genes and Genetic Disorders (JHU Press, 1998).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Article CAS PubMed PubMed Central Google Scholar
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Article CAS PubMed PubMed Central Google Scholar
Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics https://doi.org/10.1093/bioinformatics/btz469 (2019).
Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
Article CAS PubMed PubMed Central Google Scholar
van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).
Article PubMed PubMed Central CAS Google Scholar
Liu, J. Z. et al. Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 44, 1137–1141 (2012).
Article CAS PubMed PubMed Central Google Scholar
Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).
Article CAS PubMed PubMed Central Google Scholar
Pongor, L. et al. A genome-wide approach to link genotype to clinical outcome by utilizing next-generation sequencing and gene chip data of 6,697 breast cancer patients. Genome Med. 7, 104 (2015).
Article PubMed PubMed Central CAS Google Scholar
Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).
Article CAS PubMed PubMed Central Google Scholar
Sanchez, E. et al. POLR1B and neural crest cell anomalies in Treacher Collins syndrome type 4. Genet. Med. 22, 547–556 (2020).
Article PubMed Google Scholar
Howson, J. M. M. et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat. Genet. 49, 1113–1119 (2017).
Article CAS PubMed PubMed Central Google Scholar
Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).
Article CAS PubMed PubMed Central Google Scholar
Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
Article CAS PubMed PubMed Central Google Scholar
Scott, R. A. et al. An expanded genome-wide association study of type 2. Diabetes 66, 2888–2902 (2017).
Article CAS PubMed PubMed Central Google Scholar
Webb, G. J., Siminovitch, K. A. & Hirschfield, G. M. The immunogenetics of primary biliary cirrhosis: a comprehensive review. J. Autoimmun. 64, 42–52 (2015).
Article CAS PubMed PubMed Central Google Scholar
1000 Genomes Project Consortiumet al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article CAS Google Scholar
Streeter, I. et al. The human-induced pluripotent stem cell initiative-data resources for cellular genetics. Nucleic Acids Res. 45, D691–D697 (2017).
Article CAS PubMed Google Scholar
D’Antonio, M. et al. Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell Rep. 24, 883–894 (2018).
Article PubMed PubMed Central CAS Google Scholar
DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546.e7 (2017).
Article CAS PubMed PubMed Central Google Scholar
Knowles, J. W., Hao, K., Xie, W., Weedon, M. N. & Zhang, Z. Genetic and functional analyses identify NAT2 as a human insulin sensitivity gene. Circulation 128, A10906 (2013).
Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015).
Article CAS PubMed Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Article CAS PubMed PubMed Central Google Scholar
Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).
Article CAS PubMed Google Scholar
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Article CAS PubMed PubMed Central Google Scholar
Saha, A. & Battle, A. False positives in trans-eQTL and coexpression analyses arising from RNA-sequencing alignment errors. F1000Res. 7, 1860 (2018).
Article CAS PubMed Google Scholar
Pedersen, B. S., Layer, R. M. & Quinlan, A. R. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol. 17, 118 (2016).
Article PubMed PubMed Central CAS Google Scholar
Hall, C. L. et al. Frequency of genetic variants associated with arrhythmogenic right ventricular cardiomyopathy in the genome aggregation database. Eur. J. Hum. Genet. 26, 1312–1318 (2018).
Article CAS PubMed PubMed Central Google Scholar
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Article CAS PubMed Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 1000, 2078–2079 (2009).
Article CAS Google Scholar
Churchhouse, C & Neale, B. Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. https://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank/ (2017).

Download references

Acknowledgements

The authors thank the HipSci, iPSCORE, GENSIPS and PhLiPS cohorts for their input into the study and (early) access to the data. This work was supported by: a strategic award from the Wellcome Trust and UK Medical Research Council (WT098503), the EMBL Interdisciplinary Postdoc (EI3POD) program under Marie Skłodowska-Curie Actions COFUND (grant no. 664726 to M.J.B. and D.S.); National Institutes of Health (NIH; T32 LM012409 to C.S.; T15 LM01127 to D.J.; U01 HL107388-01, P30DK116074 and SPO 130829 to I.C.-O.; HL107442, DK105541, DK112155 to K.A.F.); Stanford Graduate Fellowship (to M.J.G.); National Natural Science Foundation of China (31970554 to X.L.); National Key R&D Program of China (2019YFC1315804 to X.L.); Shanghai Municipal Science and Technology Major Project (2017SHZDZX01 to X.L.); EBI–Sanger Postdoctoral Fellowship (to N.C.); National Science Foundation Graduate Research Fellowship (to N.M.F.); and California Institute for Regenerative Medicine (GC1R-06673 to K.A.F.). Research in the Stegle laboratory is supported by the BMBF, the Volkswagen Foundation and the European Union (ERC project DECODE). S.B.M. is supported by NIH grants U01HG009431, R01HL142015, R01HG008150, R01AG066490 and U01HG009080. Research in the Knowles laboratory is supported by NIH grants DK116750, DK120565, DK106236, DK107437 and HL107388. We thank the staff in the Cellular Genetics and Phenotyping and Sequencing core facilities at the Wellcome Trust Sanger Institute. Work at the Wellcome Trust Sanger Institute was further supported by Wellcome Trust grant WT090851. Several datasets used in this study were sourced from the NextGen Consortium, available from the NCBI database of Genotypes and Phenotypes (dbGaP): iPSCORE (phs000924.v4.p1 and phs001325.v3.p1); PhLiPS (phs001341.v1.p1); and GENESiPS (phs001139.v1.p1). This work, in part, used supercomputing resources provided by the Stanford Genetics Bioinformatics Service Center, supported by NIH S10 instrumentation grant S10OD023452. Research reported in this manuscript was in part supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under award no. U01HG007708. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

These authors contributed equally: Marc Jan Bonder, Craig Smail.
These authors jointly supervised this work: Stephen B. Montgomery, Oliver Stegle.

Authors and Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK
Marc Jan Bonder, Bogdan Mirauta, Daniel D. Seaton, Na Cai, Danilo Horta, Marc Jan Bonder, Daniel Seaton & Oliver Stegle
European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
Marc Jan Bonder, Marc Jan Bonder & Oliver Stegle
Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Marc Jan Bonder, Marc Jan Bonder & Oliver Stegle
Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
Craig Smail, Michael J. Gloudemans & Nicole M. Ferraro
Genomic Medicine Center, Children’s Mercy Research Institute and Children’s Mercy Kansas City, Kansas City, MO, USA
Craig Smail
Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
Laure Frésard & Stephen B. Montgomery
Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, USA
David Jakubosky & David A. Jakubosky
Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA
David Jakubosky & David A. Jakubosky
Department of Pediatrics and Rady Children’s Hospital, University of California, San Diego, La Jolla, CA, USA
Matteo D’Antonio & Kelly A. Frazer
CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
Xin Li
Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA
Ivan Carcamo-Orive, Matthew T. Wheeler & Joshua W. Knowles
Wellcome Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
Na Cai, Helena Kilpinen & Oliver Stegle
Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany
Na Cai
UCL Great Ormond Street Institute of Child Health, University College London, London, UK
Dara Vakili & Helena Kilpinen
Faculty of Medicine, Imperial College London, London, UK
Dara Vakili
Stanford Center for Undiagnosed Diseases, Stanford University, Stanford, CA, USA
Chunli Zhao, Diane B. Zastrow, Devon E. Bonner & Matthew T. Wheeler
Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
Helena Kilpinen
Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
Helena Kilpinen
Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
Erin N. Smith & Kelly A. Frazer
Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Stephen B. Montgomery
Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Christopher D. Brown & YoSon Park

Authors

Marc Jan Bonder
View author publications
You can also search for this author in PubMed Google Scholar
Craig Smail
View author publications
You can also search for this author in PubMed Google Scholar
Michael J. Gloudemans
View author publications
You can also search for this author in PubMed Google Scholar
Laure Frésard
View author publications
You can also search for this author in PubMed Google Scholar
David Jakubosky
View author publications
You can also search for this author in PubMed Google Scholar
Matteo D’Antonio
View author publications
You can also search for this author in PubMed Google Scholar
Xin Li
View author publications
You can also search for this author in PubMed Google Scholar
Nicole M. Ferraro
View author publications
You can also search for this author in PubMed Google Scholar
Ivan Carcamo-Orive
View author publications
You can also search for this author in PubMed Google Scholar
Bogdan Mirauta
View author publications
You can also search for this author in PubMed Google Scholar
Daniel D. Seaton
View author publications
You can also search for this author in PubMed Google Scholar
Na Cai
View author publications
You can also search for this author in PubMed Google Scholar
Dara Vakili
View author publications
You can also search for this author in PubMed Google Scholar
Danilo Horta
View author publications
You can also search for this author in PubMed Google Scholar
Chunli Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Diane B. Zastrow
View author publications
You can also search for this author in PubMed Google Scholar
Devon E. Bonner
View author publications
You can also search for this author in PubMed Google Scholar
Matthew T. Wheeler
View author publications
You can also search for this author in PubMed Google Scholar
Helena Kilpinen
View author publications
You can also search for this author in PubMed Google Scholar
Joshua W. Knowles
View author publications
You can also search for this author in PubMed Google Scholar
Erin N. Smith
View author publications
You can also search for this author in PubMed Google Scholar
Kelly A. Frazer
View author publications
You can also search for this author in PubMed Google Scholar
Stephen B. Montgomery
View author publications
You can also search for this author in PubMed Google Scholar
Oliver Stegle
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

HipSci Consortium

Marc Jan Bonder
, Helena Kilpinen
, Daniel Seaton
& Oliver Stegle

iPSCORE consortium

Matteo D’Antonio
, Kelly A. Frazer
, David A. Jakubosky
& Erin N. Smith

Undiagnosed Diseases Network

Chunli Zhao
, Diane B. Zastrow
, Devon E. Bonner
& Matthew T. Wheeler

PhLiPS consortium

Christopher D. Brown
& YoSon Park

Contributions

The main analyses and data preparations were performed by M.J.B. and C.S.; D.J. and M.D. performed SV calling and analysis; N.M.F. performed GTEx (v7) data processing; L.F. and X.L. performed rare-variant annotation and interpretation; H.K. and D.V. annotated and validated the rare-disease variants and genes for the HipSci rare-disease samples; M.J.G. performed the colocalization analysis; M.J.B., D.S., B.M. and D.H. developed the eQTL software; C.Z. generated iPSC lines for Undiagnosed Diseases Network rare-disease samples; N.C., I.C.-O. and Y.P. assisted with data processing and analysis; M.J.B., C.S., M.J.G., S.B.M. and O.S. wrote the manuscript; I.C.-O., N.C., N.M.F., K.A.F., L.F., M.J.G., D.J., M.T.W., D.B.Z. and B.M. assisted in editing the manuscript; M.J.B., C.S., E.S., K.A.F., O.S. and S.B.M. conceived and oversaw the study.

Corresponding authors

Correspondence to Marc Jan Bonder, Craig Smail, Stephen B. Montgomery or Oliver Stegle.

Ethics declarations

Competing interests

S.B.M. is a member of the Scientific Advisory Board of Myome. All other authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Gabriel Hoffman and Vijay Sankaran for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Replication and consistency of effect sizes of eQTL discovered in the the original iPSC studies and replicated in i2QTL.

Shown are scatter plots between eQTL effect size estimates in this study (i2QTL, x-axis) versus effect size estimates from previous studies (y-axis). Dots correspond to eQTL discovered in the respective study. Black: eQTL with consistent effect direction. Red: eQTL with discordant effect direction. Replication defined at nominal P < 0.05 in i2QTL. a, Replication of HipSci in i2QTL. 70.4% of the effects are replicated; 98% of the eQTL have concordant effect direction. Differences in the approach for estimating effect sizes result in the observed variation. b, Replication of iPSCORE in i2QTL. 81% of the effects are replicated; 98.7% of the eQTL have concordant effect direction. Notably only SNP eQTL were considered whereas SVs were not considered for replication. c, Replication of GENESiPS in i2QTL; 76.7% of the effects are replicated; 99.5% of the effects have concordant effect direction. d, Replication of PhiLiPS in i2QTL. 76.8% of the effects are replicated; 97.5% of the effects have concordant effect direction.

Extended Data Fig. 2 Identification of multiple independent eQTL for the same gene using stepwise regression.

a, Histogram of the number of independent eQTL effects identified for individual eGenes. Up to 12 independent effects were identified. b, Zoom-in view displaying the number of eGenes with 5 to 8 independent effects. c, Zoom-in view displaying the number eGenes with 8 to 12 independent effects.

Extended Data Fig. 3 Sharing of lead eQTL signals between cell types and studies, considering i2QTL, GTEX and the iPS differentiation study from Cuomo et al.

a, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues. Shown is the fraction of shared eQTL signals relative to the total number of common genes and lead eQTL variants in the two respective maps. Shared eQTL signals are defined as eQTL with concordant effect direction and absolute effect size within a factor of two (Methods). b, Distribution of pairwise sharing as in A of iPSCs versus GTEx tissues (blue, N = 48 comparisons) versus pairwise sharing between GTEx tissues (red, N = 2,401 comparisons).c, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues as in A, however additionally including single-cell eQTL form Cuomo et al. in iPSCs (iPSCsc), differentiated cell types (mesendo, endoderm). d, Distribution of pairwise sharing as in C, considering iPSCs and differentiated cell types (bulk left, followed by iPSC single cell), mesendoderm (cyan), and endoderm (green) versus GTex tissues (N = 48 comparisons). During differentiation genetic signals in iPSC become more similar to those in GTEx tissues. Data in panels b and d are displayed as violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles and the whiskers corresponding to the IQR ×1.5.

Extended Data Fig. 4 Properties of distal (trans) gene-level eQTL.

a, Dot plot of N = 862 trans-eQTL detected in iPSC (FDR < 10%). Dots correspond to individual trans-eQTL, with color denoting the variant category (blue: cis-eQTL, red: GWAS variant, purple: cis-eQTL and GWAS variant). b, Breakdown of unique trans-eQTL variants (N = 193) across different variant annotations. Darker shaded colors denote trans-eQTL linked to variants that have more than one annotation; lighter shades correspond variants that can be unique assigned to a given variant category. c, Mediation analysis of trans-eQTL, considering variants that are linked to a cis eQTL. The outer pie denotes the annotation of the underlying trans variant: GWAS only variants (n = 7, white), cis eQTL variants (n = 186, dark-blue). The inner pie displays results from the mediation analysis for individual trans variants: n = 7 trans eQTL are exclusively linked to GWAS variants and hence not mediated (white, ‘GWAS only trans-eQTL’); n = 58 trans eQTL variants are not significantly linked to mediation with any RNA trait (‘Non-significant link to any cis-eQTL’); 86 are exclusively linked to gene-level abundance (‘Only linked to gene-level cis-eQTL’), 7 are exclusively linked a RNA trait other than gene-level abundance (‘Linked only to non gene-level cis-eQTL’); 35 are linked to gene-level abundance and at least one additional RNA trait(‘Linked both to gene-level and other level cis-eQTL’).

Extended Data Fig. 5 Analysis of the trans-eQTL hotspot at ELF2.

a, Schematic representation of the genetic loci around ELF2 (left) and NAA15 (right). SNPs are annotated by evidence of cis-eQTL regulation on different traits (blue: transcript-ratio eQTL on ELF2, purple: gene-level eQTL on EFL2, green: splice eQTL on ELF2, brown: the APA eQTL on NAA15). b, LD structure between cis-eQTL variants implicated in the hotspot, annotated by cis-eQTL type as in a. c-f. Lead cis-eQTL effects by RNA trait for ELF2 and NAA15 across SNPs linked to downstream trans-eQTLs(n = 682 samples). Data are represented as a violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers represent the interquartile range ×1.5. C. Splicing cis-eQTL on ELF2. d. Gene level cis-eQTL on ELF2. E. Transcript-ratio cis-eQTL on ELF2. F. Alternative polyadenilation cis-eQTL on NAA15. g, Co-expression network of genes that are controlled by the trans-eQTL linked to the hotspot at ELF2 (N = 37 genes), including ELF2 itself (center). Color of the bounding box around Genes denotes the cis-eQTL variant that drives the corresponding trans effect (colors as in d-g). Genes with multiple trans regulators are depicted with multiple colored rings. h, Replication of trans effects at ELF2 in a single cell differentiation study (Cuomo et al.). Shown are trans-eQTL effect sizes (beta’s) for the 12 ELF2-linked trans targets that show significant replication (defined as P < 0.05 and consistent effect direction) in any of the Cuomo et al tissues. From left to right: eQTL effect size in; the i2QTL study (discovery); in undifferentiated iPSC profiled using scRNA-seq; in mesendoderm profiled using scRNA-seq and in definitive endoderm profiled using scRNA-seq. Significant replication are indicated with a red asterisk.

Extended Data Fig. 6 Enrichment for rare, deleterious SNPs in iPSC and GTEx tissues.

Comparison of enrichments for singleton, highly-deleterious (CADD > 25) SNPs in iPSC versus GTEx v7 tissues analogous to Fig. 2b, however with an additional adjustment for the number of expressed genes. Displayed are enrichments for 10,000 random draws of 50 samples, controlling for the number of expressed genes (genes subset at a fixed number across tissues, N = 500 genes). Strongest enrichment is observed in iPSC. The data are represented as a boxplot where the middle line corresponds to the median, the lower and upper edges of the box corresponding to the first and third quartiles, the whiskers represent the interquartile range (IQR) ×1.5 and beyond the whiskers are outlier points.

Extended Data Fig. 7 Expression level of rare disease genes in iPSC versus GTEx tissues.

a, Distribution of gene expression level of XX rare disease genes (log10(FPKM + 1)) in i2QTL iPSC and 17 GTEx tissues with a median expression level of at least 1 FPKM (red dashed line). Expression in i2QTL highlighted in teal, GTEx tissues in yellow. Disease genes are expressed in iPSCs and only difficult to biopsy tissues in GTEx display higher expression levels. n = 2,952 biologically independent samples. Data are represented as boxplots with the middle line corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers to the interquartile range (IQR) ×1.5. b, Expression level of genes in different curated gene lists, comparing i2QTL iPSC (left), all GTEx tissues (middle) and GTEx whole blod (right). Shown is the fraction of genes in each category for two expression bins: [0,1) FPKM expression absent or lowly expressed; [1,1e + 12) FPKM gene expressed.

Extended Data Fig. 8 Additional results from the colocalization analysis of eQTL and GWAS traits.

a, Summary of colocalization results for the Coronary artery Disease GWAS (van der Harst et al 2018). 36 out of 93 GWAS loci were identified as colocalized with an eQTL of at least one RNA trait (38.7%). Shown are the number of colocalization events for eQTL of different RNA traits. From left to right: any eQTL type (All), gene-level eQTL (Gene), exon eQTL (Exon), transcript eQTL (Transcript), splicing eQTL (Splicing), APA eQTL (APA). The number of GWAS colocalization that are uniquely linked to a given RNA trait eQTL are displayed using a triangle and the total number of colocalizations per trait is depicted as a circle. b, Colocalization between a gene-level eQTL for POLR1B and a GWAS hit (rs7586668:C > T) for Height. Left: Manhatten plots displaying the local association signal for the eQTL on POLR1B (bottom) and the GWAS signal on Height (top). Right: Scatter plot of negative log P-values for the GWAS signal (x-axis) for BMI versus the POLR1B eQTL signal (y-axis) for the corresponding region.

Extended Data Fig. 9 Enrichment for large-effect outlier-associated rare variants in colocalized genes.

Absolute effect size of GWAS trait associations for iPSC outlier- and non-outlier-associated variants, considering genes with varying degree of evidence for colocalization with common eQTL variants.

Extended Data Fig. 10 Optimization of the number of PEER factors to adjust for confounding expression heterogeneity.

Shown is the number of genes with at least one gene-level eQTL (eGenes) for the top 3,000 highest expressed genes in iPSC for increasing number of PEER factors. PEER factors are adjusted for as additional fixed effect covariates. The vertical red line denotes the number of PEER factors considered in all i2QTL analyses.

Supplementary information

Supplementary Information

Supplementary Results, Methods and Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–18

Source data

Source Data Fig. 1

Figure data.

Source Data Fig. 2

Figure data.

Source Data Fig. 3

Figure data.

Source Data Fig. 4

Figure data.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bonder, M.J., Smail, C., Gloudemans, M.J. et al. Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nat Genet 53, 313–321 (2021). https://doi.org/10.1038/s41588-021-00800-7

Download citation

Received: 04 October 2019
Accepted: 25 January 2021
Published: 04 March 2021
Issue Date: March 2021
DOI: https://doi.org/10.1038/s41588-021-00800-7

This article is cited by

PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs
- Martijn Vochteloo
- Patrick Deelen
- Lude Franke
Genome Biology (2024)
Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses
- Angli Xue
- Seyhan Yazar
- Joseph E. Powell
Genome Biology (2023)
DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits
- Meritxell Oliva
- Kathryn Demanelis
- Brandon L. Pierce
Nature Genetics (2023)
Fine mapping spatiotemporal mechanisms of genetic variants underlying cardiac traits and disease
- Matteo D’Antonio
- Jennifer P. Nguyen
- Kelly A. Frazer
Nature Communications (2023)
Genetic effects on the timing of parturition and links to fetal birth weight
- Pol Solé-Navais
- Christopher Flatley
- Bo Jacobsson
Nature Genetics (2023)

Subjects

Abstract

Access options

Similar content being viewed by others

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Consortia

HipSci Consortium

iPSCORE consortium

Undiagnosed Diseases Network

PhLiPS consortium

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Extended data

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links