Large-scale integration of the plasma proteome with genetics and disease

Abstract

The plasma proteome can help bridge the gap between the genome and diseases. Here we describe genome-wide association studies (GWASs) of plasma protein levels measured with 4,907 aptamers in 35,559 Icelanders. We found 18,084 associations between sequence variants and levels of proteins in plasma (protein quantitative trait loci; pQTL), of which 19% were with rare variants (minor allele frequency (MAF) < 1%). We tested plasma protein levels for association with 373 diseases and other traits and identified 257,490 associations. We integrated pQTL and genetic associations with diseases and other traits and found that 12% of 45,334 lead associations in the GWAS Catalog are with variants in high linkage disequilibrium with pQTL. We identified 938 genes encoding potential drug targets with variants that influence levels of possible biomarkers. Combining proteomics, genomics and transcriptomics, we provide a valuable resource that can be used to improve understanding of disease pathogenesis and to assist with drug discovery and development.
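As an illustration of the rare-variant threshold quoted above (MAF < 1%), the following is a minimal Python sketch, not the authors' pipeline. It assumes a biallelic autosomal variant in diploid individuals. The function names and the toy allele counts are hypothetical; only the cohort size (35,559) and the 1% cut-off come from the abstract.

```python
def minor_allele_frequency(alt_allele_count, n_individuals):
    """Return the minor allele frequency of a biallelic variant.

    Assumes diploid genotypes, so each individual carries two alleles.
    """
    total_alleles = 2 * n_individuals
    if n_individuals <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele count must lie between 0 and 2 * n_individuals")
    alt_freq = alt_allele_count / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_rare(maf, threshold=0.01):
    """Classify a variant as rare using the MAF < 1% cut-off from the abstract."""
    return maf < threshold


if __name__ == "__main__":
    cohort_size = 35559  # number of Icelanders in the study
    for alt_count in (500, 70000):  # hypothetical allele counts
        maf = minor_allele_frequency(alt_count, cohort_size)
        print(alt_count, round(maf, 5), is_rare(maf))
```

The second toy count shows why the minimum is taken: an alternative allele carried on most chromosomes is the major allele, so the frequency of the other allele decides whether the variant is rare.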

Fig. 1: Study design and main results.
Fig. 2: The concepts of pQTL, eQTL and colocalization with disease-associated variants.
Fig. 3: Overview of pQTL associations with plasma protein levels.
Fig. 4: Cis pQTL, cis eQTL and PAVs.
Fig. 5: Integration of pQTL with skin and CRC.
Fig. 6: Integration of pQTL with AD and psoriasis.
Fig. 7: Annotation of cis pQTL artifact status.

Data availability

GWAS summary statistics for all 4,907 aptamers are available at https://www.decode.com/summarydata/. Sequence variants passing GATK filters that support the findings of this study have been deposited in the European Variation Archive under accession number PRJEB15197. Other data presented in this study are included in this publication (and its Supplementary information). As we have been in the past, we are open to collaboration on this topic. The UK Biobank Resource was used under application number 56270. FinnGen and BioBank Japan data are publicly available and were downloaded from https://www.finngen.fi/ and http://jenger.riken.jp/en/, respectively.

URLs for other external data used are as follows: the GWAS Catalog (https://www.ebi.ac.uk/gwas/), the STRING database (https://string-db.org/, in the file 9606.protein.actions.v11.txt.gz), the GTEx project (https://gtexportal.org/home/) and the Human Protein Atlas (https://www.proteinatlas.org/).

Code availability

We used publicly available software (URLs are listed below) in conjunction with the above-described algorithms in the sequence-processing pipeline (whole-genome sequencing, association testing and RNA sequencing mapping and analysis): BWA-MEM version 0.7.10, https://github.com/lh3/bwa; GenomeAnalysisTKLite version 2.3.9, https://github.com/broadgsa/gatk/; Picard tools version 1.117, https://broadinstitute.github.io/picard/; SAMtools version 1.3, http://samtools.github.io/; bedtools version 2.25.0-76-g5e7c696z, https://github.com/arq5x/bedtools2/; Variant Effect Predictor (release 100), https://github.com/Ensembl/ensembl-vep; BOLT-LMM version 2.1, https://data.broadinstitute.org/alkesgroup/BOLT-LMM/downloads/; IMPUTE2 version 2.3.1, https://mathgen.stats.ox.ac.uk/impute/impute_v2.html; dbSNP version 140, http://www.ncbi.nlm.nih.gov/SNP/; BiNGO version 3.0.3, https://www.psb.ugent.be/cbd/papers/BiNGO/Download.html; Cytoscape version 3.7.1, https://cytoscape.org/download.html. We used R (version 3.6.0) extensively to analyze data and create plots. Figs. 1 and 2 were created using https://biorender.com/.

Acknowledgements

The authors received no specific funding for this work.

Author information

Contributions

E.F., P.S., B.A.A., G.S., M.I.M., F.Z., S.A.G., T.E., M.O.U., G.L.N., S.H.L., D.F.G., U.T. and K.S. designed the study and interpreted results. M.K.M., S.S., S.N.S., T.A.O., V.S., V.T., H.S., I.J., H.H., T.R., J.S., U.T. and K.S. carried out participant ascertainment and recruitment. E.F., P.S., B.A.A., G.S., M.I.M., E.L.S., A.O., B.V.H., B.O.J., F.Z., G.H.H., G.M., G.A.A., H.K., K.J., R.F., S.A.G., S.R., M.O.U., P.M., S.H.L., D.F.G. and U.T. performed sequencing, genotyping, expression, proteomic, statistical and/or bioinformatic analyses. K.G., O.T.M. and J.S. planned and performed the functional laboratory work. E.F., P.S., B.A.A., G.S., A.H., A.O., G.L.N., S.H.L., D.F.G., U.T. and K.S. drafted the manuscript. All authors contributed to the final version of the paper.

Corresponding authors

Correspondence to Patrick Sulem or Kari Stefansson.

Ethics declarations

Competing interests

All authors are affiliated with deCODE genetics/Amgen Inc. and declare competing interests as employees.

Additional information

Peer review information Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Secondary cis and trans pQTL associations.

a) Number of secondary cis associations after conditional analysis for each of the 1,881 sentinel cis pQTL associations for a given protein. b) Number of secondary trans associations after conditional analysis for each of the 16,203 sentinel trans pQTL associations for a given protein.

Extended Data Fig. 2 Effects of cis pQTL in Olink vs SomaScan.

Cis pQTL effects (in units of standard deviations) reported in the SCALLOP study (using Olink cardiovascular 1 panel) vs effects in present study (using SomaScan).

Extended Data Fig. 3 Correlations between SomaScan and Olink protein measurements.

Histogram of correlations between SomaScan and Olink protein measurements for 87 proteins measured using both SomaScan and Olink (cardiovascular 1 panel).

Extended Data Fig. 4 Effects of sentinel cis pQTL subdivided by presence or absence of PAV.

Effects (absolute value, SD) of sentinel cis pQTL, subdivided by presence (n = 658) or absence (n = 1,223) of protein-altering variants (PAVs) in high linkage disequilibrium (LD; r² > 0.80). The boxplots show the median and the lower and upper quartiles; whiskers extend to 1.5 times the interquartile range; points beyond the whiskers are plotted individually.
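The boxplot convention described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the common Tukey convention in which each whisker ends at the most extreme observation within 1.5 times the interquartile range of the box, and it assumes linearly interpolated quartiles (the "inclusive" method of Python's `statistics.quantiles`). The function name and the toy effect sizes are hypothetical and are not taken from the study.

```python
from statistics import quantiles


def boxplot_summary(values):
    """Summarise values the way a Tukey-style boxplot would draw them."""
    data = sorted(values)
    # Quartiles with linear interpolation between order statistics.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Points beyond the whiskers are plotted individually.
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical absolute effect sizes in SD units.
    toy_effects = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 1.40]
    print(boxplot_summary(toy_effects))
```

In the toy data, the single large effect lies beyond the upper fence and would therefore be drawn as an individual point rather than stretching the whisker.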

Extended Data Fig. 5 Heatmap based on squared correlation of effects on proteins for non-specific loci.

Heatmap and unsupervised hierarchical clustering based on squared correlation (r²) of effects on all 4,907 proteins for all pairs of non-specific loci. The plot shows only the main clusters discussed in the text; detailed results are in Supplementary Table 10.
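The similarity measure behind such a heatmap can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each locus is represented by a vector of effect estimates, one per protein, and that similarity is the squared Pearson correlation between two such vectors. The locus names and effect values are hypothetical, and the clustering step itself is omitted.

```python
from statistics import mean


def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length effect vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)


def r2_matrix(effects_by_locus):
    """Pairwise r2 for every pair of loci, keyed by (locus_a, locus_b)."""
    loci = list(effects_by_locus)
    return {
        (a, b): squared_correlation(effects_by_locus[a], effects_by_locus[b])
        for a in loci
        for b in loci
    }


if __name__ == "__main__":
    # Hypothetical effects of three loci on five proteins (SD units).
    toy = {
        "locus_A": [0.5, -0.2, 0.1, 0.0, 0.3],
        "locus_B": [-1.0, 0.4, -0.2, 0.0, -0.6],  # mirror image of locus_A
        "locus_C": [0.1, 0.3, -0.2, 0.4, 0.0],
    }
    for pair, value in r2_matrix(toy).items():
        print(pair, round(value, 3))
```

Squaring the correlation means that two loci with proportional effects of opposite sign, such as `locus_A` and `locus_B` in the toy data, count as maximally similar. The resulting matrix could then be passed to any hierarchical clustering routine.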

Extended Data Fig. 6 Association of rs6471717-C at CYP7A1 with gallstone risk and FGF19 levels.

Diagram illustrating the association of rs6471717-C at CYP7A1 with gallstone risk and FGF19 levels. Blue arrows show reported relationships, black arrows associations seen in the present study, while the green arrow shows the inference we draw from the analysis. Plus and minus signs indicate the directions of effects.

Extended Data Fig. 7 Effect of TREM2 missense variants on TREM2 measurements by SomaScan and ELISA.

Estimate of the specificity of the sTREM2 levels in SomaScan. TREM2 levels in plasma measured by SomaScan versus TREM2 measured by ELISA on the same sample, coloured by genotype of the TREM2 R47H missense variant, rs75932628, or by genotype of the TREM2 R62H missense variant, rs143332484 (there were no compound heterozygotes). The six individuals who are homozygous for either Arg47His or Arg62His have extremely low levels of TREM2 measured by SomaScan compared to non-carriers, which is inconsistent with the measurements by ELISA. Heterozygotes for either variant also have lower levels than non-carriers when measured by SomaScan but not when measured by ELISA. We conclude that the observed association of the Arg47His allele with TREM2 levels based on SomaScan is an artefact: the coding change alters the conformation of TREM2, which reduces the affinity of the SomaScan aptamer targeting TREM2.

Extended Data Fig. 8 MS4A gene cluster locus and associations with Alzheimer’s disease, TREM2 protein levels and RNA expression of MS4A cluster genes.

MS4A gene cluster locus and associations with Alzheimer’s disease, TREM2 plasma protein levels, MS4A4A blood RNA expression, MS4A6A blood RNA expression, and MS4A2 blood RNA expression.

Extended Data Fig. 9 Bidirectional Mendelian Randomization analyses for Alzheimer’s disease and TREM2 level.

Mendelian Randomization (MR) analysis using a) TREM2 levels as exposure and Alzheimer’s disease risk as outcome and b) Alzheimer’s disease risk as exposure and TREM2 plasma protein levels as outcome. Blue lines are estimated MR-IVW effects; the areas between the blue dashed lines show 95% confidence regions for the MR effects.
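For readers unfamiliar with the MR-IVW estimate plotted here, the following Python sketch shows the standard fixed-effect inverse-variance weighted estimator computed from summary statistics. It is a generic textbook form under stated assumptions (uncorrelated instruments, negligible error in the variant-exposure effects, a normal-approximation 95% interval) and is not the authors' analysis code. The function name and the toy effect sizes are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: per-variant effects on the exposure
    beta_outcome:  per-variant effects on the outcome
    se_outcome:    standard errors of the outcome effects
    Returns (estimate, standard_error, (ci_low, ci_high)).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not beta_exposure:
        raise ValueError("at least one variant is required")
    weights = [1.0 / (se * se) for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("exposure effects are all zero")
    estimate = numerator / denominator
    standard_error = sqrt(1.0 / denominator)
    # 95% interval from a normal approximation.
    ci = (estimate - 1.96 * standard_error, estimate + 1.96 * standard_error)
    return estimate, standard_error, ci


if __name__ == "__main__":
    # Hypothetical summary statistics for three independent variants.
    bx = [0.10, 0.20, 0.30]
    by = [0.05, 0.10, 0.15]
    se = [0.02, 0.02, 0.02]
    print(ivw_estimate(bx, by, se))
```

The estimate is the slope of a weighted regression of outcome effects on exposure effects through the origin, which corresponds to the solid line in each panel; the interval corresponds to the dashed band.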

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–4

Reporting Summary

Peer Review Information

Supplementary Tables

Supplementary Tables 1–19.

Cite this article

Ferkingstad, E., Sulem, P., Atlason, B.A. et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 53, 1712–1721 (2021). https://doi.org/10.1038/s41588-021-00978-w

  • DOI: https://doi.org/10.1038/s41588-021-00978-w
