Cell Genomics
Volume 3, Issue 1, 11 January 2023, 100241

Article
Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts

https://doi.org/10.1016/j.xgen.2022.100241
Under a Creative Commons license
open access

Highlights

  • PRS accuracy is heterogeneous across disease endpoints, ancestries, and biobanks

  • Larger sample sizes and greater diversity of GBMI improve PRS accuracy

  • Lessons and guidelines for developing PRS with multi-ancestry GWASs are provided

Summary

Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated best practices for PRSs in global populations across different diseases. Here, we used data from the Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRS in the biobank-scale genomics era.
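To make the P + T idea concrete, the following is a minimal sketch on toy data, not the pipeline used in the study (which relied on Plink). It assumes a simplified form of the method: variants passing a p-value threshold are visited from most to least significant, and a variant is kept only if its squared correlation (r²) with every already-kept variant, estimated in an LD reference panel, stays below a cutoff. The function names, thresholds, and all numbers are hypothetical, and the physical distance window used by real clumping tools is ignored.

```python
from typing import List, Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: treat as uncorrelated
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def prune_and_threshold(
    pvalues: Sequence[float],
    ref_genotypes: Sequence[Sequence[float]],
    p_threshold: float = 0.05,   # hypothetical default
    r2_threshold: float = 0.1,   # hypothetical default
) -> List[int]:
    """Return indices of variants kept after thresholding and greedy LD pruning.

    ref_genotypes[i] holds the allele dosages of variant i across the
    individuals of the LD reference panel.
    """
    # Thresholding: keep variants with p <= p_threshold, most significant first.
    candidates = sorted(
        (i for i in range(len(pvalues)) if pvalues[i] <= p_threshold),
        key=lambda i: pvalues[i],
    )
    kept: List[int] = []
    for i in candidates:
        # Pruning: drop a variant in LD with any more significant kept variant.
        if all(pearson_r2(ref_genotypes[i], ref_genotypes[j]) < r2_threshold
               for j in kept):
            kept.append(i)
    return kept


def polygenic_score(dosages: Sequence[float], betas: Sequence[float],
                    kept: Sequence[int]) -> float:
    """Weighted sum of effect-allele dosages over the kept variants."""
    return sum(betas[i] * dosages[i] for i in kept)


# Toy GWAS summary statistics for four variants (made-up numbers).
betas = [0.2, 0.15, -0.1, 0.3]
pvalues = [1e-8, 1e-6, 1e-4, 0.5]
# Toy LD reference panel: four individuals; variant 1 is in perfect LD with variant 0.
ref_genotypes = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [1, 2, 1, 0],
]

kept = prune_and_threshold(pvalues, ref_genotypes)
score = polygenic_score([2, 2, 1, 0], betas, kept)
print(kept)   # [0, 2]
print(score)
```

In this toy example, variant 1 is dropped because it duplicates the signal of the more significant variant 0, and variant 3 is dropped by the p-value threshold. Changing the reference panel changes the r² estimates and therefore which variants are retained, which is why the choice of LD reference panel matters for this method.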

Keywords

Global-Biobank Meta-analysis Initiative
polygenic risk scores
multi-ancestry genetic prediction
accuracy heterogeneity

Data and code availability

The all-biobank and ancestry-specific GWAS summary statistics are publicly available for download at https://www.globalbiobankmeta.org/resources and can be browsed in the PheWeb Browser at http://results.globalbiobankmeta.org/. The PRS weights re-estimated using PRS-CS-auto for the multi-ancestry GWAS including all biobanks and for the leave-UKBB-out multi-ancestry GWAS have been uploaded to the PGS Catalog (https://www.pgscatalog.org/) under the study ID PGP000262. 1000 Genomes Phase 3 data can be accessed at the NCBI FTP site: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp. We used UKB data via application 31063. The software used in this study can be found at: Plink (https://www.cog-genomics.org/plink/), PRS-CS (https://github.com/getian107/PRScs), and SBayesS/GCTB (https://cnsgenomics.com/software/gctb/). The code used in this study has been deposited at Zenodo: https://doi.org/10.5281/zenodo.7321467. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

29 These authors contributed equally

30 Lead contact