Evaluating and improving heritability models using summary statistics

Doug Speed; John Holmes; David J Balding

doi:10.1101/736496

Abstract

There is currently much debate regarding the best way to model how heritability varies across the genome. The authors of GCTA recommend the GCTA-LDMS-I Model, the authors of LD Score Regression recommend the Baseline LD Model, while we have instead recommended the LDAK Model. Here we provide a statistical framework for assessing heritability models using summary statistics from genome-wide association studies. Using data from studies of 31 complex human traits (average sample size 136,000), we show that the Baseline LD Model is the most realistic of the existing heritability models, but that it can be improved by incorporating features from the LDAK Model. Our framework also provides a method for estimating the selection-related parameter α from summary statistics. We find strong evidence (P<1e-6) of negative genome-wide selection for traits including height, systolic blood pressure and college education, and we find that the impact of selection is stronger inside functional categories such as coding SNPs and promoter regions.

Footnotes

In our first version, we provided a method for estimating the selection-related parameter α across the whole genome. In this version, we show that it is also possible to obtain regional estimates, and we use these to provide evidence that selection acts more strongly on functional SNPs.

The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.