Open Access
June 2015
Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays
Heejung Shim, Matthew Stephens
Ann. Appl. Stat. 9(2): 665-686 (June 2015). DOI: 10.1214/14-AOAS776

Abstract

Understanding how genetic variants influence cellular-level processes is an important step toward understanding how they influence important organismal-level traits, or “phenotypes,” including human disease susceptibility. To this end, scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g., RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes, or windows of fixed length. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying “function” that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs), we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.
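To make the key idea concrete, the following is a minimal sketch, not the authors' implementation. It treats each sample's read counts across a region as a function, decomposes that function with an orthonormal Haar wavelet transform, and then asks which wavelet coefficients co-vary with genotype. The function names, the toy data layout (one row of counts per sample, region length a power of two), and the use of plain Pearson correlation are all assumptions made for illustration. The paper's own test builds on a Bayesian hierarchical model, which this sketch does not attempt to reproduce.

```python
import numpy as np


def haar_transform(signal):
    """Orthonormal Haar transform of a signal whose length is a power of two.

    Returns the single scaling (overall average) coefficient followed by the
    detail coefficients ordered from coarsest to finest scale.
    """
    x = np.asarray(signal, dtype=float)
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences
        x = (even + odd) / np.sqrt(2.0)              # local averages
    # details were collected fine-to-coarse; reverse to coarse-to-fine
    return np.concatenate([x] + details[::-1])


def coefficient_genotype_correlation(counts, genotypes):
    """Pearson correlation between genotype and each wavelet coefficient.

    counts:    array of shape (samples, positions), hypothetical toy input
    genotypes: array of shape (samples,), e.g. 0/1/2 allele counts
    Coefficients that do not vary across samples get correlation 0.
    """
    coeffs = np.array([haar_transform(row) for row in counts])
    g = np.asarray(genotypes, dtype=float)
    g_centered = g - g.mean()
    c_centered = coeffs - coeffs.mean(axis=0)
    denom = np.sqrt((g_centered ** 2).sum() * (c_centered ** 2).sum(axis=0))
    r = np.zeros(coeffs.shape[1])
    np.divide(g_centered @ c_centered, denom, out=r, where=denom > 0)
    return r


# Toy data (assumed, not from the paper): each copy of the variant allele
# adds one read at positions 2 and 3 of an 8-position region.
genotypes = np.array([0, 0, 1, 1, 2, 2])
counts = np.ones((6, 8))
counts[:, 2:4] += genotypes[:, None]

correlations = coefficient_genotype_correlation(counts, genotypes)
```

In this toy case only the coefficients whose support overlaps the affected positions vary with genotype, which illustrates why a multi-scale representation can localize an effect that a single fixed window would average over.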

Citation

Heejung Shim, Matthew Stephens. "Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays." Ann. Appl. Stat. 9 (2): 665-686, June 2015. https://doi.org/10.1214/14-AOAS776

Information

Received: 1 July 2013; Revised: 1 June 2014; Published: June 2015
First available in Project Euclid: 20 July 2015

zbMATH: 06499925
MathSciNet: MR3371330
Digital Object Identifier: 10.1214/14-AOAS776

Keywords: Bayesian inference, ChIP-seq, chromatin accessibility, DNase-seq, functional data, genetic association analysis, hierarchical model, high-throughput sequencing assays, RNA-seq, wavelets

Rights: Copyright © 2015 Institute of Mathematical Statistics

Vol. 9 • No. 2 • June 2015