short-paper

A collective ranking method for genome-wide association studies

Authors:
Jie Liu (Univ. of Wisconsin-Madison)
Elizabeth Burnside (Univ. of Wisconsin-Madison)
Humberto Vidaillet (Marshfield Clinic Research Foundation)
David Page (Univ. of Wisconsin-Madison)

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, October 2012, Pages 313–320. https://doi.org/10.1145/2382936.2382976

Published: 07 October 2012

ABSTRACT

Genome-wide association studies (GWAS) analyze genetic variation (SNPs) across the entire human genome, searching for SNPs that are associated with certain phenotypes, most often diseases, such as breast cancer. In GWAS, we seek a ranking of SNPs in terms of their relevance to the given phenotype. However, because certain SNPs are known to be highly correlated with one another across individuals, it can be beneficial to take into account these correlations when ranking. If a SNP appears associated with the phenotype, and we question whether this association is real, the extent to which its neighbors (correlated SNPs) also appear associated can be informative. Therefore, we propose CollectRank, a ranking approach which allows SNPs to reinforce one another via the correlation structure. CollectRank is loosely analogous to the well-known PageRank algorithm. We first evaluate CollectRank on synthetic data generated from a variety of genetic models under different settings. The numerical results suggest CollectRank can significantly outperform common GWAS methods at the cost of a small amount of extra computation. We further evaluate CollectRank on two real-world GWAS on breast cancer and atrial fibrillation/flutter, and CollectRank performs well in both studies. We finally provide a theoretical analysis that also suggests CollectRank's advantages.
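
As a rough illustration of what it can mean for correlated SNPs to reinforce one another, the sketch below applies a personalized-PageRank-style update to per-SNP association scores over a correlation graph. It is not the CollectRank algorithm itself, which this abstract does not specify; the function name `propagate_scores`, the damping value, and the toy scores and correlations are all assumptions made for the example.

```python
import numpy as np

def propagate_scores(scores, corr, damping=0.5, tol=1e-12, max_iter=1000):
    """Smooth per-SNP scores over a correlation graph (illustrative only).

    scores: nonnegative per-SNP association scores (hypothetical values).
    corr:   symmetric matrix of pairwise SNP correlations, e.g. r^2.
    """
    s = np.asarray(scores, dtype=float)
    w = np.asarray(corr, dtype=float).copy()
    np.fill_diagonal(w, 0.0)          # ignore self-correlation

    prior = s / s.sum()               # normalized single-SNP evidence

    # Column-normalize so each SNP splits its mass among its neighbors.
    col_sums = w.sum(axis=0)
    isolated = np.flatnonzero(col_sums == 0)
    w[isolated, isolated] = 1.0       # SNPs without neighbors keep their mass
    col_sums[isolated] = 1.0
    transition = w / col_sums         # divides column j by col_sums[j]

    rank = prior.copy()
    for _ in range(max_iter):
        new_rank = (1.0 - damping) * prior + damping * (transition @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            rank = new_rank
            break
        rank = new_rank
    return rank

# Toy data (assumed): SNP 0 and SNP 1 have equal single-SNP scores, but
# SNP 0 is correlated with a strongly associated neighbor (SNP 2) while
# SNP 1 is correlated with a weakly associated one (SNP 3).
scores = [3.0, 3.0, 2.5, 0.2]
corr = np.array([
    [0.0, 0.0, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.9],
    [0.9, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0],
])
ranks = propagate_scores(scores, corr)
order = np.argsort(-ranks)            # SNP indices, best first
print(order, ranks)
```

In this toy setting SNP 0 ends up ranked above SNP 1 even though their individual scores are identical, because its correlated neighbor also appears associated. The damping parameter controls how much weight the neighbors receive relative to a SNP's own score.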

Index Terms

A collective ranking method for genome-wide association studies
1. Applied computing
  1. Life and medical sciences
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction
  2. Information systems applications
    1. Decision support systems
      1. Expert systems

Published in
BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936
General Chair: Sanjay Ranka (University of Florida)
Program Chairs: Tamer Kahveci (University of Florida), Mona Singh (Princeton University)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
atrial fibrillation/flutter
breast cancer
genome-wide association studies
ranking
Qualifiers
- short-paper
Conference

Acceptance Rates
BCB '12 Paper Acceptance Rate: 33 of 159 submissions, 21%. Overall Acceptance Rate: 254 of 885 submissions, 29%.