DOI: 10.1145/2382936.2382976
short-paper

A collective ranking method for genome-wide association studies

Published: 07 October 2012

ABSTRACT

Genome-wide association studies (GWAS) analyze genetic variation (SNPs) across the entire human genome, searching for SNPs that are associated with certain phenotypes, most often diseases, such as breast cancer. In GWAS, we seek a ranking of SNPs in terms of their relevance to the given phenotype. However, because certain SNPs are known to be highly correlated with one another across individuals, it can be beneficial to take into account these correlations when ranking. If a SNP appears associated with the phenotype, and we question whether this association is real, the extent to which its neighbors (correlated SNPs) also appear associated can be informative. Therefore, we propose CollectRank, a ranking approach which allows SNPs to reinforce one another via the correlation structure. CollectRank is loosely analogous to the well-known PageRank algorithm. We first evaluate CollectRank on synthetic data generated from a variety of genetic models under different settings. The numerical results suggest CollectRank can significantly outperform common GWAS methods at the cost of a small amount of extra computation. We further evaluate CollectRank on two real-world GWAS on breast cancer and atrial fibrillation/flutter, and CollectRank performs well in both studies. We finally provide a theoretical analysis that also suggests CollectRank's advantages.
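The abstract describes CollectRank only at a high level, so the following is a minimal sketch of the general idea it names (PageRank-style reinforcement of SNP scores through a correlation graph), not the authors' algorithm. Everything here is an assumption for illustration: the function name `propagate_scores`, the `damping` parameter, the use of pairwise correlation strengths (for example r² values) as edge weights, and the toy data.

```python
def propagate_scores(scores, corr, damping=0.5, iters=100, tol=1e-12):
    """Personalized-PageRank-style smoothing of per-SNP scores (illustrative only).

    scores: nonnegative single-SNP association scores (hypothetical input).
    corr:   symmetric matrix of pairwise correlation strengths in [0, 1].
    """
    n = len(scores)
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    # The normalized single-SNP scores act as the restart distribution.
    prior = [s / total for s in scores]
    # Total outgoing edge weight per SNP, ignoring self-correlation.
    out = [sum(corr[j][k] for k in range(n) if k != j) for j in range(n)]
    x = prior[:]
    for _ in range(iters):
        # SNPs with no correlated neighbors hand their mass back to the prior.
        dangling = sum(x[j] for j in range(n) if out[j] == 0)
        new = []
        for i in range(n):
            # Mass flowing into SNP i from its correlated neighbors.
            flow = sum(
                x[j] * corr[j][i] / out[j]
                for j in range(n)
                if j != i and out[j] > 0
            )
            new.append(
                (1 - damping) * prior[i] + damping * (flow + dangling * prior[i])
            )
        delta = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if delta < tol:
            break
    return x


# Toy example: SNPs 0 and 1 have the same raw score, but only SNP 1 has a
# correlated neighbor (SNP 3) that also looks associated.
raw = [3.0, 3.0, 0.5, 2.5]
r2 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.9],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.9, 0.0, 1.0],
]
final = propagate_scores(raw, r2)
ranked = sorted(range(len(final)), key=lambda i: -final[i])
```

In this toy setting the tie between SNPs 0 and 1 is broken in favor of SNP 1, because its correlated neighbor also carries signal. That is the kind of mutual reinforcement the abstract describes, under the assumptions stated above.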


Published in

BCB '12: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine
October 2012
725 pages
ISBN: 9781450316705
DOI: 10.1145/2382936

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

BCB '12 Paper Acceptance Rate: 33 of 159 submissions, 21%. Overall Acceptance Rate: 254 of 885 submissions, 29%.