ABSTRACT
Pharmacogenomics studies how patients' genetic variation affects drug response, searching for correlations between gene expression or Single Nucleotide Polymorphisms (SNPs) in a patient's genome and the toxicity or efficacy of a drug. SNP data, produced by microarray platforms, must be preprocessed and analyzed to find correlations between the presence or absence of SNPs and a drug's toxicity or efficacy. Because of the large number of samples and the high resolution of the instruments, the data to be analyzed can be very large and may require high-performance computing. This paper presents the design and experimental evaluation of Cloud4SNP, a novel Cloud-based bioinformatics tool for the parallel preprocessing and statistical analysis of pharmacogenomics SNP microarray data. The experimental evaluation shows good speed-up and scalability. Moreover, because the tool runs on a Cloud platform, it can elastically meet the requirements of both small and very large pharmacogenomics studies.
- Cloud4SNP: Distributed Analysis of SNP Microarray Data on the Cloud