ABSTRACT
Modern high-resolution mass spectrometry (MS) instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of them actively contribute to the deduction of peptides. Pre-processing of MS data to detect noisy and non-useful peaks is therefore an active area of research. Most sequential noise-reduction algorithms are impractical as a pre-processing step because of their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm for MS2 spectra, called G-MSR. Our proposed algorithm uses novel data structures that optimize memory use and computational operations inside the GPU. These data structures are Binary Spectra and Quantized Indexed Spectra (QIS). The former helps communicate essential information between the CPU and the GPU using a minimal amount of data, while the latter enables us to store and process a complex 3-D data structure as a 1-D array while maintaining the integrity of the MS data. Our proposed algorithm also takes into account the limited memory of GPUs and switches between in-core and out-of-core modes based on the size of the input data. G-MSR achieves a peak speed-up of 386x over its sequential counterpart and is shown to process over a million spectra in just 32 seconds. The code for this algorithm is available as open source under the GPL on GitHub at the following link: https://github.com/pcdslab/G-MSR.
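The abstract does not specify the layout of QIS or the rule used to switch between in-core and out-of-core modes, so the following is only a minimal sketch of the two general ideas, written in plain Python rather than GPU code. It assumes a generic flat-array-plus-offsets layout for variable-length spectra and a simple peak-count budget standing in for GPU memory. The names `flatten_spectra`, `get_spectrum`, `plan_batches`, and `max_peaks_per_batch` are hypothetical and do not come from G-MSR.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity); an assumed peak representation


def flatten_spectra(spectra: List[List[Peak]]):
    """Pack variable-length spectra into flat 1-D lists plus an offsets index.

    Illustrative layout only, not the paper's QIS structure.
    Spectrum k occupies positions offsets[k] .. offsets[k + 1] - 1.
    """
    mz: List[float] = []
    intensity: List[float] = []
    offsets: List[int] = [0]
    for spectrum in spectra:
        for m, i in spectrum:
            mz.append(m)
            intensity.append(i)
        offsets.append(len(mz))
    return mz, intensity, offsets


def get_spectrum(mz, intensity, offsets, k: int) -> List[Peak]:
    """Recover spectrum k from the flat arrays."""
    start, end = offsets[k], offsets[k + 1]
    return list(zip(mz[start:end], intensity[start:end]))


def plan_batches(offsets: List[int], max_peaks_per_batch: int):
    """Split spectra into contiguous batches that fit a peak budget.

    Returns (first, last_exclusive) spectrum index pairs. A single batch
    corresponds to an in-core run; several batches to an out-of-core run.
    A spectrum larger than the budget still gets its own batch.
    """
    n = len(offsets) - 1
    batches = []
    start = 0
    while start < n:
        end = start + 1  # always take at least one spectrum
        while end < n and offsets[end + 1] - offsets[start] <= max_peaks_per_batch:
            end += 1
        batches.append((start, end))
        start = end
    return batches
```

With this layout each batch is one contiguous slice of the flat arrays, which is the property that makes a single host-to-device copy per batch possible. Whether G-MSR batches its data this way is not stated in the abstract.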
Index Terms
- An Out-of-Core GPU based Dimensionality Reduction Algorithm for Big Mass Spectrometry Data and Its Application in Bottom-up Proteomics