DOI: 10.1145/3107411.3107466
short-paper
Public Access

An Out-of-Core GPU based Dimensionality Reduction Algorithm for Big Mass Spectrometry Data and Its Application in Bottom-up Proteomics

Published: 20 August 2017

ABSTRACT

Modern high-resolution mass spectrometry (MS) instruments can generate millions of spectra in a single systems biology experiment. Each spectrum consists of thousands of peaks, but only a small number of them actively contribute to the deduction of peptides. Pre-processing of MS data to detect noisy and non-useful peaks is therefore an active area of research. Most sequential noise-reduction algorithms are impractical as a pre-processing step because of their high time complexity. In this paper, we present a GPU-based dimensionality-reduction algorithm for MS2 spectra, called G-MSR. Our proposed algorithm uses novel data structures that optimize memory and computational operations inside the GPU. These data structures are Binary Spectra and Quantized Indexed Spectra (QIS). The former helps communicate essential information between the CPU and GPU using a minimal amount of data, while the latter enables us to store and process a complex 3-D data structure as a 1-D array while maintaining the integrity of the MS data. Our proposed algorithm also takes into account the limited memory of GPUs and switches between in-core and out-of-core modes based on the size of the input data. G-MSR achieves a peak speed-up of 386x over its sequential counterpart and is shown to process over a million spectra in just 32 seconds. The code for this algorithm is available as GPL-licensed open source on GitHub at the following link: https://github.com/pcdslab/G-MSR.
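The abstract does not spell out how QIS is laid out or how the in-core/out-of-core switch is made, so the following is only a rough, CPU-side sketch of the two general ideas: packing variable-length spectra into flat 1-D arrays with an offsets index, and splitting the work into batches when the data would not fit in a fixed device memory budget. The function names (`flatten_spectra`, `plan_batches`), the `(m/z, intensity)` peak representation, and the byte-budget parameters are all assumptions made for illustration and are not taken from the paper or the G-MSR repository.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # assumed representation: (m/z, intensity)


def flatten_spectra(spectra: List[List[Peak]]):
    """Pack variable-length spectra into flat 1-D lists plus an offsets index.

    Spectrum k occupies positions offsets[k] .. offsets[k + 1] - 1
    of both the mz list and the intensity list.
    """
    mz: List[float] = []
    intensity: List[float] = []
    offsets: List[int] = [0]
    for peaks in spectra:
        for m, i in peaks:
            mz.append(m)
            intensity.append(i)
        offsets.append(len(mz))
    return mz, intensity, offsets


def plan_batches(offsets: List[int], bytes_per_peak: int,
                 device_budget_bytes: int) -> List[Tuple[int, int]]:
    """Group consecutive spectra into batches that fit a memory budget.

    Returns half-open (first_spectrum, last_spectrum) ranges. A single
    batch means everything fits at once (in-core); several batches mean
    the data would be processed piece by piece (out-of-core). A spectrum
    larger than the budget still gets a batch of its own.
    """
    n = len(offsets) - 1
    batches: List[Tuple[int, int]] = []
    start = 0
    for k in range(n):
        # Bytes needed for spectra start..k inclusive.
        size = (offsets[k + 1] - offsets[start]) * bytes_per_peak
        if size > device_budget_bytes and k > start:
            batches.append((start, k))
            start = k
    if n > 0:
        batches.append((start, n))
    return batches


# Three toy spectra with 2, 1, and 3 peaks.
spectra = [
    [(100.0, 5.0), (200.0, 1.0)],
    [(150.0, 2.0)],
    [(120.0, 3.0), (130.0, 4.0), (140.0, 9.0)],
]
mz, intensity, offsets = flatten_spectra(spectra)
in_core = plan_batches(offsets, bytes_per_peak=8, device_budget_bytes=1000)
out_of_core = plan_batches(offsets, bytes_per_peak=8, device_budget_bytes=24)
```

With a generous budget the planner returns a single batch covering all spectra, and with a tight one it returns several contiguous ranges. In a real GPU implementation each range would presumably be copied to the device, processed, and copied back in turn, but that part is outside what the abstract describes.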

Published in

ACM-BCB '17: Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2017
800 pages
ISBN: 9781450347228
DOI: 10.1145/3107411

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ACM-BCB '17 paper acceptance rate: 42 of 132 submissions, 32%. Overall acceptance rate: 254 of 885 submissions, 29%.
