Abstract
We report on EigenMS, a new de novo peptide sequencing algorithm based on spectral graph partitioning. In this approach, relationships between m/z peaks are represented by attractive and repulsive springs, and the vibrational modes of the spring system are used to infer information about the peaks (such as “likely b-ion” or “likely y-ion”). We demonstrate the effectiveness of this approach by comparing it with other de novo sequencers on test sets of ion-trap and QTOF spectra, including spectra of peptide mixtures. EigenMS outperforms the other sequencers on all data sets. Along with spectral graph theory techniques, EigenMS incorporates another improvement of independent interest: robust statistical methods for recalibrating time-of-flight mass measurements. Robust recalibration greatly outperforms simple least-squares recalibration, achieving about three times the accuracy on one QTOF data set.
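The spring picture in the abstract can be illustrated with a small sketch. The abstract does not give the matrices or weights EigenMS uses, so everything here is an assumption made for illustration: the toy edge list, the helper names signed_laplacian and partition_peaks, and the choice of a standard signed Laplacian L = D - W with D_ii = sum_j |W_ij|. Positive weights stand in for attractive springs (two peaks that probably share an ion type), negative weights for repulsive springs (two peaks that probably do not), and the lowest-energy mode of the system is read off as a two-way labeling.

```python
import numpy as np


def signed_laplacian(n, edges):
    """Return L = D - W for a signed graph (hypothetical helper).

    edges holds (i, j, w) triples: w > 0 is an attractive spring,
    w < 0 a repulsive one. D_ii = sum_j |W_ij|, so that
    x^T L x = sum over edges of |w| * (x_i - sign(w) * x_j)^2.
    """
    W = np.zeros((n, n))
    for i, j, w in edges:
        W[i, j] = W[j, i] = w
    D = np.diag(np.abs(W).sum(axis=1))
    return D - W


def partition_peaks(n, edges):
    """Label each peak +1 or -1 from the lowest mode of the spring system."""
    L = signed_laplacian(n, edges)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    mode = eigenvectors[:, 0]
    # An eigenvector is defined only up to sign; fix it so peak 0 is positive.
    if mode[0] < 0:
        mode = -mode
    labels = np.where(mode >= 0, 1, -1)
    return labels, eigenvalues[0]


# Toy data (made up): peaks 0-2 play the role of b-ions, peaks 3-5 of y-ions.
toy_edges = [
    (0, 1, 1.0), (1, 2, 1.0),                  # attractive, within the first group
    (3, 4, 1.0), (4, 5, 1.0),                  # attractive, within the second group
    (0, 5, -1.0), (1, 4, -1.0), (2, 3, -1.0),  # repulsive, across the groups
]

labels, lowest = partition_peaks(6, toy_edges)
print(labels.tolist(), lowest)
```

On this toy graph every spring can be satisfied at once, so the lowest eigenvalue is zero up to rounding and the corresponding mode separates peaks 0-2 from peaks 3-5. Real spectra contain conflicting evidence, in which case the lowest modes give only a soft indication such as “likely b-ion”.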
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bern, M., Goldberg, D. (2005). EigenMS: De Novo Analysis of Peptide Tandem Mass Spectra by Spectral Graph Partitioning. In: Miyano, S., Mesirov, J., Kasif, S., Istrail, S., Pevzner, P.A., Waterman, M. (eds) Research in Computational Molecular Biology. RECOMB 2005. Lecture Notes in Computer Science, vol 3500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415770_27
DOI: https://doi.org/10.1007/11415770_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25866-7
Online ISBN: 978-3-540-31950-4
eBook Packages: Computer Science (R0)