Multiple sequence comparison: A peptide matching approach

Sagot, Marie-France; Viari, Alain; Soldano, Henri

doi:10.1007/3-540-60044-2_55

Marie-France Sagot (1,3),
Alain Viari (1) &
Henri Soldano (1,2)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 937)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

In this paper we present a peptide matching approach to the multiple comparison of a set of protein sequences. The approach consists of looking for all the words that are common to at least q of these sequences, where q is a parameter.

Words are compared by using as reference an object called a model. In the case of proteins, a model is a product of subsets of the alphabet Σ of the amino acids. These subsets belong to a cover of Σ, that is, a collection of subsets whose union is all of Σ. A word is said to be an instance of a model if it belongs to the model: for a model $S_1 S_2 \cdots S_k$, the word has length k and its i-th letter belongs to $S_i$ for every i.
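To make these definitions concrete, here is a minimal Python sketch that represents a model as a tuple of frozensets and tests whether a word is an instance of it. The cover `COVER` and the helpers `is_cover` and `is_instance` are hypothetical, chosen only for illustration; they are not the amino acid groupings used by the authors. Serine appears in two subsets, so this cover is deliberately not a partition.

```python
# The 20 standard amino acids (one-letter codes).
ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical cover for illustration only: subsets may overlap
# (S appears in two of them), but their union must be the whole alphabet.
COVER = [frozenset(g) for g in ("ILMV", "FWY", "HKR", "DENQ", "ST", "AGS", "C", "P")]


def is_cover(cover, alphabet):
    """True if the union of the subsets is exactly the alphabet."""
    return frozenset().union(*cover) == alphabet


def is_instance(word, model):
    """A word is an instance of a model S_1...S_k if it has length k
    and its i-th letter belongs to S_i for every i."""
    return len(word) == len(model) and all(c in s for c, s in zip(word, model))


# The model [ILMV][ST][HKR], built from subsets of the cover.
MODEL = (COVER[0], COVER[4], COVER[2])
print(is_cover(COVER, ALPHABET))   # True
print(is_instance("LSK", MODEL))   # True
print(is_instance("LSD", MODEL))   # False: D is not in [HKR]
```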

Further flexibility is introduced by allowing up to e errors (substitutions, insertions or deletions) in the comparison between a word and a model. A word is then said to be an occurrence of a model if the Levenshtein distance between it and some instance of the model is less than or equal to e. Two words are said to be similar if there is at least one model of which both are occurrences. In the special case where e = 0, the occurrences of a model are simply its instances. If a model M has occurrences in at least q of the sequences of the set, M is said to occur in the set.
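For e > 0, testing whether a word is an occurrence requires the smallest Levenshtein distance between the word and any instance of the model. The following is a minimal Python sketch under the assumption that a model is a tuple of frozensets. Because each model position can be instantiated independently, the usual edit-distance recurrence, in which a position counts as a match whenever the letter belongs to its subset, yields that minimum directly. Letting the word start and end anywhere in a sequence (a free first row, as in standard approximate pattern matching) tests whether a sequence contains an occurrence, and counting such sequences gives the quorum test. All function names are hypothetical.

```python
def model_distance(word, model):
    """Minimum Levenshtein distance between `word` and any instance of `model`."""
    k = len(model)
    prev = list(range(k + 1))            # empty word: skip j model positions
    for i, c in enumerate(word, start=1):
        cur = [i] + [0] * k              # i extra letters against the empty model
        for j in range(1, k + 1):
            cost = 0 if c in model[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # letter of the word left unmatched
                         cur[j - 1] + 1,       # model position left unmatched
                         prev[j - 1] + cost)   # match (0) or substitution (1)
        prev = cur
    return prev[k]


def occurs_in_sequence(seq, model, e):
    """True if some substring of `seq` is at distance <= e from the model."""
    k = len(model)
    col = list(range(k + 1))             # column for the empty substring
    if col[k] <= e:
        return True
    for c in seq:
        new = [0] + [0] * k              # a substring may start at any position
        for j in range(1, k + 1):
            cost = 0 if c in model[j - 1] else 1
            new[j] = min(col[j] + 1, new[j - 1] + 1, col[j - 1] + cost)
        if new[k] <= e:                  # some substring ending here qualifies
            return True
        col = new
    return False


def occurs_in_set(sequences, model, e, q):
    """The model occurs in the set if it has occurrences in at least q sequences."""
    return sum(occurs_in_sequence(s, model, e) for s in sequences) >= q


MODEL = tuple(frozenset(g) for g in ("ILMV", "ST", "HKR"))
print(model_distance("LSK", MODEL))              # 0: an instance
print(model_distance("LK", MODEL))               # 1: one model position deleted
print(occurs_in_sequence("GGLAKGG", MODEL, 0))   # False
print(occurs_in_sequence("GGLAKGG", MODEL, 1))   # True (LAK, one substitution)
print(occurs_in_set(["GGLSKGG", "AAVTRA", "PPPPPP"], MODEL, 0, 2))  # True
```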

The algorithm presented here is an efficient and exact way of finding all the models, either of a fixed length k or of the greatest possible length $k_{\max}$, that occur in a set of sequences. Its running time is linear in the total length n of the sequences and proportional to $k^{2e+1}$, where $k \ll n$ is a small value in all practical situations.
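The abstract does not spell out the search procedure, so what follows is only a simplified Python sketch for the exact case e = 0, not the authors' algorithm (which also handles errors and achieves the complexity stated above). It grows models one cover subset at a time, keeps for each model the positions where an instance starts, and abandons a model as soon as fewer than q distinct sequences contain an instance. This pruning is safe because every prefix of a model that occurs also occurs. The function `find_models` and the toy cover, restricted to the letters of the toy sequences, are hypothetical; unlike the paper's method, this sketch can in the worst case explore a number of models growing like the number of cover subsets raised to the power k.

```python
def find_models(sequences, cover, k, q):
    """All models of length k (products of cover subsets) that have an exact
    instance in at least q of the sequences (the case e = 0)."""
    results = []

    def extend(model, occs):
        # occs: (sequence index, start) pairs where the word of length
        # len(model) starting at `start` is an instance of `model`.
        if len({s for s, _ in occs}) < q:
            return                                  # quorum lost: prune
        depth = len(model)
        if depth == k:
            results.append(model)
            return
        for subset in cover:
            nxt = [(s, p) for s, p in occs
                   if p + depth < len(sequences[s]) and sequences[s][p + depth] in subset]
            extend(model + (subset,), nxt)

    # The empty model has an (empty) instance at every position.
    extend((), [(s, p) for s, seq in enumerate(sequences) for p in range(len(seq))])
    return results


cover = [frozenset(g) for g in ("ILMV", "ST", "HKR", "AG", "P")]
seqs = ["GGLSKGG", "AAVTRA", "PPPPPP"]
for m in find_models(seqs, cover, 3, 2):
    print("".join("[" + "".join(sorted(s)) + "]" for s in m))
```

With these inputs the sketch reports four models, among them [ILMV][ST][HKR], each having an instance in both of the first two sequences.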

Models are closely related to what is called a consensus in biocomputing, and covers are a good way of representing complex relationships between the amino acids.
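One consequence of letting the subsets of a cover overlap can be shown with a short Python sketch for the case e = 0, reusing a hypothetical cover in which serine belongs to both [ST] and [AGS]. Two words are then similar exactly when, at every position, some subset contains both letters. A is similar to S and S to T, yet A is not similar to T, so similarity is not transitive; with a partition of the alphabet it would always be an equivalence relation. The function `similar` is hypothetical.

```python
# Hypothetical cover for illustration (S lies in two subsets).
COVER = [frozenset(g) for g in ("ILMV", "FWY", "HKR", "DENQ", "ST", "AGS", "C", "P")]


def similar(u, v, cover):
    """Case e = 0: u and v are similar iff some model has both as instances,
    i.e. the words have equal length and at each position some subset of
    the cover contains both letters."""
    return len(u) == len(v) and all(
        any(a in s and b in s for s in cover) for a, b in zip(u, v)
    )


print(similar("LA", "VS", COVER))  # True:  L,V in [ILMV]; A,S in [AGS]
print(similar("VS", "IT", COVER))  # True:  V,I in [ILMV]; S,T in [ST]
print(similar("LA", "IT", COVER))  # False: no subset holds both A and T
```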

Author information

Authors and Affiliations

Atelier de BioInformatique, CPASO-URA CNRS 448 Section de Physique et Chimie de l'Institut Curie 1, rue P. et M. Curie, 75005, Paris, France
Marie-France Sagot, Alain Viari & Henri Soldano
URA CNRS 1507, LIPN-Université de Paris Nord, Avenue J.B. Clément, 93430, Villetaneuse, France
Henri Soldano
Institut Gaspard Monge, Université de Marne-la-Vallée, 2, rue de la Butte Verte, 93160, Noisy-le-Grand, France
Marie-France Sagot

Editor information

Zvi Galil, Esko Ukkonen

About this paper

Cite this paper

Sagot, MF., Viari, A., Soldano, H. (1995). Multiple sequence comparison: A peptide matching approach. In: Galil, Z., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 1995. Lecture Notes in Computer Science, vol 937. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60044-2_55

DOI: https://doi.org/10.1007/3-540-60044-2_55
Published: 31 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60044-2
Online ISBN: 978-3-540-49412-6
eBook Packages: Springer Book Archive
