Abstract
Approximate matching techniques based on string alignment are important tools for investigating similarities between strings, such as those representing DNA and protein sequences. We propose a constraint-based approach to parametric sequence alignment that supports more general string alignment queries, in which the alignment cost can itself be a parameter of the query, subject to some initial constraints. Thus, unlike in ordinary alignment, the costs need not be fixed in a parametric alignment query. The basic dynamic programming algorithm for string edit distance is generalized to a naive algorithm that uses inequalities to represent the alignment score. The naive algorithm is rather costly, and the remainder of the paper develops an improvement that prunes alternatives where it can and approximates them otherwise. This significantly reduces the number of inequalities and strengthens the constraint representation with equalities. We present preliminary results from applying parametric alignment to some general alignment queries.
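For orientation, the fixed-cost baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard dynamic programming edit distance only, not the constraint-based parametric algorithm of the paper. The function name `edit_distance` and the parameters `sub` and `indel` are assumptions introduced here for illustration; in a parametric query these costs would be left as constrained unknowns rather than supplied as numbers.

```python
def edit_distance(s, t, sub=1, indel=1):
    """Minimum total cost of editing s into t with fixed costs.

    sub   -- cost of substituting one character for another (assumed name)
    indel -- cost of inserting or deleting one character (assumed name)
    """
    m, n = len(s), len(t)
    # d[i][j] is the cheapest cost of aligning s[:i] with t[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel  # delete every character of s[:i]
    for j in range(1, n + 1):
        d[0][j] = j * indel  # insert every character of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            step = 0 if s[i - 1] == t[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j - 1] + step,  # match or substitute
                d[i - 1][j] + indel,     # delete s[i-1]
                d[i][j - 1] + indel,     # insert t[j-1]
            )
    return d[m][n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))         # unit costs
    print(edit_distance("kitten", "sitting", sub=3))  # substitutions made expensive
```

Running the same pair of strings with different values of `sub` and `indel` shows that the optimal score, and the alignment behind it, depends on the chosen costs. That dependence is what a parametric alignment query leaves open.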
Cite this article
Yap, R.H.C. Parametric Sequence Alignment with Constraints. Constraints 6, 157–172 (2001). https://doi.org/10.1023/A:1011429504996
DOI: https://doi.org/10.1023/A:1011429504996