
Parametric Sequence Alignment with Constraints


Abstract

Approximate matching techniques based on string alignment are important tools for investigating similarities between strings, such as those representing DNA and protein sequences. We propose a constraint-based approach to parametric sequence alignment that supports more general string alignment queries, in which the alignment cost can itself be parameterized as part of the query, subject to some initial constraints. Thus, unlike in ordinary alignment, the costs need not be fixed in a parametric alignment query. The basic dynamic programming algorithm for string edit distance is generalized to a naive algorithm that uses inequalities to represent the alignment score. The naive algorithm is rather costly, and the remainder of the paper develops an improvement that prunes alternatives where it can and approximates them otherwise. This significantly reduces the number of inequalities and strengthens the constraint representation with equalities. We present some preliminary results of applying parametric alignment to some general alignment queries.
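To make the idea of leaving costs open more concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the algorithm developed in the paper: it assumes a simple cost model with one substitution cost and one insertion/deletion cost, both non-negative, and all function names (`edit_distance`, `parametric_forms`, `prune`, `evaluate`) are hypothetical. The first function is the textbook dynamic program with fixed costs. The second keeps, for every cell, the set of linear forms `a * sub_cost + b * indel_cost` that some alignment can achieve, so the score is bounded above by each form (one inequality per form). Forms that are dominated in both coefficients are discarded, which is only one simple kind of pruning.

```python
from typing import List, Set, Tuple

# A linear form a * sub_cost + b * indel_cost, stored as (a, b).
Form = Tuple[int, int]


def edit_distance(s: str, t: str, sub_cost: float = 1.0,
                  indel_cost: float = 1.0) -> float:
    """Textbook dynamic program for edit distance with fixed costs."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mismatch = 0.0 if s[i - 1] == t[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j - 1] + mismatch,
                          d[i - 1][j] + indel_cost,
                          d[i][j - 1] + indel_cost)
    return d[m][n]


def prune(forms: Set[Form]) -> Set[Form]:
    """Drop forms dominated in both coefficients.

    Assumption: costs are non-negative, so a dominated form can never
    yield a strictly smaller score than the form that dominates it.
    """
    return {f for f in forms
            if not any(g != f and g[0] <= f[0] and g[1] <= f[1]
                       for g in forms)}


def parametric_forms(s: str, t: str) -> Set[Form]:
    """Same recurrence, but each cell holds candidate linear forms."""
    m, n = len(s), len(t)
    cell: List[List[Set[Form]]] = [[set() for _ in range(n + 1)]
                                   for _ in range(m + 1)]
    for i in range(m + 1):
        cell[i][0] = {(0, i)}
    for j in range(n + 1):
        cell[0][j] = {(0, j)}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mis = 0 if s[i - 1] == t[j - 1] else 1
            cands = {(a + mis, b) for a, b in cell[i - 1][j - 1]}
            cands |= {(a, b + 1) for a, b in cell[i - 1][j]}
            cands |= {(a, b + 1) for a, b in cell[i][j - 1]}
            cell[i][j] = prune(cands)
    return cell[m][n]


def evaluate(forms: Set[Form], sub_cost: float, indel_cost: float) -> float:
    """Score for concrete costs: the minimum over all retained forms."""
    return min(a * sub_cost + b * indel_cost for a, b in forms)
```

Once `parametric_forms` has been computed for a pair of strings, `evaluate` can be called repeatedly with different non-negative costs without rerunning the dynamic program, and under the assumptions above it should agree with `edit_distance` for the same costs.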




Cite this article

Yap, R.H.C. Parametric Sequence Alignment with Constraints. Constraints 6, 157–172 (2001). https://doi.org/10.1023/A:1011429504996


  • DOI: https://doi.org/10.1023/A:1011429504996
