Parametric Sequence Alignment with Constraints

Yap, Roland H. C.

doi:10.1023/A:1011429504996


Published: June 2001

Constraints, Volume 6, pages 157–172 (2001)


Abstract

Approximate matching techniques based on string alignment are important tools for investigating similarities between strings, such as those representing DNA and protein sequences. We propose a constraint-based approach to parametric sequence alignment that supports more general string alignment queries, in which the alignment cost can itself be parameterized as part of the query, subject to some initial constraints. Thus, unlike in ordinary alignment, the costs need not be fixed in a parametric alignment query. The basic dynamic programming algorithm for string edit distance is generalized to a naive algorithm that uses inequalities to represent the alignment score. The naive algorithm is rather costly, so the remainder of the paper develops an improvement that prunes alternatives where it can and approximates them otherwise. This significantly reduces the number of inequalities and strengthens the constraint representation with equalities. We present some preliminary results using parametric alignment on some general alignment queries.
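
To make the idea of leaving the costs unfixed more concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a simplified global alignment model with only two parameters, a mismatch cost and an indel cost, both non-negative. Each cell of the usual edit distance table keeps a set of (mismatches, indels) count pairs instead of a single number; every pair stands for one linear bound on the score, which is loosely analogous to the inequalities mentioned in the abstract. The function names and the dominance-based pruning rule are illustrative assumptions and are not taken from the source.

```python
from typing import Set, Tuple

# One candidate alignment, summarised by how many mismatches and indels it uses.
# Its cost is mismatches * mismatch_cost + indels * indel_cost.
Counts = Tuple[int, int]


def pareto(cands: Set[Counts]) -> Set[Counts]:
    """Drop candidates that another candidate beats or ties in both counts.

    Sound only for non-negative costs (assumption of this sketch).
    """
    return {
        c for c in cands
        if not any(o != c and o[0] <= c[0] and o[1] <= c[1] for o in cands)
    }


def parametric_candidates(a: str, b: str) -> Set[Counts]:
    """Return the non-dominated (mismatches, indels) pairs for aligning a and b."""
    # Row 0: aligning the empty prefix of a with b[:j] needs j insertions.
    prev = [{(0, j)} for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [{(0, i)}]  # column 0: i deletions
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cands = {(m + sub, g) for m, g in prev[j - 1]}  # match or mismatch
            cands |= {(m, g + 1) for m, g in prev[j]}       # delete a[i-1]
            cands |= {(m, g + 1) for m, g in cur[j - 1]}    # insert b[j-1]
            cur.append(pareto(cands))
        prev = cur
    return prev[len(b)]


def score(cands: Set[Counts], mismatch_cost: float, indel_cost: float) -> float:
    """Evaluate the optimal alignment cost once concrete costs are chosen."""
    return min(m * mismatch_cost + g * indel_cost for m, g in cands)
```

Under these assumptions, `parametric_candidates` is run once per pair of strings, and `score` can then be evaluated for any choice of non-negative costs without repeating the dynamic programming. With unit costs it reduces to the ordinary edit distance.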



Author information

Authors and Affiliations

School of Computing, National University of Singapore, Singapore, 117543, Republic of Singapore
Roland H. C. Yap



About this article

Cite this article

Yap, R.H.C. Parametric Sequence Alignment with Constraints. Constraints 6, 157–172 (2001). https://doi.org/10.1023/A:1011429504996
