Longest common subsequences of two random sequences

Václav Chvátal; David Sankoff

doi:10.2307/3212444

Summary

Given two random k-ary sequences of length n, what is f(n,k), the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate f(n,k) for all k, where n ≦ 5, and f(n,2) where n ≦ 10. We study the limiting behaviour of $n^{-1} f(n,k)$ and derive upper and lower bounds on these limits for all k. Finally we estimate by Monte-Carlo methods f(100,k), f(1000,2) and f(5000,2).
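
To make the quantities in the summary concrete, here is a minimal Python sketch. It is an illustration, not the authors' procedure. It assumes the textbook dynamic-programming recurrence for LCS length and sequences whose symbols are independent and uniform over k letters. `exact_expected_lcs` averages over all k^{2n} ordered pairs, so it is only practical for very small n. `monte_carlo_expected_lcs` averages over randomly sampled pairs instead. All function names are hypothetical.

```python
import itertools
import random
from fractions import Fraction

# Hypothetical helper names; an illustrative sketch, not the paper's method.


def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b."""
    # Standard O(len(a) * len(b)) dynamic programme, keeping one row at a time:
    # prev[j] is the LCS length of the part of a already processed and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1  # extend a common subsequence by this matching symbol
            else:
                cur[j] = max(prev[j], cur[j - 1])  # drop the last symbol of a or of b
        prev = cur
    return prev[-1]


def exact_expected_lcs(n, k):
    """f(n, k) as an exact fraction: mean LCS length over all k**(2n) ordered pairs."""
    seqs = list(itertools.product(range(k), repeat=n))
    total = sum(lcs_length(a, b) for a in seqs for b in seqs)
    return Fraction(total, len(seqs) ** 2)


def monte_carlo_expected_lcs(n, k, trials, seed=0):
    """Monte Carlo estimate of f(n, k) from `trials` independent uniform random pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / trials
```

For example, `exact_expected_lcs(2, 2)` returns `Fraction(9, 8)`. Because an LCS of two concatenations is at least as long as the sum of the LCSs of the corresponding pieces, f(m+n,k) ≥ f(m,k) + f(n,k); this superadditivity is the standard route, via Fekete's lemma [1], to showing that $n^{-1} f(n,k)$ converges. Dividing `monte_carlo_expected_lcs(n, k, trials)` by n gives only a finite-n estimate of that limit.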

References

[1] Fekete, M. (1923) Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten. Math. Z. 17, 228–249.

[2] Riordan, J. (1958) An Introduction to Combinatorial Analysis. John Wiley, New York.

[3] Sankoff, D. (1972) Matching sequences under deletion/insertion constraints. Proc. Nat. Acad. Sci. U.S.A. 69, 4–6.

[4] Sankoff, D. and Cedergren, R. J. (1973) A test for nucleotide sequence homology. J. Mol. Biol. 77, 159–164.

Crossref Citations

This article has been cited by the following publications. This list is generated based on data provided by Crossref.

Nussinov, Ruth; Pieczenik, George; Griggs, Jerrold R.; and Kleitman, Daniel J. 1978. Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics, Vol. 35, Issue. 1, p. 68.

Deken, Joseph G. 1979. Some limit results for longest common subsequences. Discrete Mathematics, Vol. 26, Issue. 1, p. 17.

Steele, J. Michael 1982. Long Common Subsequences and the Proximity of Two Random Strings. SIAM Journal on Applied Mathematics, Vol. 42, Issue. 4, p. 731.

Nussinov, Ruth 1983. Efficient algorithms for searching for exact repetition of nucleotide sequences. Journal of Molecular Evolution, Vol. 19, Issue. 3-4, p. 283.

Nussinov, Ruth 1983. An efficient code searching for sequence homology and DNA duplication. Journal of Theoretical Biology, Vol. 100, Issue. 2, p. 319.

Chung, F. R. K.; Fishburn, P. C.; and Wei, V. K. 1985. Cross-monotone subsequences. Order, Vol. 1, Issue. 4, p. 351.

Beyer, William A.; Sellers, Peter H.; and Waterman, Michael S. 1985. Stanislaw M. Ulam's contributions to theoretical biology. Letters in Mathematical Physics, Vol. 10, Issue. 2-3, p. 231.

Arratia, Richard and Waterman, Michael S. 1985. An Erdös-Rényi law with shifts. Advances in Mathematics, Vol. 55, Issue. 1, p. 13.

Foulser, David E. and Karlin, Samuel 1987. Maximal success durations for a semi-Markov process. Stochastic Processes and their Applications, Vol. 24, Issue. 2, p. 203.

Karlin, Samuel and Ost, Friedemann 1987. Counts of long aligned word matches among random letter sequences. Advances in Applied Probability, Vol. 19, Issue. 2, p. 293.

Simon, Imre 1989. Electronic Dictionaries and Automata in Computational Linguistics. Vol. 377, p. 79.

Chang, W.I. and Lawler, E.L. 1990. Approximate string matching in sublinear expected time. p. 116.

Aho, Alfred V. 1990. Algorithms and Complexity. p. 255.

Louchard, Guy and Szpankowski, Wojciech 1993. Combinatorial Pattern Matching. Vol. 684, p. 152.

Rinsma-Melchert, Ingrid 1993. The expected number of matches in optimal global sequence alignments. New Zealand Journal of Botany, Vol. 31, Issue. 3, p. 219.

Jiang, Tao and Li, Ming 1994. Automata, Languages and Programming. Vol. 820, p. 191.

Vingron, Martin and Waterman, Michael S. 1994. Sequence alignment and penalty choice. Journal of Molecular Biology, Vol. 235, Issue. 1, p. 1.

Dančík, Vlado and Paterson, Mike 1994. STACS 94. Vol. 775, p. 669.

Paterson, Mike and Dančík, Vlado 1994. Mathematical Foundations of Computer Science 1994. Vol. 841, p. 127.

Reich, Jens G. 1994. Computational Methods in Genome Research. p. 137.
