Modeling DNA Base Substitution in Large Genomic Regions from Two Organisms

Journal of Molecular Evolution

Abstract

We studied the substitution patterns in 7661 well-conserved human–mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher GC content in human than in mouse, indicating a lack of stationarity. Segmenting the alignments into four groups by GC content and fitting the general reversible substitution model (REV) to each group separately gave significantly better fits than a single overall fit, and the levels of fit are close to those expected under an REV model. In addition, most of the fitted rate matrices are not of the HKY type but are remarkably strand-symmetric. We constructed a number of substitution matrices that should be useful for genomic DNA sequence alignment. We did not find obvious signs of temporal inhomogeneity in the substitution rates and concluded that the conserved intergenic regions in human chromosome 22 and mouse appear to have evolved from their common ancestors via a process that is approximately reversible and strand-symmetric, assuming site homogeneity and independence.
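The three properties named in the abstract (reversibility, strand symmetry, and being of the HKY type) can be illustrated with a minimal Python sketch. It uses the standard textbook parameterization of a reversible rate matrix, q_ij = s_ij · π_j for i ≠ j with symmetric exchangeabilities s_ij, and is not the authors' fitting procedure. The function names and every numerical value below are hypothetical, chosen only for illustration; none are estimates from the paper.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rev_rate_matrix(pi, s):
    """Build a REV rate matrix: q[i][j] = s[{i, j}] * pi[j] for i != j.

    pi maps each base to its equilibrium frequency; s maps each unordered
    base pair (a frozenset) to a symmetric exchangeability. Diagonal
    entries are set so that every row sums to zero.
    """
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                q[a][b] = s[frozenset((a, b))] * pi[b]
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    return q


def is_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for all i != j."""
    return all(abs(pi[a] * q[a][b] - pi[b] * q[b][a]) <= tol
               for a in BASES for b in BASES if a != b)


def is_strand_symmetric(q, tol=1e-12):
    """The rate of i -> j equals the rate of comp(i) -> comp(j)."""
    return all(abs(q[a][b] - q[COMPLEMENT[a]][COMPLEMENT[b]]) <= tol
               for a in BASES for b in BASES if a != b)


def is_hky_type(s, tol=1e-12):
    """HKY: one exchangeability for transitions, one for transversions."""
    transitions = [s[frozenset(p)] for p in ("AG", "CT")]
    transversions = [s[frozenset(p)] for p in ("AC", "AT", "CG", "GT")]
    return (max(transitions) - min(transitions) <= tol
            and max(transversions) - min(transversions) <= tol)


# Hypothetical exchangeabilities: complementary pairs share a value
# (AC with GT, AG with CT), but the transversions are not all equal.
s_example = {
    frozenset("AG"): 4.0, frozenset("CT"): 4.0,
    frozenset("AC"): 1.0, frozenset("GT"): 1.0,
    frozenset("AT"): 0.7, frozenset("CG"): 1.2,
}

# Hypothetical frequencies with pi_A == pi_T and pi_C == pi_G.
pi_symmetric = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
# Hypothetical frequencies that break the A/T balance.
pi_skewed = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}

q_symmetric = rev_rate_matrix(pi_symmetric, s_example)
q_skewed = rev_rate_matrix(pi_skewed, s_example)

print("reversible:", is_reversible(q_symmetric, pi_symmetric))
print("strand-symmetric:", is_strand_symmetric(q_symmetric))
print("HKY type:", is_hky_type(s_example))
print("skewed strand-symmetric:", is_strand_symmetric(q_skewed))
```

In this parameterization any matrix built by `rev_rate_matrix` satisfies detailed balance by construction, so reversibility alone does not imply strand symmetry. Strand symmetry additionally requires complementary bases to share equilibrium frequencies and complementary substitution pairs to share exchangeabilities, which is why the skewed-frequency example fails that check while remaining reversible.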



Acknowledgements

We thank Nick Bray, Colin Dewey, and Lior Pachter for generating the human–mouse–rat alignment. We also thank the Statistics Computing Facility of the Statistics Department of UC Berkeley for the excellent computing resources. The initial motivation for this work is due to Webb Miller.

Author information

Corresponding author

Correspondence to Von Bing Yap.

About this article

Cite this article

Yap, V., Speed, T.P. Modeling DNA Base Substitution in Large Genomic Regions from Two Organisms. J Mol Evol 58, 12–18 (2004). https://doi.org/10.1007/s00239-003-2520-8

  • DOI: https://doi.org/10.1007/s00239-003-2520-8
