Abstract
We studied the substitution patterns in 7661 well-conserved human–mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher GC content in human than in mouse, indicating a lack of stationarity. Segmenting the alignments into four groups by GC content and fitting the general reversible substitution model (REV) to each group separately gave significantly better fits than a single overall fit, and the resulting levels of fit are close to those expected under an REV model. In addition, most of the fitted rate matrices are not of the HKY type but are remarkably strand-symmetric. We constructed a number of substitution matrices that should be useful for genomic DNA sequence alignment. We did not find obvious signs of temporal inhomogeneity in the substitution rates and concluded that the conserved intergenic regions in human chromosome 22 and mouse appear to have evolved from their common ancestors via a process that is approximately reversible and strand-symmetric, assuming site homogeneity and independence.
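To make the two properties named in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the standard REV parameterization, in which the rate from base i to base j is an exchangeability s_ij (symmetric in i and j) times the equilibrium frequency of j. Reversibility is checked through detailed balance, and strand symmetry through equality of each rate with the rate between the complementary bases. All function names, the exchangeabilities, and the base frequencies are hypothetical illustrations and are not values from the paper.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rev_rate_matrix(exch, freqs):
    """Build an REV rate matrix q[i][j] = s_ij * pi_j for i != j.

    exch maps frozenset({i, j}) to the symmetric exchangeability s_ij;
    freqs maps each base to its equilibrium frequency pi.
    Diagonal entries are set so that every row sums to zero.
    """
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = exch[frozenset((i, j))] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def is_reversible(q, freqs, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for all i != j."""
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i in BASES for j in BASES if i != j
    )


def is_strand_symmetric(q, tol=1e-12):
    """Strand symmetry: rate(i -> j) == rate(comp(i) -> comp(j))."""
    return all(
        abs(q[i][j] - q[COMPLEMENT[i]][COMPLEMENT[j]]) <= tol
        for i in BASES for j in BASES if i != j
    )


def gc_content(seq):
    """Fraction of G or C among the A/C/G/T characters (gaps ignored)."""
    bases = [b for b in seq.upper() if b in BASES]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


# Illustrative (made-up) parameters: exchangeabilities that respect
# complementarity (AC = GT, AG = CT) and frequencies with
# pi_A = pi_T and pi_C = pi_G.
exch = {
    frozenset("AC"): 1.0, frozenset("GT"): 1.0,
    frozenset("AG"): 3.0, frozenset("CT"): 3.0,
    frozenset("AT"): 0.5, frozenset("CG"): 1.5,
}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
q = rev_rate_matrix(exch, freqs)
print(is_reversible(q, freqs), is_strand_symmetric(q))
```

In this sketch the four distinct exchangeabilities give a matrix that is reversible and strand-symmetric but not of the HKY type, since HKY would require all transversion exchangeabilities (AC, AT, CG, GT) to be equal. Changing the frequencies so that pi_A differs from pi_T keeps the matrix reversible but breaks strand symmetry.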
Acknowledgements
We thank Nick Bray, Colin Dewey, and Lior Pachter for generating the human–mouse–rat alignment. We also thank the Statistics Computing Facility of the Statistics Department of UC Berkeley for the excellent computing resources. The initial motivation for this work is due to Webb Miller.
Cite this article
Yap, V., Speed, T.P. Modeling DNA Base Substitution in Large Genomic Regions from Two Organisms. J Mol Evol 58, 12–18 (2004). https://doi.org/10.1007/s00239-003-2520-8
DOI: https://doi.org/10.1007/s00239-003-2520-8