The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference

  • Protocol
Computational Methods in Protein Evolution

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1851))

Abstract

Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution that assume that all protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular its sequence entropy and substitution rate. Starting from the seminal proposal by Halpern and Bruno, where structural properties are incorporated in the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent-sites approximation that can be readily incorporated in maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. We then describe its validation with simulated and real proteins, and its limitations and advantages with respect to empirical models that lack site specificity. Finally, we provide guidelines and recommendations for analyzing protein data while accounting for stability constraints, including computer simulations and inferences of protein evolution based on maximum likelihood. Some practical examples are included to illustrate these procedures.
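
The idea of site-specific amino acid frequencies mentioned above can be illustrated with a small sketch. It is not the SCS or mean-field model of this chapter. It assumes the commonly cited Halpern-Bruno form, in which the rate from state a to state b at a site is the mutation rate multiplied by ln(x) / (1 - 1/x), with x = (pi_b mu_ba) / (pi_a mu_ab). The function names, the three-letter toy alphabet, and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def fixation_factor(x):
    """Relative fixation factor ln(x) / (1 - 1/x); equals 1 in the limit x -> 1."""
    if abs(x - 1.0) < 1e-9:
        return 1.0
    return math.log(x) / (1.0 - 1.0 / x)


def site_rate_matrix(mutation, freqs):
    """Build a site-specific rate matrix from target frequencies.

    mutation[a][b]: assumed mutation rate from state a to state b (a != b).
    freqs[a]: assumed stationary frequency of state a at this site.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                x = (freqs[b] * mutation[b][a]) / (freqs[a] * mutation[a][b])
                q[a][b] = mutation[a][b] * fixation_factor(x)
        # The diagonal is still 0.0 here, so this is minus the off-diagonal sum.
        q[a][a] = -sum(q[a])
    return q


# Toy example: three states, uniform mutation, one strongly preferred state.
mu = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
pi = [0.7, 0.2, 0.1]
q = site_rate_matrix(mu, pi)

# Detailed balance (pi_a q_ab = pi_b q_ba) makes pi the stationary distribution.
max_violation = max(
    abs(pi[a] * q[a][b] - pi[b] * q[b][a]) for a in range(3) for b in range(3)
)
print("largest detailed-balance violation:", max_violation)
```

Because the construction satisfies detailed balance with respect to the chosen frequencies, each site can carry its own stationary distribution while sharing a single underlying mutation process, which is what distinguishes this family of models from a single empirical matrix applied to all sites.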

References

  1. Schmitt AO, Schuchhardt J, Ludwig A, Brockmann GA (2007) Protein evolution within and between species. J Theor Biol 249(2):376–383. https://doi.org/10.1016/j.jtbi.2007.08.001

  2. Gao F, Bhattacharya T, Gaschen B, Taylor J, Moore JP, Novitsky V, Yusim K, Lang D, Foley B, Beddows S, Alam M, Haynes B, Hahn BH, Korber B (2003) Consensus and ancestral state HIV vaccines. Science 299(5612):1515–1518

  3. Arenas M, Posada D (2010) Computational design of centralized HIV-1 genes. Curr HIV Res 8(8):613–621

  4. Wilson C, Agafonov RV, Hoemberger M, Kutter S, Zorba A, Halpin J, Buosi V, Otten R, Waterman D, Theobald DL, Kern D (2015) Kinase dynamics. Using ancient protein kinases to unravel a modern cancer drug’s mechanism. Science 347(6224):882–886. https://doi.org/10.1126/science.aaa1823

  5. Perez-Jimenez R, Ingles-Prieto A, Zhao ZM, Sanchez-Romero I, Alegre-Cebollada J, Kosuri P, Garcia-Manyes S, Kappock TJ, Tanokura M, Holmgren A, Sanchez-Ruiz JM, Gaucher EA, Fernandez JM (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat Struct Mol Biol 18(5):592–596

  6. Wijma HJ, Floor RJ, Janssen DB (2013) Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol 23(4):588–594. https://doi.org/10.1016/j.sbi.2013.04.008

  7. Cole MF, Gaucher EA (2011) Utilizing natural diversity to evolve protein function: applications towards thermostability. Curr Opin Chem Biol 15(3):399–406. https://doi.org/10.1016/j.cbpa.2011.03.005

  8. Arenas M (2015) Trends in substitution models of molecular evolution. Front Genet 6:319. https://doi.org/10.3389/fgene.2015.00319

  9. Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjolander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S (2012) The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 21(6):769–785

  10. Bastolla U (2014) Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 4:291–314

  11. Wilke CO (2012) Bringing molecules back into molecular evolution. PLoS Comput Biol 8(6):e1002572

  12. Sikosek T, Chan HS (2014) Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 11(100):20140419. https://doi.org/10.1098/rsif.2014.0419

  13. Goldstein RA (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79(5):1396–1407

  14. Serohijos AW, Shakhnovich EI (2014) Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 26:84–91. https://doi.org/10.1016/j.sbi.2014.05.005

  15. Bastolla U, Dehouck Y, Echave J (2017) What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 42:59–66. https://doi.org/10.1016/j.sbi.2016.10.020

  16. Echave J (2008) Evolutionary divergence of protein structure: the linearly forced elastic network model. Chem Phys Lett 457(4):413–416. https://doi.org/10.1016/j.cplett.2008.04.042

  17. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77(9):1905–1908

  18. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 15(5):586–592. https://doi.org/10.1016/j.sbi.2005.08.007

  19. Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci U S A 96(19):10689–10694

  20. Bastolla U, Porto M, Roman HE, Vendruscolo M (2003) Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J Mol Evol 56(3):243–254

  21. Lemmon AR, Moriarty EC (2004) The importance of proper model assumption in Bayesian phylogenetics. Syst Biol 53(2):265–277

  22. Zhang J (1999) Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol Biol Evol 16(6):868–875

  23. Bordner AJ, Mittelmann HD (2013) A new formulation of protein evolutionary models that account for structural constraints. Mol Biol Evol 31(3):736–749

  24. Rodrigue N, Lartillot N, Bryant D, Philippe H (2005) Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347(2):207–217

  25. Arenas M, Sanchez-Cobos A, Bastolla U (2015) Maximum likelihood phylogenetic inference with selection on protein folding stability. Mol Biol Evol 32(8):2195–2207. https://doi.org/10.1093/molbev/msv085

  26. Bastolla U, Porto M, Roman HE, Vendruscolo M (2006) A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 6:43

  27. Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D (2017) Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A 114:9122–9127. https://doi.org/10.1073/pnas.1702664114

  28. Wang ZO, Pollock DD (2005) Context dependence and coevolution among amino acid residues in proteins. Methods Enzymol 395:779–790. https://doi.org/10.1016/S0076-6879(05)95040-4

  29. Arenas M, Dos Santos HG, Posada D, Bastolla U (2013) Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29(23):3020–3028

  30. Echave J, Wilke CO (2017) Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annu Rev Biophys 46:85–103. https://doi.org/10.1146/annurev-biophys-070816-033819

  31. Bastolla U, Farwer J, Knapp EW, Vendruscolo M (2001) How to guarantee optimal stability for most representative structures in the Protein Data Bank. Proteins 44(2):79–96

  32. Minning J, Porto M, Bastolla U (2013) Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 81(7):1102–1112. https://doi.org/10.1002/prot.24244

  33. Sella G, Hirsh AE (2005) The application of statistical physics to evolutionary biology. Proc Natl Acad Sci U S A 102(27):9541–9546

  34. Mustonen V, Lassig M (2005) Evolutionary population genetics of promoters: predicting binding sites and functional phylogenies. Proc Natl Acad Sci U S A 102(44):15936–15941. https://doi.org/10.1073/pnas.0505537102

  35. Arenas M (2012) Simulation of molecular data under diverse evolutionary scenarios. PLoS Comput Biol 8(5):e1002495

  36. Hoban S, Bertorelle G, Gaggiotti OE (2012) Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 13(2):110–122

  37. Kingman JFC (1982) The coalescent. Stoch Process Appl 13:235–248

  38. Posada D, Wiuf C (2003) Simulating haplotype blocks in the human genome. Bioinformatics 19(2):289–290

  39. Arenas M, Posada D (2010) Coalescent simulation of intracodon recombination. Genetics 184(2):429–437

  40. Arenas M (2013) Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 4:9

  41. Arenas M, Posada D (2014) Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 31(5):1295–1301

  42. Hudson RR (1998) Island models and the coalescent process. Mol Ecol 7(4):413–418

  43. Yang Z (2006) Computational molecular evolution. Oxford University Press, Oxford

  44. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21(9):2104–2105

  45. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  46. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291–325

  47. Halpern AL, Bruno WJ (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15(7):910–917

  48. Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699

  49. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8(3):275–282

  50. Arenas M, Weber CC, Liberles DA, Bastolla U (2017) ProtASR: an evolutionary framework for ancestral protein reconstruction with selection on folding stability. Syst Biol 66:1054–1064. https://doi.org/10.1093/sysbio/syw121

  51. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591

  52. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13(5):555–556

  53. Merkl R, Sterner R (2016) Ancestral protein reconstruction: techniques and applications. Biol Chem 397(1):1–21. https://doi.org/10.1515/hsz-2015-0158

  54. Liberles DA (2007) Ancestral sequence reconstruction. Oxford University Press, Oxford

  55. Kothe DL, Li Y, Decker JM, Bibollet-Ruche F, Zammit KP, Salazar MG, Chen Y, Weng Z, Weaver EA, Gao F, Haynes BF, Shaw GM, Korber BT, Hahn BH (2006) Ancestral and consensus envelope immunogens for HIV-1 subtype C. Virology 352(2):438–449

  56. Gaucher EA, Govindarajan S, Ganesh OK (2008) Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451(7179):704–707

  57. Hobbs JK, Shepherd C, Saul DJ, Demetras NJ, Haaning S, Monk CR, Daniel RM, Arcus VL (2012) On the origin and evolution of thermophily: reconstruction of functional Precambrian enzymes from ancestors of Bacillus. Mol Biol Evol 29(2):825–835. https://doi.org/10.1093/molbev/msr253

  58. Bastolla U, Moya A, Viguera E, van Ham RC (2004) Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J Mol Biol 343(5):1451–1466

  59. Williams PD, Pollock DD, Blackburne BP, Goldstein RA (2006) Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2(6):e69

  60. Lartillot N, Lepage T, Blanquart S (2009) PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25(17):2286–2288. https://doi.org/10.1093/bioinformatics/btp368

  61. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21(6):1095–1109

  62. Mustonen V, Lassig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet 25(3):111–119. https://doi.org/10.1016/j.tig.2009.01.002

  63. Arenas M, Patricio M, Posada D, Valiente G (2010) Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 11(1):268

Acknowledgments

M.A. was supported by the grant “Ramón y Cajal” RYC-2015-18241 from the Spanish Government. U.B. is supported by the grant BIO2016-79043 from the Spanish Ministry of Economy.

Author information

Corresponding authors

Correspondence to Ugo Bastolla or Miguel Arenas.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Bastolla, U., Arenas, M. (2019). The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference. In: Sikosek, T. (ed) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_11

  • DOI: https://doi.org/10.1007/978-1-4939-8736-8_11

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8735-1

  • Online ISBN: 978-1-4939-8736-8

  • eBook Packages: Springer Protocols
