ABSTRACT
Predicting how a point mutation alters a protein's stability can guide drug design initiatives that aim to counter the effects of serious diseases. Mutagenesis studies give insights into the effects of amino acid substitutions, but such wet-lab work is prohibitive because of the time and cost needed to assess the consequences of even a single mutation. Computational methods for predicting the effects of a mutation are available, with promising accuracy rates. In this work we study the utility of several machine learning methods for predicting the effects of mutations. We generate mutant protein structures in silico and compute several rigidity metrics for each of them. Our approach does not require costly calculations of energy functions that rely on atomic-level statistical mechanics and molecular energetics. The rigidity metrics serve as features for support vector regression, random forest, and deep neural network methods. We validate the effects of our in silico mutations against experimental ΔΔG stability data, and attain Pearson correlations upwards of 0.69.
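The validation step named in the abstract, comparing predicted stability changes against experimental ΔΔG values through a Pearson correlation, can be sketched as follows. This is only a minimal illustration: the helper `pearson_r` and the two lists of ΔΔG values are hypothetical placeholders, not the authors' code or data.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values in kcal/mol, invented purely for illustration.
experimental_ddg = [-1.2, 0.3, -2.5, 0.8, -0.4]
predicted_ddg = [-0.9, 0.1, -1.8, 0.5, -0.7]

r = pearson_r(experimental_ddg, predicted_ddg)
print(f"Pearson r = {r:.2f}")
```

In practice the predicted values would come from a trained regressor (for example a support vector regression or random forest model fitted on the rigidity features), and the correlation would be computed on held-out mutations only.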
Index Terms
- Predicting the Effect of Point Mutations on Protein Structural Stability