Abstract
Efficient methods for searching the chemical space of molecular compounds are needed to automate and accelerate the design of new functional molecules such as pharmaceuticals. Given the high cost of experimental efforts in both resources and time, computational approaches play a key role in guiding the selection of promising molecules for further investigation. Here, we construct a workflow to accelerate design by combining an approximate quantum chemical method, density-functional tight-binding (DFTB), with a graph convolutional neural network (GCNN) surrogate model for chemical property prediction and a masked language model (MLM) for molecule generation. Property data from the DFTB calculations are used to train the surrogate model, and the surrogate model is then used to score candidates generated by the MLM. The surrogate reduces computation time by orders of magnitude compared to the DFTB calculations, enabling a broader search of chemical space. Furthermore, the MLM generates a diverse set of chemical modifications based on pre-training on a large compound library. We use the workflow to search for near-infrared photoactive molecules, taking the predicted HOMO-LUMO gap as the target property to be minimized. Our results show that the workflow can generate optimized molecules outside of the original training set, which suggests that iterations of the workflow could be useful for searching vast chemical spaces in a wide range of design problems.
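The generate-and-score loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `propose_mutations` is a hypothetical stand-in for the MLM (it fills a masked position with a random token rather than a learned one), `predict_gap` is a hypothetical stand-in for the GCNN surrogate (a toy score, not a physical HOMO-LUMO gap), and the token list and parameter values are assumptions chosen only to make the example run. Chemical validity of the mutated strings is not checked.

```python
import random

# Hypothetical token vocabulary; a real MLM would sample tokens from its learned distribution.
TOKENS = ["C", "N", "O", "S", "c", "n", "=", "1"]


def propose_mutations(smiles, n_candidates, rng):
    """Stand-in for the MLM: mask one position and fill it with a random token."""
    candidates = []
    for _ in range(n_candidates):
        pos = rng.randrange(len(smiles))
        candidates.append(smiles[:pos] + rng.choice(TOKENS) + smiles[pos + 1:])
    return candidates


def predict_gap(smiles):
    """Stand-in for the GCNN surrogate: a toy score, NOT a physical HOMO-LUMO gap."""
    unsaturation = smiles.count("=") + smiles.count("c") + smiles.count("n")
    return 10.0 / (1.0 + unsaturation)


def optimize(seed_population, generations=5, n_candidates=4, keep=10, seed=0):
    """Alternate between generating candidates and keeping the lowest-scoring ones."""
    rng = random.Random(seed)
    population = list(seed_population)
    for _ in range(generations):
        # Parents stay in the pool, so the best score never gets worse.
        pool = set(population)
        for smiles in population:
            pool.update(propose_mutations(smiles, n_candidates, rng))
        # Rank by predicted gap; break ties on the string to keep the order deterministic.
        population = sorted(pool, key=lambda s: (predict_gap(s), s))[:keep]
    return population


best = optimize(["CCO", "CCN"])
```

In the workflow described in the abstract, the scoring step would be the trained surrogate and the proposal step would be the pre-trained MLM; the sketch only shows how the two pieces alternate within a selection loop.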
NOTICE OF COPYRIGHT: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Acknowledgements
We thank Pilsun Yoo for fruitful discussions on the synthesizability score. This work was supported in part by the Office of Science of the Department of Energy and by the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory. This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. The research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This work used resources of the Oak Ridge Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Blanchard, A.E. et al. (2022). Computational Workflow for Accelerated Molecular Design Using Quantum Chemical Simulations and Deep Learning Models. In: Doug, K., Al, G., Pophale, S., Liu, H., Parete-Koon, S. (eds) Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation. SMC 2022. Communications in Computer and Information Science, vol 1690. Springer, Cham. https://doi.org/10.1007/978-3-031-23606-8_1
DOI: https://doi.org/10.1007/978-3-031-23606-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23605-1
Online ISBN: 978-3-031-23606-8
eBook Packages: Computer Science, Computer Science (R0)