Abstract

Efficient methods for searching the chemical space of molecular compounds are needed to automate and accelerate the design of new functional molecules such as pharmaceuticals. Because experiments are costly in both resources and time, computational approaches play a key role in guiding the selection of promising molecules for further investigation. Here, we construct a workflow to accelerate design by combining an approximate quantum chemical method, density-functional tight-binding (DFTB); a graph convolutional neural network (GCNN) surrogate model for chemical property prediction; and a masked language model (MLM) for molecule generation. Property data from the DFTB calculations are used to train the surrogate model, which in turn scores candidates generated by the MLM. The surrogate reduces computation time by orders of magnitude compared to the DFTB calculations, enabling a broader search of chemical space. Furthermore, the MLM generates a diverse set of chemical modifications based on pre-training on a large compound library. We use the workflow to search for near-infrared photoactive molecules, taking the predicted HOMO-LUMO gap as the target property to be minimized. Our results show that the workflow can generate optimized molecules outside of the original training set, which suggests that iterating the workflow could be useful for searching vast chemical spaces in a wide range of design problems.
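The loop described in the abstract (label data with an expensive method, fit a cheap surrogate, generate candidates, keep the ones the surrogate scores best) can be illustrated with a minimal sketch. Everything below is a toy stand-in and an assumption, not the authors' implementation: `reference_gap` plays the role of a DFTB calculation, `fit_surrogate` the role of the GCNN, and `propose` the role of MLM-based generation. Molecules are represented as fixed-length token tuples over a hypothetical vocabulary rather than real SMILES strings.

```python
import random

# Hypothetical token vocabulary; a real MLM would use a learned tokenizer.
VOCAB = ["C", "N", "O", "S", "c", "n"]


def reference_gap(tokens):
    """Toy stand-in for an expensive DFTB HOMO-LUMO gap (arbitrary units)."""
    weights = {"C": 1.0, "N": 0.8, "O": 0.9, "S": 0.5, "c": 0.4, "n": 0.3}
    return sum(weights[t] for t in tokens) / len(tokens)


def fit_surrogate(dataset):
    """Toy surrogate (stand-in for the GCNN): per-token mean of observed gaps."""
    totals, counts = {}, {}
    for tokens, gap in dataset:
        for t in tokens:
            totals[t] = totals.get(t, 0.0) + gap
            counts[t] = counts.get(t, 0) + 1
    token_score = {t: totals[t] / counts[t] for t in totals}
    default = sum(gap for _, gap in dataset) / len(dataset)

    def predict(tokens):
        # Tokens never seen in training fall back to the dataset mean.
        return sum(token_score.get(t, default) for t in tokens) / len(tokens)

    return predict


def propose(tokens, rng):
    """Toy stand-in for MLM generation: mask one position and resample it."""
    i = rng.randrange(len(tokens))
    new = list(tokens)
    new[i] = rng.choice(VOCAB)
    return tuple(new)


def run_workflow(n_iterations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    # 1. Label an initial set with the expensive reference method.
    train = [tuple(rng.choice(VOCAB) for _ in range(6)) for _ in range(50)]
    dataset = [(m, reference_gap(m)) for m in train]
    # 2. Fit the cheap surrogate on the labelled data.
    predict = fit_surrogate(dataset)
    # 3. Generate candidates and keep those with the lowest predicted gap.
    population = sorted(sorted(train), key=predict)[:population_size]
    for _ in range(n_iterations):
        candidates = {propose(m, rng) for m in population for _ in range(5)}
        pool = set(population) | candidates
        # Sort the pool first so ties are broken deterministically.
        population = sorted(sorted(pool), key=predict)[:population_size]
    return train, population, predict


if __name__ == "__main__":
    train, population, predict = run_workflow()
    best = population[0]
    print("best candidate:", "".join(best))
    print("seen in training set:", best in set(train))
```

Because each generation keeps the best-scoring molecules found so far, the lowest predicted gap in the population never increases from one iteration to the next. In a real setting the selected candidates would be re-evaluated with the reference method, since a surrogate can be unreliable outside its training distribution.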

NOTICE OF COPYRIGHT: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Acknowledgements

We thank Pilsun Yoo for fruitful discussions on the synthesizability score. This work was supported in part by the Office of Science of the Department of Energy and by the Laboratory Directed Research and Development (LDRD) Program of Oak Ridge National Laboratory. This research is sponsored by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. The research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This work used resources of the Oak Ridge Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Corresponding author

Correspondence to Stephan Irle.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1014 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Blanchard, A.E. et al. (2022). Computational Workflow for Accelerated Molecular Design Using Quantum Chemical Simulations and Deep Learning Models. In: Doug, K., Al, G., Pophale, S., Liu, H., Parete-Koon, S. (eds) Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation. SMC 2022. Communications in Computer and Information Science, vol 1690. Springer, Cham. https://doi.org/10.1007/978-3-031-23606-8_1

  • DOI: https://doi.org/10.1007/978-3-031-23606-8_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23605-1

  • Online ISBN: 978-3-031-23606-8

  • eBook Packages: Computer Science, Computer Science (R0)
