Machine learning directed drug formulation development

https://doi.org/10.1016/j.addr.2021.05.016

Abstract

Machine learning (ML) has enabled ground-breaking advances in the healthcare and pharmaceutical sectors, from improvements in cancer diagnosis to the identification of novel drugs and drug targets and the prediction of protein structure. Drug formulation is an essential stage in the discovery and development of new medicines. Through the design of drug formulations, pharmaceutical scientists can engineer important properties of new medicines, such as improved bioavailability and targeted delivery. The traditional approach to drug formulation development relies on iterative trial-and-error, requiring a large number of resource-intensive and time-consuming in vitro and in vivo experiments. This review introduces the basic concepts of ML-directed workflows and discusses how these tools can be used to aid in the development of various types of drug formulations. ML-directed drug formulation development offers unparalleled opportunities to fast-track development efforts, uncover new materials and innovative formulations, and generate new knowledge in drug formulation science. The review also highlights the latest artificial intelligence (AI) technologies, such as generative models, Bayesian deep learning, reinforcement learning, and self-driving laboratories, which have been gaining momentum in drug discovery and chemistry and have potential in drug formulation development.

Introduction

Drug formulation typically involves combining inert materials and excipients with active pharmaceutical ingredients (APIs) to produce viable drug products with desired properties. The improvements associated with the development of an optimized drug formulation can include enhanced efficacy, longer acting therapeutic effects, reduced side effects, extended API stability and shelf-life, as well as better patient compliance [1]. Depending on the desired route of administration and specific requirements for the indication, APIs can be formulated using a diverse set of materials (including inert excipients such as polymers, lipids, and surfactants, as well as other APIs) and in a wide range of delivery systems including various types of microparticles (MPs), nanoparticles (NPs), and multicomponent systems [2], [3], [4], [5]. These delivery systems are usually further manufactured into final drug products, e.g., solid, liquid, or parenteral dosage forms [1]. Bringing effective medicines to the market in a timely manner requires innovative drug delivery systems and an economically efficient development process. Although the traditional approach to formulation development has delivered successful drug products to patients, it relies on several inherently time-consuming and often inefficient steps. In general, the current formulation development pathway involves preparing and characterizing several candidate formulations through an extensive API-material matching process. The performance of these potential formulations is then evaluated and compared to the API alone and to competitor formulations (should they exist) in a variety of in vitro and/or ex vivo assays. Several predetermined product requirements, such as API loading, API release rate, formulation stability, particle size and/or particle charge, are then used to select lead-candidate formulation(s) to be carried forward into preclinical animal studies. Failure of a potential formulation to meet the desired criteria (e.g., release rate, particle size, etc.) at any stage of development can require refinement of the formulation and repetition of several development steps or, in some cases, abandonment of the formulation and a restart of the process.

The setbacks encountered during formulation development are largely related to an inability to predict how the composition, or the combination, of APIs and materials influences the performance-related parameters of the formulation. In an attempt to bridge this knowledge gap, pharmaceutical scientists have adopted computational modeling approaches such as molecular dynamics simulations [6], molecular docking studies [7], and cheminformatics tools [8]. Molecular modeling tools can provide new insights into complex drug delivery systems at the molecular level that are not always accessible by experimental techniques. A prominent example is the progress made in predicting properties such as small molecule solubility and affinity using molecular dynamics simulations [9], [10], [11]. An in-depth discussion of these techniques and their applications is beyond the scope of this review, but their use in drug formulation development has seen increasing success in recent years, and these advances are well summarized in a recent review by Casalini [12]. Yet, these physics-based simulations have limitations that hinder their application in formulation development. The prediction of properties such as API release involves the simulation of large, multicomponent drug delivery systems over long timescales, such that the use of approaches like atomistic molecular dynamics simulations would be computationally intractable. The timescales involved, sometimes days long, are out of reach for coarse-grained approaches as well [13].
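
As a rough illustration of the timescale gap described above, assume an atomistic integration time step of 2 fs and a one-day release window. Both values are illustrative assumptions, not figures taken from the cited work. The number of integration steps required would then be:

```latex
N_{\text{steps}} = \frac{t_{\text{release}}}{\Delta t}
                 = \frac{86\,400\ \text{s}}{2 \times 10^{-15}\ \text{s}}
                 = 4.32 \times 10^{19}
```

For comparison, a 1 µs trajectory at the same assumed time step takes 5 × 10^8 steps, which is roughly eleven orders of magnitude fewer.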

Machine learning (ML) is a branch of artificial intelligence (AI) that aims to model processes by training computational models based on a body of data. For instance, ML might allow one to predict the stability of a specific drug formulation by considering data from many previous experiments that examined formulation stability. Recent advances in ML algorithms, the wide availability of faster computing hardware, as well as the release of user-friendly ML toolkits, have significantly improved accessibility to powerful ML models. These trends have led to an explosion in the real-world applications of ML and AI, including within the healthcare and pharmaceutical sectors. The application of ML in these sectors has led to improved cancer diagnostics [14], [15], [16], the discovery of new antifibrotic [17] and antibiotic [18] molecules, and the development of so-called self-driving laboratories [19], [20]. Other notable applications include using supervised learning algorithms to predict the products of chemical reactions [21], the use of deep reinforcement learning to optimize chemical reactions [22], as well as the use of deep learning (DL) to determine the three-dimensional structure of a protein from its amino acid sequence [23].
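
The idea of predicting a formulation property from previous experiments can be sketched with a very small supervised model. The example below is a minimal illustration only: the data values, variable names, and the choice of a single-feature least-squares line are all assumptions made for the sketch, not a method taken from the studies reviewed here.

```python
from statistics import mean

# Hypothetical historical data, invented purely for illustration:
# excipient concentration (% w/v) and the shelf-life measured for each batch (months).
past_concentration = [0.5, 1.0, 1.5, 2.0, 2.5]
past_shelf_life = [7.0, 9.0, 11.0, 13.0, 15.0]


def fit_line(x, y):
    """Ordinary least squares for one feature: y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x_new):
    """Apply the fitted line to a new, untested formulation."""
    slope, intercept = model
    return slope * x_new + intercept


# "Training" uses only the previous experiments.
model = fit_line(past_concentration, past_shelf_life)

# Prediction for a concentration that was never tested but lies inside the
# range covered by the training data.
predicted = predict(model, 1.75)
print(f"Predicted shelf-life at 1.75% excipient: {predicted:.1f} months")
```

Real formulation data sets have many more input variables and non-linear behavior, so a single fitted line stands in here only for the general train-then-predict pattern.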

The invention of new drug products and the steps involved in optimizing such formulations pose challenges similar to those that have already been addressed using ML in other sectors. For example, a major barrier in the current drug formulation development process is the number of expensive, laborious and time-consuming experiments that must be conducted to select appropriate materials to achieve a desirable formulation property (such as increased API solubility). By harnessing the predictive power of AI and ML, pharmaceutical scientists may be able to streamline the development of such formulations using existing data or through optimal experimental planning. To date, ML models have been developed to address several of the inherent challenges faced by formulation scientists, including prediction of the effect of excipients on API solubility, determination of the chemical and colloidal stability of proteins, prediction of the physical stability of API formulations, and determination of API loading capacity as well as release rates of APIs from advanced delivery platforms such as MPs and NPs. The purpose of this review is to provide drug delivery and formulation scientists with a brief introduction to ML and to highlight recent formulation development projects that have overcome substantial obstacles using ML tools. Efforts are also made to highlight promising directions to achieve further success in this area. Overall, this review aims to make the case for a new data-driven formulation development process.

Section snippets

Machine learning tools and techniques

In this review, we focus primarily on the use of supervised ML to predict properties of drug formulations. Supervised ML tasks aim to predict a numerical value or class for a specific data sample. The prediction of numerical values, for example the prediction of API solubility in various surfactant solutions (Box 1), is referred to as a regression task. A classification task, in contrast, determines a category to which a sample belongs, for example predicting whether a molecule primarily acts
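
The distinction between regression and classification can be sketched with one simple model applied to both tasks. The example below uses a k-nearest-neighbors predictor on invented data. The concentrations, solubility values, class labels, and function names are hypothetical and chosen only to show that a regression task returns a number while a classification task returns a category.

```python
from collections import Counter

# Hypothetical training data, invented for illustration only.
surfactant_conc = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # % w/v (input feature)
solubility = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]  # mg/mL (regression target)
solubility_class = ["poor", "poor", "poor", "good", "good", "good"]  # classification target


def nearest_indices(query, k):
    """Indices of the k training samples whose concentration is closest to the query."""
    order = sorted(range(len(surfactant_conc)),
                   key=lambda i: abs(surfactant_conc[i] - query))
    return order[:k]


def knn_regress(query, k=3):
    """Regression: average the numerical targets of the k nearest samples."""
    idx = nearest_indices(query, k)
    return sum(solubility[i] for i in idx) / len(idx)


def knn_classify(query, k=3):
    """Classification: majority vote over the class labels of the k nearest samples."""
    idx = nearest_indices(query, k)
    return Counter(solubility_class[i] for i in idx).most_common(1)[0][0]


query = 2.4  # an untested surfactant concentration
predicted_value = knn_regress(query)
predicted_class = knn_classify(query)
print(f"Regression output: {predicted_value:.2f} mg/mL")
print(f"Classification output: {predicted_class}")
```

The same inputs feed both functions, and only the type of target differs, which is the practical difference between the two task types.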

Conventional oral dosage forms

The first applications of ML in drug formulation development date back to the 1990s when NNs were used to predict properties of immediate release (IR) oral tablets. These studies involved the preparation and evaluation of a range of tablet formulations. The resulting data was then used to train NNs and/or decision trees to predict various outputs (e.g., disintegration time, dissolution rate, and friability). These studies were some of the first to demonstrate how ML could be used to predict

Outlook

ML models enable users to analyze experimental results and uncover subtle patterns that are not immediately visible. While the majority of the studies summarized herein reported ML models with high predictive accuracy, many of these models have only been evaluated retrospectively. Only a limited number of studies have included prospective experimental validation and model interpretation steps. It is through such analytical steps that ML models can be used to generate new knowledge and afford
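
Retrospective evaluation, as mentioned in this snippet, scores a model against data that has already been collected. One common way to do this is leave-one-out cross-validation, sketched below under stated assumptions: the data values are invented, and the 1-nearest-neighbor predictor is a stand-in for whatever model is being assessed. Prospective validation, by contrast, requires new experiments on formulations the model has proposed, which no code can replace.

```python
# Hypothetical data set, invented for illustration only:
# polymer fraction of the formulation and percentage of API released at 24 h.
polymer_fraction = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
release_24h = [90.0, 80.0, 70.0, 60.0, 50.0, 40.0]


def predict_1nn(train_x, train_y, query):
    """Predict using the target of the single nearest training sample."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]


def leave_one_out_mae(x, y):
    """Hold out each sample in turn, predict it from the rest, and average the absolute errors."""
    errors = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        errors.append(abs(predict_1nn(train_x, train_y, x[i]) - y[i]))
    return sum(errors) / len(errors)


mae = leave_one_out_mae(polymer_fraction, release_24h)
print(f"Leave-one-out mean absolute error: {mae:.1f} percentage points")
```

A low retrospective error of this kind shows only that the model reproduces existing measurements. It does not show that the model's suggestions hold up when new formulations are actually prepared and tested.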

Author contributions

The manuscript was written through contributions of all authors. All authors have approved the final version of the manuscript.

Funding Sources

NSERC Discovery grant (RGPIN-2016-04293) to C.A. F.H., M.A., and A.A.G. acknowledge support from the Defense Advanced Research Projects Agency under the Accelerated Molecular Discovery Program, Cooperative Agreement No. HR00111920027, dated August 1, 2019. A.A.G. would like to thank Dr. Anders Frøseth for his support. M.A. is supported by a Postdoctoral Fellowship of the Vector Institute.

Acknowledgment

Images were created using BioRender.com, ChemDraw Professional, TheNounProject.com and Smart.Servier.com.

References (103)

  • M. Turkoglu et al. Modeling of a roller-compaction process using neural networks and genetic algorithms. Eur. J. Pharm. Biopharm. (1999)
  • K. Takagaki et al. Creation of a tablet database containing several active ingredients and prediction of their pharmaceutical characteristics based on ensemble artificial neural networks. J. Pharm. Sci. (2010)
  • R. Han et al. Predicting oral disintegrating tablet formulations by neural network techniques. Asian J. Pharm. Sci. (2018)
  • S.A. Damiati et al. Application of machine learning in prediction of hydrotrope-enhanced solubilisation of indomethacin. Int. J. Pharm. (2017)
  • Y. Yang. Deep learning for in vitro prediction of pharmaceutical formulations. Acta Pharm. Sin. B (2019)
  • J. Petrović et al. Application of dynamic neural networks in the modeling of drug release from polyethylene oxide matrix tablets. Eur. J. Pharm. Sci. (2009)
  • S. Ibrić et al. The application of generalized regression neural network in the modeling and optimization of aspirin extended release tablets with Eudragit® RS PO as matrix substance. J. Controlled Release (2002)
  • J. Petrović et al. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees. Int. J. Pharm. (2012)
  • A. Ghaffari. Performance comparison of neural network training algorithms in modeling of bimodal drug delivery. Int. J. Pharm. (2006)
  • P. Barmpalexis et al. Artificial neural networks in the optimization of a nimodipine controlled release tablet formulation. Eur. J. Pharm. Biopharm. (2010)
  • R. Han. Predicting physical stability of solid dispersions by machine learning techniques. J. Controlled Release (2019)
  • Z. Zhang. Design of an expert system for the development and formulation of push–pull osmotic pump tablets containing poorly water-soluble drugs. Int. J. Pharm. (2011)
  • H. Gao. Predicting drug/phospholipid complexation by the lightGBM method. Chem. Phys. Lett. (2020)
  • Q. Zhao et al. Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques. Acta Pharm. Sin. B (2019)
  • S. Branchu et al. A decision-support tool for the formulation of orally active, poorly soluble compounds. Eur. J. Pharm. Sci. (2007)
  • J.A. Dowell et al. Artificial neural networks applied to the in vitro-in vivo correlation of an extended-release formulation: initial trials and experience. J. Pharm. Sci. (1999)
  • L. Gentiluomo. Application of interpretable artificial neural networks to early monoclonal antibodies development. Eur. J. Pharm. Biopharm. (2019)
  • L. Gentiluomo et al. Application of machine learning to predict monomer retention of therapeutic proteins after long term storage. Int. J. Pharm. (2020)
  • K. Park. Injectable, long-acting PLGA formulations: Analyzing PLGA and understanding microparticle formation. J. Controlled Release (2019)
  • M. Germain. Delivering the power of nanomedicine to patients today. J. Controlled Release (2020)
  • C.I. Nkanga. Clinically established biodegradable long acting injectables: An industry perspective. Adv. Drug Deliv. Rev. (2020)
  • Y. Li et al. Prediction of kinetics of doxorubicin release from sulfopropyl dextran ion-exchange microspheres using artificial neural networks. Eur. J. Pharm. Sci. (2005)
  • Y. Li et al. Optimization of controlled release nanoparticle formulation of verapamil hydrochloride using artificial neural networks with genetic algorithm and response surface methodology. Eur. J. Pharm. Biopharm. (2015)
  • G. Amasya et al. Quality by design case study 1: design of 5-fluorouracil loaded lipid nanoparticles by the W/O/W double emulsion — solvent evaporation method. Eur. J. Pharm. Sci. (2016)
  • Y. He. Can machine learning predict drug nanocrystals? J. Controlled Release (2020)
  • Aulton’s pharmaceutics: the design and manufacture of medicines, Elsevier,...
  • R.F. Pagels, R.K. Prud’homme, Polymeric nanoparticles and microparticles for the delivery of peptides, biologics, and...
  • M. De Vivo et al. Role of molecular dynamics and related methods in drug discovery. J. Med. Chem. (2016)
  • D.B. Kitchen et al. Docking and scoring in virtual screening for drug discovery: methods and applications. Nat. Rev. Drug Discov. (2004)
  • H. Chen et al. Cheminformatics in drug discovery, an industrial perspective. Mol. Inform. (2018)
  • M. Aldeghi et al. Accurate calculation of the absolute free energy of binding for drug molecules. Chem. Sci. (2016)
  • M.G. Saunders et al. Coarse-graining methods for computational biology. Annu. Rev. Biophys. (2013)
  • A. Hosny et al. Artificial intelligence in radiology. Nat. Rev. Cancer (2018)
  • N. Wu. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging (2020)
  • S.M. McKinney. International evaluation of an AI system for breast cancer screening. Nature (2020)
  • A. Zhavoronkov. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. (2019)
  • S. Langner. Beyond ternary OPV: high-throughput experimentation and self-driving laboratories optimize multicomponent systems. Adv. Mater. (2020)
  • B.P. MacLeod. Self-driving laboratory for accelerated discovery of thin-film materials. Sci. Adv. (2020)
  • Z. Zhou et al. Optimizing chemical reactions with deep reinforcement learning. ACS Cent. Sci. (2017)
  • A.W. Senior. Improved protein structure prediction using potentials from deep learning. Nature (2020)

This review is part of the Advanced Drug Delivery Reviews theme issue on “Editor’s collection 2021”.