Machine learning directed drug formulation development☆
Introduction
Drug formulation typically involves combining inert materials and excipients with active pharmaceutical ingredients (APIs) to produce viable drug products with desired properties. The improvements associated with the development of an optimized drug formulation can include enhanced efficacy, longer-acting therapeutic effects, reduced side effects, extended API stability and shelf-life, as well as better patient compliance [1]. Depending on the desired route of administration and specific requirements for the indication, APIs can be formulated using a diverse set of materials (including inert excipients such as polymers, lipids and surfactants, as well as other APIs) and in a wide range of delivery systems including various types of microparticles (MPs), nanoparticles (NPs), and multicomponent systems [2], [3], [4], [5]. These delivery systems are usually further manufactured into final drug products, e.g., solid, liquid, or parenteral dosage forms [1]. Bringing effective medicines to the market in a timely manner requires innovative drug delivery systems and an economically efficient development process. Although the traditional approach to formulation development has delivered successful drug products to patients, it relies on several inherently time-consuming and often inefficient steps. In general, the current formulation development pathway involves preparing and characterizing several candidate formulations through an extensive API-material matching process. The performance of these potential formulations is then evaluated and compared with the API alone, and with competitor formulations (should they exist), in a variety of in vitro and/or ex vivo assays. Several predetermined product requirements, such as API loading, API release rate, formulation stability, particle size and/or particle charge, are then used to select lead-candidate formulation(s) to be carried forward into preclinical animal studies.
Failure of a potential formulation to meet the desired criteria (e.g., release rate, particle size, etc.) at any stage of development can require its refinement and the need to repeat several steps in the development process or in some cases the abandonment of a formulation and the need to begin the process anew.
The setbacks encountered during formulation development are largely related to an inability to predict how the composition, or the combination, of APIs and materials influences the performance-related parameters of the formulation. In an attempt to bridge this knowledge gap, pharmaceutical scientists have adopted computational modeling approaches such as molecular dynamics simulations [6], molecular docking studies [7], and cheminformatics tools [8]. Molecular modeling tools can provide new insights into complex drug delivery systems at the molecular level that are not always accessible by experimental techniques. A prominent example is the progress made in predicting properties such as small molecule solubility and affinity using molecular dynamics simulations [9], [10], [11]. While an in-depth discussion of these techniques and their applications is beyond the scope of this review, their use in drug formulation development has seen increasing success in recent years, and these advances are well summarized in a recent review by Casalini [12]. Yet, these physics-based simulations have limitations that hinder their application in formulation development. In fact, the prediction of properties such as API release involves the simulation of large, multicomponent drug delivery systems over long timescales, such that the use of approaches like atomistic molecular dynamics simulations would be computationally intractable. The timescales involved, sometimes days long, are out of reach for coarse-grained approaches as well [13].
Machine learning (ML) is a branch of artificial intelligence (AI) that aims to model processes by training computational models based on a body of data. For instance, ML might allow one to predict the stability of a specific drug formulation by considering data from many previous experiments that examined formulation stability. Recent advances in ML algorithms, the wide availability of faster computing hardware, as well as the release of user-friendly ML toolkits, have significantly improved accessibility to powerful ML models. These trends have led to an explosion in the real-world applications of ML and AI, including within the healthcare and pharmaceutical sectors. The application of ML in these sectors has led to improved cancer diagnostics [14], [15], [16], the discovery of new antifibrotic [17] and antibiotic [18] molecules, and the development of so-called self-driving laboratories [19], [20]. Other notable applications include using supervised learning algorithms to predict the products of chemical reactions [21], the use of deep reinforcement learning to optimize chemical reactions [22], as well as the use of deep learning (DL) to determine the three-dimensional structure of a protein from its amino acid sequence [23].
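The idea of predicting formulation stability from previous experiments can be illustrated with a minimal sketch. The example below is not taken from any study cited here: the records, the two descriptors (API loading and storage temperature), and the function name `predict_stable` are all hypothetical, and a simple k-nearest-neighbour vote stands in for whatever model a real project would use.

```python
from math import dist

# Hypothetical past experiments (made-up numbers, for illustration only):
# (API loading in % w/w, storage temperature in deg C) -> was the formulation stable?
past_experiments = [
    ((5.0, 25.0), True),
    ((10.0, 25.0), True),
    ((15.0, 25.0), True),
    ((30.0, 40.0), False),
    ((35.0, 40.0), False),
    ((40.0, 25.0), False),
]


def predict_stable(features, records, k=3):
    """Predict stability by majority vote among the k most similar past experiments."""
    # Rank previous experiments by Euclidean distance to the new formulation.
    # Descriptors are used unscaled here; real data would usually be normalized first.
    nearest = sorted(records, key=lambda rec: dist(rec[0], features))[:k]
    votes = sum(1 for _, stable in nearest if stable)
    return votes * 2 > k


# A new low-loading formulation resembles the stable records,
# whereas a high-loading one stored warm resembles the unstable records.
low_loading_prediction = predict_stable((8.0, 25.0), past_experiments)
high_loading_prediction = predict_stable((38.0, 40.0), past_experiments)
print(low_loading_prediction, high_loading_prediction)
```

The prediction is only as informative as the prior data: a new formulation far from every recorded experiment would still receive a label, which is one reason prospective experimental checks remain necessary.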
The invention of new drug products and the steps involved in optimizing such formulations pose challenges similar to those that have already been addressed using ML in other sectors. For example, a major barrier in the current drug formulation development process is the number of expensive, laborious and time-consuming experiments that must be conducted to select appropriate materials to achieve a desirable formulation property (such as increased API solubility). By harnessing the predictive power of AI and ML, pharmaceutical scientists may be able to streamline the development of such formulations using existing data or through optimal experimental planning. To date, ML models have been developed to address several of the inherent challenges faced by formulation scientists, including prediction of the effect of excipients on API solubility, determination of the chemical and colloidal stability of proteins, prediction of the physical stability of API formulations, and determination of API loading capacity as well as release rates of APIs from advanced delivery platforms such as MPs and NPs. The purpose of this review is to provide drug delivery and formulation scientists with a brief introduction to ML and to highlight recent formulation development projects that have overcome substantial obstacles using ML tools. Efforts are also made to highlight promising directions to achieve further success in this area. Overall, this review aims to make the case for a new data-driven formulation development process.
Machine learning tools and techniques
In this review, we focus primarily on the use of supervised ML to predict properties of drug formulations. Supervised ML tasks aim to predict a numerical value or class for a specific data sample. The prediction of numerical values, for example the prediction of API solubility in various surfactant solutions (Box 1), is referred to as a regression task. A classification task, in contrast, determines a category to which a sample belongs, for example predicting whether a molecule primarily acts
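The distinction between regression and classification can be made concrete with a small sketch. The measurements below are invented for illustration, the names `fit_line`, `predict_solubility`, and `classify_solubility` are hypothetical, and the 3.0 mg/mL cut-off is an arbitrary assumption. The class label is obtained here simply by thresholding the regression output, which is only one of many ways to build a classifier.

```python
# Hypothetical measurements (made-up numbers):
# surfactant concentration (% w/v) and measured API solubility (mg/mL).
concentration = [0.0, 1.0, 2.0, 3.0, 4.0]
solubility = [0.4, 1.6, 2.5, 3.4, 4.6]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(concentration, solubility)


def predict_solubility(c):
    """Regression task: the output is a numerical value (mg/mL)."""
    return slope * c + intercept


def classify_solubility(c, threshold=3.0):
    """Classification task: the output is a category label.

    The threshold is an assumed cut-off, not a value from the literature.
    """
    return "high" if predict_solubility(c) >= threshold else "low"


print(predict_solubility(2.5))   # a number
print(classify_solubility(1.0))  # a label
print(classify_solubility(4.0))  # a label
```

The same dataset can therefore support either task; which one is appropriate depends on whether the formulation decision needs a quantity or a category.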
Conventional oral dosage forms
The first applications of ML in drug formulation development date back to the 1990s when NNs were used to predict properties of immediate release (IR) oral tablets. These studies involved the preparation and evaluation of a range of tablet formulations. The resulting data was then used to train NNs and/or decision trees to predict various outputs (e.g., disintegration time, dissolution rate, and friability). These studies were some of the first to demonstrate how ML could be used to predict
Outlook
ML models enable users to analyze experimental results to uncover subtle patterns that are not immediately visible. While the majority of the studies summarized herein reported ML models with high predictive accuracy, many of these models have only been evaluated retrospectively. Only a limited number of studies have included prospective experimental validation and model interpretation steps. It is through such analytical steps that ML models can be used to generate new knowledge and afford
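To make the notion of retrospective evaluation concrete, the sketch below estimates predictive error by leave-one-out cross-validation on data that already exist. Everything in it is an assumption made for illustration: the dataset is invented, the descriptor (polymer content) and response (API released at 24 h) are hypothetical, and a one-nearest-neighbour predictor stands in for a real model.

```python
# Made-up dataset: (polymer content in % w/w, API released at 24 h in %).
data = [(10, 90.0), (20, 80.0), (30, 72.0), (40, 61.0), (50, 50.0), (60, 41.0)]


def nearest_neighbour_predict(x, training):
    """Return the response of the closest training record (first one wins on ties)."""
    closest = min(training, key=lambda rec: abs(rec[0] - x))
    return closest[1]


def leave_one_out_mae(records):
    """Mean absolute error when each record is predicted from all the others."""
    errors = []
    for i, (x, y) in enumerate(records):
        # Hold out record i and predict it from the remaining records only.
        training = records[:i] + records[i + 1:]
        errors.append(abs(nearest_neighbour_predict(x, training) - y))
    return sum(errors) / len(errors)


mae = leave_one_out_mae(data)
print(f"Leave-one-out mean absolute error: {mae:.2f} % released")
```

An error estimate of this kind uses only previously collected data. Prospective validation, by contrast, would mean preparing and testing formulations that the model has never seen and that were chosen on the basis of its predictions.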
Author contributions
The manuscript was written through contributions of all authors. All authors have approved the final version of the manuscript.
Funding Sources
NSERC Discovery grant (RGPIN-2016-04293) to C.A. F.H., M.A., and A.A.G. acknowledge support by the Defense Advanced Research Projects Agency under the Accelerated Molecular Discovery Program under Cooperative Agreement No. HR00111920027 dated August 1, 2019. A.A.G. would like to thank Dr. Anders Frøseth for his support. M.A. is supported by a Postdoctoral Fellowship of the Vector Institute.
Acknowledgment
Images were created using BioRender.com, ChemDraw Professional, TheNounProject.com and Smart.Servier.com.
References (103)
Successful oral delivery of poorly water-soluble drugs both depends on the intraluminal behavior of drugs and of appropriate advanced drug delivery systems
Eur. J. Pharm. Sci. (2019)
Pharmaceutical aspects of salt and cocrystal forms of APIs and characterization challenges
Adv. Drug Deliv. Rev. (2017)
Pharmaceutical cocrystals, salts and multicomponent systems; intermolecular interactions and property based design
Adv. Drug Deliv. Rev. (2017)
Large scale relative protein ligand binding affinities using non-equilibrium alchemy
Chem. Sci. (2020)
Molecular simulation as a computational pharmaceutics tool to predict drug solubility, solubilization processes and partitioning
Eur. J. Pharm. Biopharm. (2019)
Not only in silico drug discovery: molecular modeling towards in silico drug delivery formulations
J. Controlled Release (2021)
A deep learning approach to antibiotic discovery
Cell (2020)
A graph-convolutional neural network model for the prediction of chemical reactivity
Chem. Sci. (2019)
Next-generation experimentation with self-driving laboratories
Trends Chem. (2019)
Comparison of artificial neural networks (ANN) with classical modelling techniques using different experimental designs and data from a galenical study on a solid dosage form
Eur. J. Pharm. Sci. (1998)
Modeling of a roller-compaction process using neural networks and genetic algorithms
Eur. J. Pharm. Biopharm.
Creation of a tablet database containing several active ingredients and prediction of their pharmaceutical characteristics based on ensemble artificial neural networks
J. Pharm. Sci.
Predicting oral disintegrating tablet formulations by neural network techniques
Asian J. Pharm. Sci.
Application of machine learning in prediction of hydrotrope-enhanced solubilisation of indomethacin
Int. J. Pharm.
Deep learning for in vitro prediction of pharmaceutical formulations
Acta Pharm. Sin. B
Application of dynamic neural networks in the modeling of drug release from polyethylene oxide matrix tablets
Eur. J. Pharm. Sci.
The application of generalized regression neural network in the modeling and optimization of aspirin extended release tablets with Eudragit® RS PO as matrix substance
J. Controlled Release
Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees
Int. J. Pharm.
Performance comparison of neural network training algorithms in modeling of bimodal drug delivery
Int. J. Pharm.
Artificial neural networks in the optimization of a nimodipine controlled release tablet formulation
Eur. J. Pharm. Biopharm.
Predicting physical stability of solid dispersions by machine learning techniques
J. Controlled Release
Design of an expert system for the development and formulation of push–pull osmotic pump tablets containing poorly water-soluble drugs
Int. J. Pharm.
Predicting drug/phospholipid complexation by the lightGBM method
Chem. Phys. Lett.
Predicting complexation performance between cyclodextrins and guest molecules by integrated machine learning and molecular modeling techniques
Acta Pharm. Sin. B
A decision-support tool for the formulation of orally active, poorly soluble compounds
Eur. J. Pharm. Sci.
Artificial neural networks applied to the in vitro-in vivo correlation of an extended-release formulation: initial trials and experience
J. Pharm. Sci.
Application of interpretable artificial neural networks to early monoclonal antibodies development
Eur. J. Pharm. Biopharm.
Application of machine learning to predict monomer retention of therapeutic proteins after long term storage
Int. J. Pharm.
Injectable, long-acting PLGA formulations: Analyzing PLGA and understanding microparticle formation
J. Controlled Release
Delivering the power of nanomedicine to patients today
J. Controlled Release
Clinically established biodegradable long acting injectables: An industry perspective
Adv. Drug Deliv. Rev.
Prediction of kinetics of doxorubicin release from sulfopropyl dextran ion-exchange microspheres using artificial neural networks
Eur. J. Pharm. Sci.
Optimization of controlled release nanoparticle formulation of verapamil hydrochloride using artificial neural networks with genetic algorithm and response surface methodology
Eur. J. Pharm. Biopharm.
Quality by design case study 1: design of 5-fluorouracil loaded lipid nanoparticles by the W/O/W double emulsion — solvent evaporation method
Eur. J. Pharm. Sci.
Can machine learning predict drug nanocrystals?
J. Controlled Release
Role of molecular dynamics and related methods in drug discovery
J. Med. Chem.
Docking and scoring in virtual screening for drug discovery: methods and applications
Nat. Rev. Drug Discov.
Cheminformatics in drug discovery, an industrial perspective
Mol. Inform.
Accurate calculation of the absolute free energy of binding for drug molecules
Chem. Sci.
Coarse-graining methods for computational biology
Annu. Rev. Biophys.
Artificial intelligence in radiology
Nat. Rev. Cancer
Deep neural networks improve radiologists’ performance in breast cancer screening
IEEE Trans. Med. Imaging
International evaluation of an AI system for breast cancer screening
Nature
Deep learning enables rapid identification of potent DDR1 kinase inhibitors
Nat. Biotechnol.
Beyond ternary OPV: high-throughput experimentation and self-driving laboratories optimize multicomponent systems
Adv. Mater.
Self-driving laboratory for accelerated discovery of thin-film materials
Sci. Adv.
Optimizing chemical reactions with deep reinforcement learning
ACS Cent. Sci.
Improved protein structure prediction using potentials from deep learning
Nature
☆ This review is part of the Advanced Drug Delivery Reviews theme issue on “Editor’s collection 2021”.