Main

Despite amazing progress in basic life sciences and biotechnology, drug discovery and development (DDD) remain slow and expensive, taking on average approximately 15 years and US$2 billion to bring a small-molecule drug to market1. Although clinical studies are the most expensive part of the development of each drug, most time-saving and cost-saving opportunities reside in the earlier discovery and preclinical stages. Preclinical efforts themselves account for more than 43% of expenses in pharma, in addition to major public funding1, driven by the high attrition rate at every step from target selection to hit identification and lead optimization to the selection of clinical candidates. Moreover, the high failure rate in clinical trials (currently 90%)2 is largely explained by issues rooted in early discovery, such as inadequate target validation or suboptimal ligand properties. Finding fast and accessible ways to discover more diverse pools of higher-quality chemical probes, hits and leads with optimal absorption, distribution, metabolism, excretion and toxicology (ADMET) and pharmacokinetics (PK) profiles at the early stages of DDD would improve outcomes in preclinical and clinical studies and facilitate the development of more effective, accessible and safer drugs.

The concept of computer-aided drug discovery3 was developed in the 1970s and popularized by Fortune magazine in 1981, and has since been through several cycles of hype and disillusionment4. There have been success stories along the way5 and, in general, computer-assisted approaches have become an integral, yet modest, part of the drug discovery process6,7. In the past few years, however, several scientific and technological breakthroughs have resulted in a tectonic shift towards embracing computational approaches as a key driving force for drug discovery in both academia and industry. Pharmaceutical and biotech companies are expanding their computational drug discovery efforts or hiring their first computational chemists. Numerous new and established drug discovery companies have raised billions in the past few years with business models that rely heavily on a combination of advanced physics-based molecular modelling with deep learning (DL) and artificial intelligence (AI)8. Although it is still too early to expect approved drugs from the most recent computationally driven discovery efforts, these efforts are producing a growing number of clinical candidates, with some campaigns claiming target-to-lead times as low as 1–2 months9,10 or target-to-clinic times under 1 year11. Are these the signs of a major shift in the role that computational approaches have in drug discovery, or just another round of the hype cycle?

Let us look at the key factors defining the recent changes (Fig. 1). First, the structural revolution—from automation in crystallography12 to microcrystallography13,14 and, most recently, cryo-electron microscopy15,16—has made it possible to reveal 3D structures for the majority of clinically relevant targets, often in a state or molecular complex relevant to their biological function. Especially impressive has been the recent structural turnaround for G protein-coupled receptors (GPCRs)17 and other membrane proteins, which mediate the action of more than 50% of drugs18, providing 3D templates for ligand screening and lead optimization. The second factor is a rapid and marked expansion of the drug-like chemical space easily accessible for hit and lead discovery. Just a few years ago, this space was limited to several million on-shelf compounds from vendors and in-house screening libraries in pharma. Now, screening can be done with ultra-large virtual libraries and chemical spaces of drug-like compounds that can be readily made on demand, rapidly growing beyond billions of compounds19, and with even larger generative spaces of theoretically predicted synthesizability (Box 1). The third factor involves emerging computational approaches that strive to take full advantage of this abundance of 3D structures and ligand data, enabled by the broad availability of cloud and graphics processing unit (GPU) computing resources that allow these methods to run at scale. This includes structure-based virtual screening of ultra-large libraries20,21,22 using accelerated23,24,25 and modular26 screening approaches, as well as the recent growth of data-driven machine learning (ML) and DL methods for predicting ADMET and PK properties and activities27.

Fig. 1: Key factors driving VLS technology breakthroughs for generation of high-quality hits and leads.

a, More than 200,000 protein structures in the PDB, plus private collections, cover more than 90% of protein families with high-resolution X-ray and, more recently, cryo-electron microscopy structures, often in distinct functional states, with remaining gaps filled by homology or AlphaFold2 models. b, The chemical space available for screening and fast synthesis has grown from about 10⁷ on-shelf compounds in 2015 to more than 3 × 10¹⁰ on-demand compounds in 2022, and can be rapidly expanded beyond 10¹⁵ diverse and novel compounds. c, Computational methods for VLS include advances in fast flexible docking, modular fragment-based algorithms, DL models and hybrid approaches. d, Computational tools are supported by the rapid growth of affordable cloud computing, GPU acceleration and specialized chips.

Although the impacts of the recent structural revolution17 and computing hardware in drug discovery28 are comprehensively reviewed elsewhere, here we focus on the ongoing expansion of accessible drug-like chemical spaces as well as current developments in computational methods for ligand discovery and optimization. We detail how emerging computational tools applied in gigaspace can facilitate the cost-effective discovery of hundreds or even thousands of highly diverse, potent, target-selective and drug-like ligands for a desired target, and put them in the context of experimental approaches (Table 1). Although the full impact of new computational technologies is only starting to affect clinical development, we suggest that their synergistic combination with experimental testing and validation in the drug discovery ecosystem can markedly improve its efficiency in producing better therapeutics.

Table 1 Comparison of experimentally driven HTS, fragment-based ligand discovery, gigascale DEL screening and gigascale VLS

Expansion of accessible chemical space

Why bigger is better

The limited size and diversity of screening libraries have long been a bottleneck for the detection of novel potent ligands and for the whole process of drug discovery. An average ‘affordable’ high-throughput screening (HTS) campaign29 uses screening libraries of about 50,000–500,000 compounds and is expected to yield only a few true hits after secondary validation. Those hits, if any, are usually weak and non-selective, with suboptimal ADMET and PK properties and unknown binding modes, so their discovery entails years of painstaking trial-and-error optimization to produce a lead molecule with satisfactory potency and all the other requirements for preclinical development. Scaling of HTS to a few million compounds can be afforded only by big pharma, and it still does not make much difference in terms of the quality of the resulting hits. Likewise, virtual libraries used for in silico screening were traditionally limited to collections of compounds available in stock from vendors, usually comprising fewer than 10 million unique compounds, so the scale advantage over HTS was marginal.

Although chasing full coverage of the enormous drug-like chemical space (estimated at more than 10⁶³ compounds)30 is a futile endeavour, expanding the screening of on-demand libraries by several orders of magnitude, to billions or more of previously unexplored drug-like compounds, either physical or virtual, is expected to change the drug discovery model in several ways. First, it can proportionally increase the number of potential hits in the initial screening31 (Fig. 2). This abundance of ligands in the library also increases the chances of identifying more potent or selective ligands, as well as ligands with better physicochemical properties. This has been demonstrated in ultra-large virtual screening campaigns for several targets, revealing highly potent ligands with affinities often in the mid-nanomolar to sub-nanomolar range20,21,22,23,26. Second, the accessibility of hit analogues in the same on-demand spaces streamlines the generation of meaningful structure–activity relationship (SAR)-by-catalogue and further optimization steps, reducing the amount of elaborate custom synthesis. Last, although the library scale is important, properly constructed gigascale libraries can also expand the chemical diversity (even with a few chemical reactions32), chemical novelty and patentability of the hits, as almost all on-demand compounds have never been synthesized before.

Fig. 2: Benefits of a bigger chemical space.

The red curves (log scale) illustrate the distribution of screening hits with binding scores better than a given value X for libraries of 10 billion, 100 million and 1 million compounds, as estimated from previous VLS and V-SYNTHES screening campaigns. The blue curves illustrate the approximate dependence of the experimental hit rate on the predicted docking score for 10-µM, 1-µM and 100-nM potency thresholds20. This analysis (semi-quantitative, as it varies from target to target) suggests that screening of more than 100 million compounds lifts the limitations of smaller libraries, extending the tail of the hit distribution towards better binding scores with high hit rates and allowing the identification of proportionally more experimental hits with higher affinity. Note also two important factors justifying further growth of screening libraries to 10 billion compounds and more: (1) the candidate hits for synthesis and experimental testing are usually picked as a result of target-dependent post-processing of several thousands of top-scoring compounds, which selects for novelty, diversity, drug likeness and, often, interactions with specific receptor residues. Thus, the more good-scoring compounds that are identified, the better the overall selection that can be made. (2) Saturation of the hit rate curves at the best scores is not a universal rule but a result of the limited accuracy of the fast scoring functions used in screening. Using more accurate docking or scoring approaches (flexible docking, quantum mechanical scoring and free energy perturbation) in the post-processing step can extend the meaningful correlation of binding score with affinity further left (grey dashed curves), potentially bringing even more high-affinity hits for gigascale chemical spaces.

Physical libraries

Several approaches have been developed recently to push the library size limits in HTS, including combinatorial chemistry and large-scale pooling of compounds for parallel assays. For example, affinity-selection mass spectrometry techniques can be applied to identify binders directly in pools of thousands of compounds33 without the need for labelling. DNA-encoded libraries (DELs) and cost-effective approaches to generate and screen them have also been developed34, making it possible to work with as many as approximately 10¹⁰ compounds in a single test tube35. These methods have their own limitations: as DELs are created by tagging ligands with unique DNA sequences through a linker, DNA conjugation limits the chemistries available for the combinatorial assembly of the library. Screening of DELs may also yield a large number of false negatives, when conjugation blocks moieties important for binding, and, more importantly, false positives caused by nonspecific binding of the DNA labels, so expensive off-DNA resynthesis of hit compounds is needed for their validation. To avoid this resynthesis, it has been suggested to use ML models trained on DEL results for each target to predict drug-like ligands from on-demand chemical spaces, as described in ref. 36.

Virtual on-demand libraries

In silico screening of virtual libraries by fast computational approaches has long been touted as a cost-effective way to overcome the limitations of physical libraries. Only recently, however, have synthetic chemistry and cheminformatics approaches been developed to break out of these limits and construct virtual on-demand libraries that explore a much larger chemical space, as reviewed in refs. 37,38. In 2017, the readily accessible (REAL) database by Enamine19,39 became the first commercially available on-demand library based on the robust reaction principle40, whereas the US National Institutes of Health developed the synthetically accessible virtual inventory (SAVI)41, which also uses Enamine building blocks. The REAL database uses carefully selected and optimized parallel synthesis protocols and a curated collection of in-stock building blocks, making it possible to guarantee fast (less than 4 weeks), reliable (80% success rate) and affordable synthesis of a set of compounds21. Driven by new reactions and diverse building blocks, the fully enumerated REAL database has grown from approximately 170 million compounds in 2017 to more than 5.5 billion compounds in 2022 and comprises the bulk of the popular ZINC20 virtual screening database42. The practical utility of the REAL database has recently been demonstrated in several major prospective screening campaigns20,21,23,24, some of them taking further hit optimization steps in the same chemical space, yielding selective nanomolar and even sub-nanomolar ligands without any custom synthesis20,21. Similar ultra-large virtual libraries (for example, GalaXi (http://www.wuxiapptec.com) and CHEMriya (http://chemriya.com)) are available commercially, although their synthetic success rates are yet to be published.

Virtual chemical spaces

The modular nature of on-demand virtual libraries supports further growth through the addition of reactions and building blocks. However, building, maintaining and searching fully enumerated chemical libraries comprising more than a few billion compounds becomes slow and impractical. Such gigascale virtual libraries are therefore usually maintained as non-enumerated chemical spaces, defined by a specific set of building blocks and reactions (or transforms), as comprehensively reviewed in ref. 38. Within pharma, one of the first published examples is PGVL by Pfizer37,43, the most recent version of which uses a set of 1,244 reactions and in-house reagents to account for 10¹⁴ compounds. Other biopharma companies have their own virtual chemical spaces38,44, although their details are often not in the public domain. Among commercially available chemical spaces, GalaXi Space by WuXi (approximately 8 billion compounds), CHEMriya by Otava (11.8 billion compounds) and Enamine REAL Space (36 billion compounds)45 are the largest and most established. In addition to their enormous sizes, these virtual spaces are highly novel and diverse, with minimal overlap (less than 10%) between each other46. Currently the largest commercial space, Enamine REAL Space is an extension of the REAL database that maintains the same synthetic speed, rate and cost guarantees, covering more than 170 reactions and more than 137,000 building blocks (Box 1). Most of these reactions are two-component or three-component, but four-component and even five-component reactions are being explored, enabling higher-order combinatorics. This space can easily be expanded to 10¹⁵ compounds based on available reactions and extended building block sets, for example, 680 million make-on-demand (MADE) building blocks47, although synthesis of such compounds involves more steps and is more expensive. To represent and navigate combinatorial chemical spaces without their full enumeration, specialized cheminformatics tools have been developed, from fragment-based chemical similarity searches48 to more elaborate 3D molecular similarity search methods based on atomic property fields, such as the rapid isostere discovery engine (RIDE)38.
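
To make the reaction-plus-building-blocks definition concrete, the following Python sketch (using RDKit, with an illustrative amide-coupling SMARTS and placeholder building blocks, not any vendor's actual protocols) shows how such a space can be enumerated lazily rather than materialized in full:

```python
# A minimal sketch of a reaction-based chemical space using RDKit. The
# amide-coupling SMARTS and the four building blocks are illustrative
# placeholders, not any vendor's actual reaction protocols.
from itertools import islice, product

from rdkit import Chem
from rdkit.Chem import AllChem

# One 'robust' two-component reaction: acid + amine -> amide.
amide_coupling = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OX2H1].[NX3;H2,H1;!$(NC=O):3]>>[C:1](=[O:2])[N:3]"
)

acids = [Chem.MolFromSmiles(s) for s in ("OC(=O)c1ccccc1", "OC(=O)C1CC1")]
amines = [Chem.MolFromSmiles(s) for s in ("NCc1ccncc1", "NC1CCOCC1")]

def enumerate_space(reaction, blocks_a, blocks_b):
    """Yield product SMILES lazily instead of materializing the library."""
    for a, b in product(blocks_a, blocks_b):
        for (prod,) in reaction.RunReactants((a, b)):
            Chem.SanitizeMol(prod)
            yield Chem.MolToSmiles(prod)

# With ~10^5 acids and ~10^5 amines, this single reaction already spans
# ~10^10 virtual products; a generator samples them without enumeration.
print(list(islice(enumerate_space(amide_coupling, acids, amines), 4)))
```

The space definition itself (reactions plus building block lists) occupies almost no storage; products exist only when sampled, filtered or searched, which is what makes spaces of 10¹⁵ compounds navigable at all.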

An alternative proposed approach to building chemical spaces generates hypothetically synthesizable compounds following simple rules of synthetic feasibility and chemical stability. Thus, the generated databases (GDB) enumerate compounds that can be made of up to a specific number of atoms; for example, GDB-17 contained 166.4 billion molecules of up to 17 atoms of C, N, O, S and halogens49, whereas a GDB-18 database of molecules of up to 18 atoms would reach an estimated 10¹³ compounds38. Other generative approaches, based on narrower definitions of chemical spaces, are now used in de novo ligand design with DL-based generative chemistry (for example, ref. 50), as discussed below.

Although the synthetic success rates of some of the commercial on-demand chemical spaces (for example, Enamine REAL Space) have been thoroughly validated20,21,22,23,24,26,42, the synthetic accessibility and success rates of other chemical spaces remain unpublished38. These are important metrics for the practical sustainability of on-demand synthesis, because reduced success rates or unreasonable time and cost requirements would diminish its advantage over custom synthesis.

Computational approaches to drug design

Challenges of gigascale screening

Gigascale and terascale chemical spaces, provided that they maintain high drug likeness and diversity, are expected to harbour millions of potential hits and thousands of potential lead series for any target. Moreover, their highly tractable and robust synthesis simplifies any downstream medicinal chemistry efforts towards final drug candidates.

Dealing with such virtual libraries, however, calls for new computational approaches that meet special requirements for both speed and accuracy. They have to be fast enough to handle gigascale libraries: if docking of one compound takes 10 s per CPU core, it would take more than 3,000 years to screen 10¹⁰ compounds on a single CPU core, or cost approximately US$1 million on a computing cloud at the cheapest CPU rates. At the same time, gigascale screening must be extremely accurate, safeguarding against false-positive hits that effectively cheat the scoring function by exploiting its holes and approximations31. Even a one-in-a-million rate of false positives in a 10¹⁰ compound library would comprise 10,000 false hits, which may swamp any hit candidate selection. The artefact rate and nature may depend on the target and the screening algorithm, and should be carefully addressed in screening and post-processing. Although there is no single simple solution for such artefacts, some practical and reasonably cost-effective remedies include: (1) selection based on the consensus of two different scoring functions; (2) selection of highly diverse hits (many artefacts cluster into similar compounds); (3) hedging bets across several ranges of scores31; and (4) manual curation of the final list of compounds for any unusual interactions. Ultimately, it is highly desirable to fix as many remaining ‘holes’ in the scoring functions as possible, and to reoptimize them for high selectivity in the range of scores where the top true hits of gigaspace are found. Missing some hits in screening (false negatives) is well tolerated because of the huge number of potential hits in the 10¹⁰ space (for example, losing 50% of a million potential hits is perfectly fine), so some trade-off in score sensitivity is acceptable.
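
The cost argument above can be reproduced with simple arithmetic; in this sketch the per-compound docking time is taken from the text, whereas the cloud CPU price is an assumed illustrative figure:

```python
# Back-of-the-envelope reproduction of the screening-cost argument above.
# The per-compound docking time follows the text; the cloud price per
# CPU-hour is an assumption for illustration only.
library_size = 10**10          # compounds
sec_per_compound = 10          # seconds of docking per compound per CPU core
usd_per_cpu_hour = 0.035       # assumed spot price of one cloud CPU core

total_core_seconds = library_size * sec_per_compound
years_single_core = total_core_seconds / (3600 * 24 * 365)
cloud_cost_usd = total_core_seconds / 3600 * usd_per_cpu_hour

print(f"{years_single_core:,.0f} core-years")   # ~3,171 years on one core
print(f"${cloud_cost_usd:,.0f} on the cloud")   # ~$972,000

# False-positive arithmetic: even a one-in-a-million artefact rate
# leaves 10,000 spurious top-scoring hits in a 10^10 library.
print(library_size * 1e-6)                      # 10000.0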

The major types of computational approaches to screening a protein target for potential ligands are summarized in Table 2. Below, we discuss some emerging technologies and how they can best fit into the overall DDD pipeline to take full advantage of growing on-demand chemical spaces.

Table 2 Major types of virtual screening algorithms

Receptor structure-based screening

In silico screening by docking molecules of a virtual library into a receptor structure and predicting their ‘binding scores’ is a well-established approach to hit and lead discovery and has had a key role in recent drug discovery success stories11,17,51. The docking procedure itself can use molecular mechanics, often in internal coordinate representation, for rapid conformational sampling of fully flexible ligands52,53; empirical 3D shape-matching approaches54,55; or a combination of the two in a hybrid docking funnel56,57. Special attention is devoted to ligand scoring functions, which are designed to reliably remove non-binders to minimize false-positive predictions, a requirement that is especially relevant with the growth of library size. Blind assessments of the performance of structure-based algorithms have been routinely performed as a D3R Grand Challenge community effort58,59, showing continuous improvements in ligand pose and binding energy predictions for the best algorithms.

Results of many successful structure-based prospective screening campaigns have been published over the years, covering all major classes of targets, most recently GPCRs, as reviewed in refs. 17,51,60, whereas countless more have been run in industry. The focused candidate ligand sets predicted by such screening often show useful (10–40%) hit rates in experimental testing60, yielding novel hits for many targets with potencies in the 0.1–10-μM range (at least for those that are published). Further optimization of the initial hits obtained from standard screening libraries of fewer than 10 million compounds, however, usually requires expensive custom synthesis of analogues, which has been afforded in only a few published cases20,61.

Identification of hits directly in much larger chemical spaces such as REAL Space not only brings more and better hits31 but also supports their optimization, as any resulting hit has thousands of analogues and derivatives in the same on-demand space. This advantage was especially helpful for such challenging targets as the SARS-CoV-2 main protease (Mpro), for which hundreds of standard virtual ligand screening (VLS) attempts came up empty-handed62 (see the discussion of Mpro challenges in ‘Hybrid in vitro–in silico approaches’ below). Although the initial hit rates were low even in the ultra-large screens, a VirtualFlow screen24 of the REAL database with 1.4 billion compounds still identified hits in the 10–100-µM range, which were optimized via on-demand synthesis63 to yield quality leads, with the best compound, Z222979552, reaching a half-maximal inhibitory concentration (IC50) of 1.0 μM. Another ultra-large screen of 235 million compounds, based on a newer Mpro structure with a non-covalent inhibitor (Protein Data Bank (PDB) ID: 6W63), also produced viable hits, fast optimization of which resulted in the discovery of nanomolar Mpro inhibitors in just 4 months by a combination of on-demand and simple custom chemistry64. The best compound in this work had good in vitro ADMET properties, with an affinity of 38 nM and a cell-based antiviral potency of 77 nM, comparable to the clinically used PF-07321332 (nirmatrelvir)65.

With increasing library sizes, the computational time and cost of docking itself become the main bottleneck in screening, even with massively parallel cloud computing60. Iterative approaches have recently been suggested to tackle libraries of this size; for example, VirtualFlow used stepwise filtering of the whole library with docking algorithms of increasing accuracy to screen approximately 1.4 billion Enamine REAL compounds23,24. Although this improves speed several-fold, the method still requires a fully enumerated library, and its computational cost grows linearly with the number of compounds, limiting its applicability to rapidly expanding chemical spaces.
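
Schematically, such stepwise filtering is a funnel in which each stage applies a more expensive scoring function to the survivors of the previous one; in this minimal Python sketch the scoring functions are random placeholders rather than real docking programs:

```python
# A schematic of the stepwise filtering funnel used by iterative approaches
# such as VirtualFlow: cheap scoring for everything, expensive scoring for
# survivors only. The two scoring functions are random placeholders here,
# standing in for fast and accurate docking programs, respectively.
import heapq
import random

random.seed(0)

def cheap_score(compound):     # stand-in for fast, low-accuracy docking
    return random.random()

def accurate_score(compound):  # stand-in for slow, high-accuracy docking
    return random.random()

def funnel_screen(library, stages):
    """Each stage is (scoring_fn, keep_fraction); survivors move on."""
    pool = list(library)
    for score_fn, keep_fraction in stages:
        n_keep = max(1, int(len(pool) * keep_fraction))
        # keep the n_keep best-scoring compounds (lower = better here)
        pool = heapq.nsmallest(n_keep, pool, key=score_fn)
    return pool

library = [f"compound_{i}" for i in range(100_000)]
hits = funnel_screen(library, [(cheap_score, 0.01), (accurate_score, 0.10)])
# 100,000 -> 1,000 -> 100: only 1% of the library ever sees the slow method,
# but the total cost still scales linearly with the library size.
print(len(hits))
```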

Modular synthon-based approaches

The idea of designing molecules from a limited set of fragments to optimally fill the receptor binding pocket has been entertained since the early years of drug discovery, implemented, for example, in the LUDI algorithm66. However, custom synthesis of the designed compounds remained the major bottleneck of such approaches. The recently developed virtual synthon hierarchical enumeration screening (V-SYNTHES)26 technology applies fragment-based design to on-demand chemical spaces, thus avoiding the challenges of custom synthesis (Fig. 3). Starting with the catalogue of REAL Space reactions and building blocks (synthons), V-SYNTHES first prepares a minimal library of representative chemical fragments by fully enumerating synthons at one of the attachment points, capping the other position (or positions) with a methyl or phenyl group. Docking-based screening then allows selection of the top-scoring fragments (for example, the top 0.1%) that are predicted to bind well into the target pocket. This is repeated for the second position (and then the third and fourth positions, if available), and the resulting focused libraries are screened at each iteration against the target pocket. In the final step, the top approximately 50,000 fully enumerated compounds from REAL Space are docked with more elaborate and accurate docking parameters or methods, and the top-ranking candidates are filtered for novelty, diversity and a variety of desired drug-like properties. In post-processing, the best 50–500 compounds are selected for synthesis and testing. Our assessment suggests that combining synthons with the scaffolds and capping them with minimal dummy groups in the V-SYNTHES algorithm is a critical requirement for optimal fragment predictions, because the reactive groups of building blocks and scaffolds often create strong, yet false, interactions that are not present in the full molecule. Another important part of the algorithm is the evaluation of the fragment-binding pose in the target, which prioritizes those hits with minimal caps pointed into a region of the pocket where the fragment has space to grow.

Fig. 3: Synthon-based hierarchical screening.

An overview of the V-SYNTHES algorithm, which allows effective screening of more than 31 billion compounds in REAL Space, or even larger chemical spaces, while enumerating and docking only a small fraction of the molecules. The algorithm, illustrated here using a two-component reaction based on a sulfonamide scaffold with R1 and R2 synthons, can be applied to hundreds of optimized two-component, three-component or higher-order reactions by iteratively repeating steps 3 and 4 until fully enumerated molecules optimally fitting the target pocket are obtained. PAINS, pan-assay interference compounds.
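
The hierarchical logic of V-SYNTHES can be summarized in a short, runnable schematic; here molecules are mock (R1, R2) tuples, dock() is a placeholder score and the selection fractions are illustrative, so only the combinatorial bookkeeping reflects the published method26:

```python
# A runnable schematic of the V-SYNTHES hierarchy for a two-component
# reaction. Molecules are mock (R1, R2) tuples, dock() is a deterministic
# placeholder score and the selection fractions are illustrative; only the
# combinatorial bookkeeping reflects the published method.
import random

R1_SYNTHONS = [f"r1_{i}" for i in range(1_000)]  # e.g. sulfonyl chlorides
R2_SYNTHONS = [f"r2_{i}" for i in range(1_000)]  # e.g. amines
CAP = "methyl"                                   # minimal capping group

def dock(molecule):
    random.seed(hash(molecule))                  # deterministic mock score
    return random.random()                       # lower = better by convention

def top_fraction(candidates, fraction):
    n_keep = max(1, int(len(candidates) * fraction))
    return sorted(candidates, key=dock)[:n_keep]

# Step 1: dock R1 fragments with the R2 position capped
# (1,000 dockings instead of 1,000,000 full molecules).
fragment_hits = top_fraction([(r1, CAP) for r1 in R1_SYNTHONS], 0.05)

# Step 2: grow only the winning fragments at the R2 position and re-dock.
full_molecules = [(r1, r2) for (r1, _) in fragment_hits for r2 in R2_SYNTHONS]
final_hits = top_fraction(full_molecules, 0.001)

# 1,000 + 50,000 dockings cover a 10^6 space; the savings grow steeply for
# three-component reactions and for synthon sets of realistic size.
print(len(full_molecules), len(final_hits))
```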

Initially applied to discover new chemotypes for cannabinoid receptor CB2 antagonists, V-SYNTHES showed a hit rate of 23% for submicromolar ligands, which exceeded the hit rate of standard VLS fivefold while requiring about 100 times fewer computational resources26. A similar hit rate was found for ROCK1 kinase screening in the same study, with one hit in the low nanomolar range26. V-SYNTHES is being applied to other therapeutically relevant targets with well-defined pocket structures.

A similar approach, chemical space docking, has been implemented by BioSolveIT, so far for two-component reactions67. This method is even faster, as it docks individual building block fragments and only then enumerates them with scaffolds and other synthons. However, there are trade-offs for the extra speed: docking of smaller fragments without scaffolds is less reliable, and their reactive groups often have properties dissimilar from those of the reaction product. This may introduce strong receptor interactions that are irrelevant to the final compound and can misguide the fragment selection. This is especially true for cycloaddition reactions and three-component scaffolds, which need further validation in chemical space docking.

Apart from supporting the abundance, chemical diversity and potential quality of hits, structure-based modular approaches are especially effective in identifying hits with robust chemical novelty, as they (1) do not rely on information about existing ligands and (2) identify ligands that have never been synthesized before. This is an important factor in ensuring the patentability of the chemical matter for hit compounds and the lead series arising from gigascale screening. Moreover, thousands of easily synthesizable analogues ensure extensive SAR-by-catalogue for the best hits, which, for example, enabled an approximately 100-fold potency and selectivity improvement for the CB2 V-SYNTHES hits26. The availability of multilayer on-demand chemical space extensions (for example, supported by MADE building blocks47) can also greatly streamline the next steps in lead optimization through ‘virtual MedChem’, thus reducing extensive custom synthesis.

Data-driven approaches and DL

In the era of AI-based face recognition, ChatGPT and AlphaFold68, there is enormous interest in applications of data-driven DL approaches across drug discovery, from target identification to lead optimization to translational medicine (as reviewed in refs. 69,70,71).

Data-driven approaches have a long history in drug discovery, in which ML algorithms such as support vector machines, random forests and neural networks have been used extensively to predict ligand properties and on-target activities, albeit with mixed results. Accurate quantitative structure–property relationship (QSPR) models can predict physicochemical (for example, solubility and lipophilicity) and pharmacokinetic (for example, bioavailability and blood–brain barrier penetration) properties, for which large and broad experimental datasets for model training are available and continue to grow72,73,74. ML is also implemented in many quantitative SAR (QSAR) algorithms75, in which the training set and the resulting models are focused on a given target and a chemical scaffold, helping to guide lead affinity and potency optimization. Methods based on extensive ligand–target binding datasets, chemical similarity clustering and network-based approaches have also been suggested for drug repurposing76,77.
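
As a minimal illustration of this classical recipe, the sketch below trains a random forest on Morgan fingerprints with RDKit and scikit-learn; the five-compound ‘dataset’ and its solubility labels are toy placeholders, whereas real QSPR models are trained on thousands of measured compounds:

```python
# A toy QSPR model: Morgan fingerprints plus a random forest regressor,
# built with RDKit and scikit-learn. The five compounds and their 'logS'
# labels are illustrative placeholders, not experimental data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def fingerprint(smiles, n_bits=2048):
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    return np.array(fp)

# Toy training data: (SMILES, assumed aqueous solubility label) pairs.
train = [("CCO", 0.6), ("c1ccccc1", -1.6), ("CC(=O)Oc1ccccc1C(=O)O", -1.7),
         ("CCCCCCCC", -5.2), ("OCC(O)CO", 1.1)]

X = np.array([fingerprint(smiles) for smiles, _ in train])
y = np.array([label for _, label in train])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.predict([fingerprint("CCCO")]))  # prediction for 1-propanol
```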

The advent of DL takes data-driven models to the next level, allowing the analysis of much larger and more diverse datasets while deriving more complicated non-linear relationships, with a vast literature describing specific DL methodologies and applications to drug discovery27,70. By its ‘learning from examples’ nature, AI requires comprehensive ligand datasets for training the predictive models. For QSPR, large public and private databases have been accumulated, in which various properties, such as solubility, lipophilicity or in vitro proxies for oral bioavailability and brain permeability, have been experimentally measured for many thousands of diverse compounds, allowing the prediction of these properties for a broad range of new compounds.

The quality of QSAR models, however, differs between target classes depending on data availability, with most advances achieved for the kinase superfamily and aminergic GPCRs. An unbiased benchmark of the best ML QSAR models was provided by the recent IDG-DREAM Drug-Kinase Binding Prediction Challenge, with the participation of more than 200 experts78. The top predictive models in this blind assessment included kernel learning, gradient boosting and DL-based algorithms. The top-performing model (from team Q.E.D) used kernel regression with protein sequence similarity and, as training data, the affinity values of more than 60,000 compound–kinase pairs between 13,608 compounds and 527 kinases from the ChEMBL79 and Drug Target Commons80 databases. The best DL model used as many as 900,000 experimental ligand-binding data points for training, but still trailed the much simpler kernel model in performance. The best models achieved a Spearman rank coefficient of 0.53 and a root-mean-square error of 0.95 for predicted versus experimental pKd values in the challenge set. Such accuracy was found to be on par with the accuracy and recall of single-point experimental assays for kinase inhibition, and may be useful in screening for initial hits for less-explored kinases and in guiding lead optimization. Note, however, that the kinase family is unique, as it is the largest class with more than 500 targets, all possessing similar orthosteric binding pockets and sharing high cross-reactivity. The distant second family with systematic cross-reactivity comprises about 50 aminergic GPCRs, whereas other GPCR families and other cross-reactive protein families are much smaller. The performance and generalizability of ML and DL methods for these and other targets remain to be tested.
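
A toy version of such a pairwise kernel model can convey the idea: the similarity between two compound–kinase pairs is taken as the product of a compound similarity and a kinase similarity, and kernel ridge regression is fitted on measured pKd values. All matrices and labels below are illustrative placeholders, not the challenge data or the winning team's implementation:

```python
# An illustrative pairwise kernel model for compound-kinase affinity, loosely
# in the spirit of the challenge-winning kernel regression: the kernel between
# two (compound, kinase) pairs is the product of a compound similarity and a
# kinase similarity. All matrices and pKd labels are tiny mock placeholders.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def pair_kernel(pairs_a, pairs_b, comp_sim, kin_sim):
    """K[(c1,k1),(c2,k2)] = comp_sim[c1,c2] * kin_sim[k1,k2]."""
    K = np.zeros((len(pairs_a), len(pairs_b)))
    for i, (c1, k1) in enumerate(pairs_a):
        for j, (c2, k2) in enumerate(pairs_b):
            K[i, j] = comp_sim[c1, c2] * kin_sim[k1, k2]
    return K

# Mock similarities (e.g. Tanimoto for compounds, sequence identity for
# kinase binding sites), indexed by compound and kinase IDs.
comp_sim = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
kin_sim = np.array([[1.0, 0.6],
                    [0.6, 1.0]])

train_pairs = [(0, 0), (0, 1), (1, 0), (2, 1)]   # (compound, kinase) indices
train_pkd = np.array([7.5, 6.9, 7.2, 5.1])       # mock measured pKd values

model = KernelRidge(alpha=0.1, kernel="precomputed")
model.fit(pair_kernel(train_pairs, train_pairs, comp_sim, kin_sim), train_pkd)

test_pairs = [(1, 1), (2, 0)]                    # unseen compound-kinase pairs
print(model.predict(pair_kernel(test_pairs, train_pairs, comp_sim, kin_sim)))
```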

The development of broadly generalizable or even universal models is the key aspiration of AI-driven drug discovery. One direction here is to extract general models of binding affinity (binding score functions) from data on both known ligand activities and the corresponding protein–ligand 3D structures, for example, collected in the PDBbind database81 or obtained from docking. Such models explore various approaches to represent the data and various network architectures, including spatial graph-convolutional models82,83, 3D deep convolutional neural networks84,85 and their combinations86. A recent study, however, found that, regardless of the neural network architecture, an explicit description of non-covalent intermolecular interactions in the PDBbind complexes does not provide any statistical advantage over simpler ligand-only or receptor-only approximations that omit the interactions87. Therefore, the good performance of DL models based on PDBbind relies on memorizing similar ligands and receptors, rather than on capturing general information about their binding. One possible explanation for this phenomenon is that the PDBbind database does not have an adequate representation of ‘negative space’, that is, ligands with suboptimal interaction patterns to constrain the training.

This shortcoming exemplifies the need for a better understanding of the behaviour of DL models and their dependence on the training data, which is widely recognized in the AI community. It has been shown that DL models, especially those based on limited datasets lacking negative data, are prone to overtraining and spurious performance, sometimes leading to whole classes of models being deemed ‘useless’88 or severely biased by subjective factors defining the training dataset89. Statistical tools are being developed to define the applicability range and carefully validate the performance of the models. One of the proposed concepts is the predictability, computability and stability framework for ‘veridical data science’90. Adequate selection of quality data has been specifically identified by leaders of the AI community as the major requirement for closing the ‘production gap’, that is, the inability of ML models to succeed when they are deployed in the real world, thus calling for a data-centric approach to AI91,92. There have also been attempts to develop tools to make AI ‘explainable’, that is, able to formulate general trends in the data, specifically in drug discovery applications93.

Despite these challenges and limitations, AI is already starting to have a substantial effect on drug discovery, with the first AI-based drug candidates making it into preclinical and clinical studies. For kinases, AI-driven compounds were reported as potent and effective in vivo inhibitors of the receptor tyrosine kinase DDR1, which is involved in fibrosis9. Phase I clinical trials have been announced for ISM001-055 (also known as INS018_055) for the treatment of idiopathic pulmonary fibrosis10, although the identity of the compound and its target has not been disclosed. For GPCRs, AI-driven compounds targeting 5-HT1A, dual 5-HT1A–5-HT2A and A2A receptors have recently entered clinical trials, providing further support for the AI-driven drug discovery concept. These first success stories come from the kinase and GPCR families with already well-studied pharmacology, and the compounds show close chemical similarity to known high-affinity scaffolds94. It will be important for the next generation of DL drug candidates to improve in novelty and applicability range.

Hybrid computational approaches

As discussed above, physics-based and data-driven approaches have distinct advantages and limitations in predicting ligand potency. Structure-based docking predictions are naturally generalizable to any target with a 3D structure and can be more accurate, especially in eliminating false positives, the main challenge of screening. Conversely, data-driven methods can work in the absence of structures and can be faster, especially with GPU acceleration, although they struggle to generalize beyond data-rich classes of targets. Therefore, there are numerous ongoing efforts to combine physics-based and data-driven approaches in synergistic ways, both in general95 and in drug discovery specifically96.

In virtual screening approaches, the synergistic use of physics-based docking with data-based scoring functions may be highly beneficial. Moreover, if the physics-based and data-based scoring functions are relatively independent and both generate enrichment in the selected focused libraries, their combination can reduce false-positive rates and improve the quality of the hits. This synergy is reflected in the latest D3R Grand Challenge 4 results for ligand IC50 predictions59, in which the top methods, which used a combination of physics-based and ML scoring, outperformed those that did not use ML. Going forward, thorough benchmarking of physics-based, ML and hybrid approaches will be a key focus of the new Critical Assessment of Computational Hit-finding Experiments (CACHE), which will assess five specific scenarios relevant to practical hit and lead discovery and optimization97.
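
A minimal consensus-selection sketch illustrates the principle: rank the library under each scoring function independently and keep compounds that score well under both, so that artefacts peculiar to one function are filtered out (the scores below are random mock data):

```python
# A minimal consensus-selection sketch: rank the library under two
# independent scoring functions and keep compounds ranked highly by both.
# The scores are random mock data standing in for docking and ML rescoring.
import numpy as np

rng = np.random.default_rng(0)
n_compounds = 10_000
docking_scores = rng.normal(size=n_compounds)   # physics-based, lower = better
ml_scores = rng.normal(size=n_compounds)        # data-driven, lower = better

rank_dock = docking_scores.argsort().argsort()  # per-method rank per compound
rank_ml = ml_scores.argsort().argsort()

# 'Worst-of-both' consensus: a compound is only as good as its weaker rank,
# so artefacts that fool a single scoring function are pushed down the list.
consensus = np.maximum(rank_dock, rank_ml)
top_hits = np.argsort(consensus)[:100]
print(top_hits[:10])
```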

At a deeper level, the results of accurate physics-based docking (in addition to experimental data, for example, from PDBbind81) can be used to train generalized graph or 3D DL models predicting ligand–receptor affinity. This would help to markedly expand the training dataset and balance positive and negative (suboptimal binding) examples, which is important to avoid the overtraining issues described in ref. 87. Such DL-based 3D scoring functions for predicting molecular binding affinity from a docked protein−ligand complex are being developed and benchmarked, most recently RTCNN98, although their practical utility remains to be demonstrated.

To expand the range of structure-based docking applicability to targets lacking high-resolution structures, it is also tempting to use AI-derived AlphaFold2 (refs. 99,100) or RoseTTAFold101 3D models, which already show utility in many applications, including protein–protein and protein–peptide docking102. Traditional homology models based on close protein similarity, especially when refined with known ligands103, have been used successfully in small-molecule docking and virtual screening104, so AlphaFold2 is expected to further expand the scope of structural modelling and its accuracy. In a recent report, AlphaFold2 models, augmented by other AI approaches, helped to identify a cyclin-dependent kinase 20 (CDK20) small-molecule inhibitor, although with a modest affinity of 8.9 μM (ref. 105). More general benchmarking of the performance of AlphaFold2 models in virtual screening, however, gives mixed results. In a benchmark focused on targets with existing crystal structures, most AlphaFold2 models had to be cleaned of loops blocking the binding pocket and/or augmented with known ions or other cofactors to achieve reasonable enrichment of hits106. For the more practical cases of targets lacking experimental structures, especially for target classes with less obvious structural homologies in the ligand-binding pocket, AlphaFold2 models showed disappointing performance in small-molecule docking in recent assessments for GPCR and antibacterial targets107,108. The recently developed AlphaFill approach109 for ‘transplanting’ small-molecule cofactors and ligands from PDB structures to homologous AlphaFold2 models can potentially help to validate and optimize these models, although further assessment of their utility for docking and virtual screening is ongoing.

To speed up virtual screening of ultra-large chemical libraries, several groups have suggested hybrid iterative approaches, in which the results of structure-based docking of a sparse library subset are used to train ML models, which are then used to filter the whole library and further reduce its size. These methods, including MolPAL25, Active Learning110 and DeepDocking111, report as much as a 14-fold to 100-fold reduction in the computational cost for libraries of 1.4 billion compounds, although it is not clear how they would scale to rapidly growing chemical spaces.
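
The shared logic of these methods can be sketched as a surrogate-model loop: dock a sparse random batch, train a cheap model on the resulting scores, and let its predictions choose the next batch to dock. Everything below (random features, ridge surrogate, mock docking function) is a simplified stand-in for the published implementations:

```python
# Schematic of the surrogate-model loop behind iterative docking approaches.
# Random features, a ridge-regression surrogate and a mock dock() function
# stand in for fingerprints, the published ML models and real docking.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_library, n_feat = 100_000, 64
features = rng.normal(size=(n_library, n_feat))     # stand-in for fingerprints
true_w = rng.normal(size=n_feat)

def dock(idx):                                      # mock 'expensive' docking
    return features[idx] @ true_w + rng.normal(scale=0.5, size=len(idx))

docked = rng.choice(n_library, size=1_000, replace=False)  # sparse first batch
scores = dock(docked)

for _ in range(3):                                  # a few acquisition rounds
    surrogate = Ridge(alpha=1.0).fit(features[docked], scores)
    pred = surrogate.predict(features)              # cheap scores for everything
    seen = set(docked.tolist())
    # greedily acquire the predicted-best compounds not yet docked
    batch = np.array([i for i in np.argsort(pred) if i not in seen][:1_000])
    docked = np.concatenate([docked, batch])
    scores = np.concatenate([scores, dock(batch)])

# Only ~4% of the library was ever docked, yet the docked set is strongly
# enriched in the best (lowest) true docking scores.
print(len(docked), scores.min())
```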

We should emphasize here that scoring functions in fast-docking algorithms and ML models are primarily designed and trained to effectively separate potential target binders from non-binders, and they are not very accurate in predicting binding affinities or potencies. For more accurate potency predictions, the smaller focused library of candidate binders selected by the initial AI-based or docking-based screening can be further analysed and ranked using more elaborate physics-based tools, including free energy perturbation methods for the relative112 and absolute113,114,115 free energy of ligand binding. Although these methods are much slower, GPU-accelerated calculations28 hold the potential for their broader application in post-processing of virtual screening campaigns to further enrich the hit rates for high-affinity candidates (Fig. 2), as well as in lead optimization stages.

Future challenges

Further growth of readily accessible chemical spaces

The advent of fast and practical methods for screening gigascale chemical spaces for drug discovery stimulates further growth of these on-demand spaces, supporting better diversity and overall quality of the identified hits and leads. Specifically developed for V-SYNTHES screening, the xREAL extension of Enamine REAL Space now comprises 173 billion compounds116, and can be further expanded to 10¹⁵ compounds and beyond by tapping into an even larger building block set (for example, 680 million MADE building blocks47), by including four-component or five-component scaffolds, and by using new click-like chemistries as they are discovered. Real-world testing of the MADE-enhanced REAL Space and of other commercial and proprietary chemical spaces will allow a broader assessment of their synthesizability and overall utility38,117,118. In parallel, specialized ultra-large libraries can be built for important scaffolds underrepresented in general-purpose on-demand spaces; for example, screening of a virtual library of 75 million easily synthesizable tetrahydropyridines recently yielded potent agonists for the 5-HT2A receptor119.

Further growth of the size and diversity of on-demand chemical spaces is also supported by the recent development of new robust reactions for the click-like assembly of building blocks. As well as ‘classical’ azide–alkyne cycloaddition click chemistry120, recognized by the 2022 Nobel Prize in Chemistry121, and optimized click-like reactions such as SuFEx122, more recent developments such as Ni-electrocatalysed doubly decarboxylative cross-coupling123 show promise. Other carbon–carbon bond-forming reactions use methyliminodiacetic acid boronates for Csp²–Csp² couplings124 and, most recently, tetramethyl N-methyliminodiacetic acid boronates125 for stereospecific Csp³–C bond formation. Each of these reactions, applied iteratively, can generate new on-demand chemical spaces of billions of diverse compounds while operating with a limited number of building blocks. Similar to the routinely used automated assembly of amino acids in peptide synthesis, fully automated processes could be carried out with robots capable of producing a library of drug-like compounds on demand using combinations of a few thousand diverse building blocks126,127,128. Such machines are already working, although scaling up the production of thousands of specialized building blocks remains the bottleneck.

The development of more robust generative chemical spaces can also be supported by new computational approaches in synthetic chemistry, for example, predictions of new iterative reaction sequences129 or of synthetic routes and feasibility from DL-based retrosynthetic analysis130. In generative models, synthesizability predictions can be coupled with predictions of potency and other properties towards higher levels of automated chemical design131. Thus, generative adversarial networks combined with reinforcement learning (GAN-RL) were recently used to predict the synthetic feasibility, novelty and biological activity of compounds, enabling an iterative cycle of in silico optimization, synthesis and in vitro testing of the ligands50,132. When applied within a set of well-established reactions and pharmacologically explored classes of targets, these approaches already yield useful hits and leads, leading to clinical candidates50,132. However, the wider potential of automated chemical design concepts and robotic synthesis in drug discovery remains to be seen.

Hybrid in vitro–in silico approaches

Although blind benchmarking and recent prospective screening success stories for a growing number of targets support the utility of modern computational tools, there are whole classes of challenging targets for which existing in silico screening approaches are not expected to fare well by themselves. Some of the hardest cases are targets with cryptic or shallow pockets that have to open or undergo a substantial induced fit to engage a ligand, as often found when targeting allosteric sites, for example, in kinases or GPCRs, or protein–protein interactions in signalling pathways.

Although bioinformatics and molecular dynamics approaches can help to detect and analyse allosteric and cryptic pockets133, computational tools alone are often insufficient to support ligand discovery for such challenging sites. Cryptic and shallow pockets, however, have been handled rather successfully by fragment-based drug discovery approaches, which start with experimental screening for the binding of small fragments. The initial hits are found by very sensitive methods, such as surface plasmon resonance (Biacore), NMR, X-ray crystallography134,135 and potentially cryo-electron microscopy136, which can reliably detect weak binding, usually in the 10–100-μM range. The initial screening of the target can also be performed with fragments decorated with a chemical warhead enabling proximity-driven covalent attachment of a low-affinity ligand137. In either case, elaboration of the initial fragment hits into full high-affinity ligands is the key bottleneck of fragment-based drug discovery, requiring a major effort to ‘grow’ the fragment or link two or more fragments together. This is usually an iterative process involving custom ligand design and synthesis that can take many years134,138. At the same time, structure-based virtual screening can help to computationally elaborate the fragments to match the experimentally identified conformation of the target binding pocket. Most cost-effectively, this approach can be applied when fragment hits are identified from the on-demand space building blocks or their close analogues, allowing easy elaboration in the same on-demand space139.

The recent examples of hybrid fragment-based computational design approaches targeting SARS-CoV-2 inhibitors highlight the challenges presented by such targets and allow head-to-head comparisons with ultra-large VLS. One of the studies was aimed at the SARS-CoV-2 NSP3 conserved macrodomain enzyme (Mac1), a target critical for the pathogenesis and lethality of the virus. Building on the crystallographic detection of low-affinity (180 μM) fragments weakly binding Mac1 (ref. 139), fragment merging identified a 1-μM hit, quickly optimized by catalogue synthesis to a 0.4-μM lead140. In the same study, an ultra-large screen of 400 million compounds from the REAL database identified more than 100 new diverse chemotypes of drug-like ligands, with follow-up SAR-by-catalogue optimization yielding a 1.7-μM lead140. For the SARS-CoV-2 main protease Mpro, the COVID Moonshot initiative published the results of crystallographic screening of 1,500 small fragments, with 71 hits bound in different subpockets of the shallow active site, although none of them showed in vitro inhibition of the protease even at 100 μM (ref. 141). Numerous groups crowdsourcing the follow-up computational design and screening of merged and grown fragments helped to discover several SAR series, including a non-covalent Mpro inhibitor with an enzymatic IC50 of 21 μM. Further optimization by both structure-based and AI-driven computational approaches, which used more than 10 million MADE Enamine building blocks, led to the discovery of preclinical candidates with cell-based IC50 values in the approximately 100-nM range, approaching the potency of nirmatrelvir65. The enormous scale, urgency and complexity of this Moonshot effort, with more than 2,400 compounds synthesized on demand and measured in more than 10,000 assays, are unprecedented, and this highlights the challenges of de novo design of non-covalent inhibitors of Mpro.

Beyond the Moonshot initiative, a flood of virtual screening efforts yielded mostly disappointing results62; for example, the drug ebselen, which was proposed in an early virtual screen142, failed in clinical trials. Most of these studies, however, screened small ligand sets focused on repurposing existing drugs, lacked experimental support and used the first structure of Mpro, solved in a covalent ligand complex (PDB ID: 6LU7), which was suboptimal for docking non-covalent molecules142.

In comparison, several studies screening ultra-large libraries were able to identify de novo non-covalent Mpro inhibitors in the 10–100-μM range24,62,63,143, while experimentally testing only a few hundred compounds synthesized on demand. One of these studies further elaborated on the weak VLS hits by testing their Enamine on-demand analogues, revealing a lead with an IC50 of 1 μM in cell-based assays and validating its non-covalent binding crystallographically63. Another study, based on a later, more suitable non-covalent co-crystal structure of Mpro (PDB ID: 6W63), used an ultra-large docking and optimization strategy to discover an even more potent 38-nM lead compound64. Note that, although the results of the initial ultra-large screens for Mpro were modest, they were on par with the much more elaborate and expensive efforts of the Moonshot hybrid approach, with simple on-demand optimization leading to preclinical candidates of similar quality. These examples suggest that, even for challenging shallow pockets, structure-based virtual screening can often provide a viable alternative when performed at gigascale and supported by accurate structures, sufficient testing and optimization effort.

Outlook towards computer-driven drug discovery

With all the challenges and caveats, the emerging capability of in silico tools to effectively tap into the enormous abundance and diversity of drug-like on-demand chemical spaces at the key target-to-hit-to-lead-to-clinic stages makes it tempting to call for the transformation of the DDD ecosystem from computer-aided to computer-driven144 (Fig. 4). At the early hit identification stage, ultra-large virtual screening approaches, both structure-based and AI-based, are becoming mainstream, providing fast and cost-effective entry points into drug discovery campaigns. At the hit-to-lead stage, more elaborate potency prediction tools, such as free energy perturbation and AI-based QSAR, often guide the rational optimization of ligand potency. Beyond on-target potency and selectivity, various data-driven computational tools are routinely used in multiparameter optimization of the lead series, which includes ADMET and PK properties. Of note, chemical spaces of more than 10¹⁰ diverse compounds are likely to contain millions of initial hits for each target20 (Box 1), thousands of potent and selective leads and, with some limited medicinal chemistry in the same highly tractable chemical space, drug candidates ready for preclinical studies. To harness this potential, the computational tools need to become more robust and better integrated into the overall discovery pipeline to ensure their impact in translating initial hits into preclinical and clinical development.

Fig. 4: Computationally driven drug discovery.

Schematic comparison of the standard HTS plus custom synthesis-driven discovery pipeline versus the computationally driven pipeline. The latter is based on easily accessible on-demand or generative virtual chemical spaces, as well as structure-based and AI-based computational tools that streamline each step of the drug discovery process.

One should not forget that no computational model, however useful or accurate, can ensure that all of its predictions are correct. In practice, the best virtual screening campaigns result in 10–40% of candidate hits confirmed in experimental validation, whereas the best affinity predictions used in optimization rarely have an accuracy better than a 1 kcal mol⁻¹ root-mean-square error. Similar limitations apply to current computational models predicting ADMET and PK properties. Therefore, computational predictions always need experimental validation in robust in vitro and in vivo assays at each step of the pipeline. At the same time, experimental testing of predictions also provides data that can feed back into improving the quality of the models by expanding their training datasets, especially for ligand property predictions. Thus, DL-based QSPR models will greatly benefit from further accumulation of data in cell-permeability assays such as Caco-2 and MDCK, as well as from new advanced technologies such as organs-on-a-chip and functional organoids, to provide better estimates of ADMET and PK properties without cumbersome in vivo experiments. The ability to train ADMET and PK models with in vitro assay data representing the species most relevant for drug development (typically mouse, rat and human) would also help to address species variability, a major challenge for successful translational studies. All of this creates a virtuous cycle for improving computational models to the point at which they can drive compound selection for most DDD end points. When combined with more accurate in vitro testing, this may reduce and eventually eliminate animal testing requirements (as recently indicated by the FDA)145.

Building hybrid in silico–in vitro pipelines with easy access to the enormous on-demand chemical space at all stages of the gene-to-lead process can help to generate abundant pools of diverse lead compounds with optimal potency, selectivity and ADMET and PK properties, resulting in less compromise in the multiparameter optimization of clinical candidates. Running such data-rich, computationally driven pipelines requires overarching data management tools for drug discovery, many of which are being implemented in pharma and academic DDD centres146,147. Building computationally driven pipelines will also help to reveal weak or missing links where new approaches and additional data may be needed to generate improved models, thus helping to fill the remaining computational gaps in the DDD pipeline. Provided this systematic integration continues, computer-driven ligand discovery has great potential to reduce the entry barriers for generating molecules for numerous lines of inquiry, whether in vivo probes for new and understudied targets148, for polypharmacology and pluridimensional signalling, or drug candidates for rare diseases and personalized medicine.