Introduction

The spread of the coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to rapid progress in the development of potential therapeutics and assays to assess them. However, the nature of this progress means that numerous assays and animal models for measuring antiviral immunity have been independently developed by different groups. Many of these are based on similar approaches and aimed at measuring identical outcomes. However, differences in cell lines and viral isolates (or laboratory adaptation of isolates), as well as in animal species and conditions, across different laboratories may lead to different predictions of the efficacy of interventions. For example, mutations in SARS-CoV-2 spike protein may affect the ability of antibodies to directly bind to this viral protein1,2, alter virus transmission dynamics3 or modulate viral binding to its entry receptor angiotensin-converting enzyme 2 (ACE2)4. Even for seemingly similar in vitro assays that use identical cells and viral isolates, minor details of assay design such as the inoculum size, length of incubation and method used to measure the infection level can have major impacts on interpreting the efficacy of different interventions in reducing infection.

Advancing studies into different animal models adds further complexity as factors such as the initial inoculum size, route of administration and infected cell type may all vary between species and laboratories. An agreement on a set of standardized assays for measuring SARS-CoV-2 immunity would advance the field substantially. However, given the pace of development and diversity of approaches, this may be challenging to achieve in the short term. In the interim, a better understanding of the characteristics and limitations of different in vitro assays and animal models should provide a rational basis for comparison.

In this Review, we provide an overview of different assays and animal models for SARS-CoV-2 infection and provide a theoretical framework for analysis and assessment of these studies. We show that many of the differences between alternative approaches can be understood through a consideration of infection dynamics in vitro and in vivo. This perspective of the dynamics of infection in different assays and animal models not only provides a foundation to understand variation in results between studies but also allows us to extrapolate to likely clinical effects. Finally, we outline key considerations for harmonizing and improving the use of current models to investigate SARS-CoV-2 immunity.

Measuring antiviral activity in vitro

A primary assessment of SARS-CoV-2 immunity involves measuring the neutralization capacity of serum or monoclonal antibodies in vitro5,6,7. This can be studied by measuring the ability of antibodies to inhibit the binding of the viral receptor binding domain (RBD) to the human protein ACE2 in vitro8,9. However, there may not be a direct relationship between binding inhibition and the level of inhibition of cellular infection. Therefore, numerous assays have been developed to measure neutralization of the infection of cells with either the native SARS-CoV-2 or a pseudotyped reporter virus carrying SARS-CoV-2 spike protein. Infection is measured after a period of co-incubation of virus and serum or antibody, quantifying either the number of infected cells, the production of viral RNA or infectious virus, or the viral cytopathic effect.

Antiviral activity is measured by comparing infection levels in antibody-treated and untreated cultures, and efficacy is often reported as an IC50 (the concentration of antibody required to reduce infection to 50% of that seen in untreated control cultures). The IC50 in these assays is usually interpreted as the concentration of antibody required to neutralize 50% of virions. However, as we show below, depending on factors such as the initial inoculum size, length of incubation and method of measuring infection, we would expect neutralization of anywhere between 10% and 99% of virions to be required to produce an apparent IC50 in different assays. Here, we highlight that different IC50 measurements between assays may arise from predictable differences in what is being measured under the specific assay conditions. We analyse several common assays and provide a framework for comparing assays and for interpreting assay results in a clinical context.

Single-cycle virus neutralization assays

Pseudotyped virus (or pseudovirus) assays involve incorporation of SARS-CoV-2 spike protein onto other viruses such as vesicular stomatitis virus (VSV)1,10,11 or lentiviruses12,13 (Table 1). These chimeric viruses also encode luciferase or fluorescent reporters, providing a direct read-out of the level of infection in vitro when they are used to infect (transduce) ACE2-expressing cells. Pseudotyped virus assays using SARS-CoV-2 spike protein are only suitable for studying viral entry and the effects of antibodies targeting spike protein, because they do not include other components of the SARS-CoV-2 viral replication machinery.

Table 1 In vitro models of SARS-CoV-2 infection

Most pseudotyped virus assays involve a replication-defective virus (because SARS-CoV-2 spike protein is included in trans), and thus they measure the number of cells infected during a single infection cycle11,12,14 (replication-competent pseudoviruses are discussed below). This has the major benefit of requiring a lower level of laboratory containment. To test antibody-mediated inhibition, the virus and the antibody are pre-incubated for a period before being applied to cells (typically using methods such as spinoculation or polybrene treatment to improve infection efficiency), and inhibition is measured as the relative reduction in reporter signal, usually 24 h later. Fitting of the relationship between antibody concentration and reporter signal is then used to estimate the IC50 of an antibody. This assay can provide a direct read-out of the decrease in successful viral entry during a single round of infection as a result of treatment (Fig. 1a). However, the use of a pseudovirus also raises numerous challenges. Factors such as the folding, cleavage, density and geometry of spike proteins on the virion can affect both the mechanics of cell entry and the ability of antibodies to bind to (pseudo)virions and neutralize infectivity15,16 and may differ from those of the native virus17,18,19,20.
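As a rough illustration of how an IC50 can be read off a single-cycle dilution series, the sketch below interpolates on log concentration between the two tested concentrations that bracket 50% of the control signal. This is a simpler stand-in for the curve fitting described above, not the method of any particular assay; the function name, the dilution series and the reporter readings are all hypothetical.

```python
import math

def ic50_by_interpolation(concentrations, signals, control_signal):
    """Estimate the IC50 by log-linear interpolation (illustrative assumption,
    not a full dose-response fit)."""
    # Express each reporter reading as a fraction of the untreated control.
    relative = [s / control_signal for s in signals]
    points = sorted(zip(concentrations, relative))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        # Find the neighbouring concentrations whose signals bracket 50%.
        if r_lo >= 0.5 >= r_hi and r_lo != r_hi:
            frac = (r_lo - 0.5) / (r_lo - r_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None  # 50% inhibition was not reached within the tested range

# Hypothetical two-fold dilution series (arbitrary units) and reporter readings.
conc = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0]
rlu = [9500, 9000, 7500, 5000, 2500, 1000]
print(ic50_by_interpolation(conc, rlu, control_signal=10000))
```

Interpolation only uses the two points nearest 50%, so it is sensitive to noise in those wells; fitting the whole curve uses all of the data and also yields the slope of the dose-response relationship.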

Fig. 1: In vitro assays for measuring viral inhibition.

a | Single-cycle pseudotyped virus assays involve co-incubation of virus and cells and measurement of the number of infected cells by a fluorescent reporter construct. They can provide a direct measure of the proportion of virus entry neutralized by serum or antibodies. b | Multi-cycle assays use either replication-competent pseudoviruses or native severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and measure the spread of infection over multiple cycles of infection in vitro. The level of infection can be measured using detection of a fluorescent reporter construct, viral antigen in infected cells or free virus in the supernatant. Some assays reach saturation before the end of the incubation and are thus insensitive to small changes in initial inoculum or viral growth rate. Once saturation is overcome, the fraction reduction in initial infectious viral levels is reflected as an equivalent fold-change in final viral levels (left hand panels). By contrast, small changes in viral growth rate are amplified over multiple rounds of infection, leading to large changes in final viral levels (right hand panels). c | Plaque reduction neutralization assays involve co-incubation of virus and antibody followed by plating out of virus onto an immobilized cell monolayer and incubation. The number of infectious virions remaining in the inoculum is enumerated by counting plaques of infected cells. d | An alternative limiting dilution approach involves co-incubation of antibody and virus followed by splitting into multiple wells to observe the proportion of wells infected. Cytopathic effect is commonly used as a read-out. The apparent IC50 (the concentration of antibody required to reduce infection to 50% of that seen in untreated control cultures) of the assay is highly dependent on the initial inoculum size. Inhibition of the cytopathic effect is only observed when the initial viral titres are reduced to <1 TCID50 (50% tissue culture infectious dose) in some wells. For this reason, limiting dilution-based assays can estimate a very different IC50 compared with single-cycle pseudotyped viral assays. Note that in the cytopathic effect assay, for a given input level of V0 infectious units, the IC50 occurs when the fraction of virions neutralized is 0.5^(1/V0).

Multi-cycle virus neutralization assays

Numerous assays involve measuring the ability of antibodies to inhibit the replication of virus over several days. Both replicating VSV/SARS-CoV-2 chimeric viruses1,15 and native SARS-CoV-2 (refs1,21) have been used to infect susceptible cell lines, with subsequent measurement of the level of infection after several days of incubation by quantifying reporter protein expression, viral antigen in infected cells or free virus in the supernatant (see Table 1). These assays can then be used to measure antibody neutralization by pre-incubation of different concentrations of antibody with the viral inoculum and measuring the relationship between antibody concentration and inhibition of infection.

Depending on the construct, a replicating chimeric virus may require lower-level containment than native SARS-CoV-2 but suffers from the same issues as single-cycle pseudovirus regarding the quality of spike protein. In addition, it is important to recognize that all aspects of viral replication, except receptor binding, are mediated by the parental (VSV) viral proteins. Therefore, the assay may have very different replication kinetics to native SARS-CoV-2. A major advantage of the use of native SARS-CoV-2 is the ability to measure the effects of agents acting at different parts of the viral life cycle, typically over multiple life cycles in vitro.

The use of a multi-cycle assay introduces several potential confounders compared with the single-cycle assays. First, for 50% neutralization of the inoculum to translate into 50% reduction in final infection levels, viral growth must not ‘saturate’ before the end of the assay. Saturation can frequently occur if a large proportion of cells become infected and, thus, the lack of uninfected cells limits viral expansion. If viral levels are saturated at the end of the assay (as is likely to be the case for most live virus neutralization assays), then reducing the initial inoculum will not reduce the final level of virus, instead simply making the maximal viral level occur later (see Fig. 1b). Such assays will be insensitive to low levels of neutralization and not directly comparable with single-cycle assays. Thus, care must be taken that growth remains exponential throughout the assay.
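The effect of saturation can be sketched with a toy model in which spread slows as uninfected cells run out. The model below is an assumption made purely for illustration (each uninfected cell escapes infection in a cycle with probability exp(-k x infected fraction), so early growth is roughly (1 + k)-fold per cycle); the function names and parameter values are hypothetical and not drawn from any published assay.

```python
import math

def infected_fraction(inoculum_fraction, fold_growth_per_cycle, cycles):
    """Fraction of cells infected after a number of cycles (toy model)."""
    infected = inoculum_fraction
    k = fold_growth_per_cycle - 1  # extra infections per infected cell when cells are plentiful
    for _ in range(cycles):
        # New infections are limited by the remaining uninfected cells.
        infected += (1 - infected) * (1 - math.exp(-k * infected))
    return infected

def relative_read_out(neutralized, cycles, inoculum_fraction=1e-4, fold_growth_per_cycle=10):
    """Final infection in a treated culture relative to the untreated control."""
    treated = infected_fraction(inoculum_fraction * (1 - neutralized), fold_growth_per_cycle, cycles)
    control = infected_fraction(inoculum_fraction, fold_growth_per_cycle, cycles)
    return treated / control

# Neutralizing half of the inoculum, read out before and after saturation.
print(relative_read_out(0.5, cycles=2))  # growth still exponential
print(relative_read_out(0.5, cycles=8))  # culture saturated
```

In this sketch the short assay reports close to half the control signal, whereas the long assay reports almost no difference, even though the same 50% of the inoculum was neutralized in both.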

A second important consideration is whether the antibody acts only on the initial inoculum or remains throughout the assay. If the antibody remains present during the assay, then it can act not only to neutralize a proportion of the inoculum but also to inhibit the subsequent spread of virus in the culture. If the final read-out is the ‘level of total infection relative to control’, this will be very sensitive to small changes (per cycle) in the viral growth rate during culture. For example, a 10% reduction in viral growth over six cycles of infection will lead to a 50% reduction in the final infection. Thus, the apparent IC50 (in the assay) may occur when antibody neutralizes only 10% of virions (on each cycle), leading to very different estimates of antibody efficacy between assays (see Fig. 1b, lower panels). The situation can become even more complex because the level of inhibition may not be constant over time, as the stoichiometry of antibody to virus varies over the course of incubation. This can potentially be avoided by removing the antibody during incubation. However, a more definitive approach may simply be to focus on measuring the outcome of interest and choose assays accordingly. For example, if one wishes to measure neutralization (of an inoculum), use of a single-cycle assay measures neutralization of a single round of infection. At present, single-cycle assays typically depend on non-replicating pseudovirus, which has inherent differences to native virus. However, single-cycle live SARS-CoV-2 assays are also possible if viral infection is limited to a single cycle either by short incubation or by addition of antibodies early after initial infection to prevent further viral spread.
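The compounding described above is simple arithmetic and can be checked directly. The sketch assumes a constant per-cycle reduction and unsaturated growth, which the text notes may not hold in practice; the function names are illustrative.

```python
def relative_final_infection(per_cycle_reduction, cycles):
    """Final infection relative to control when growth is reduced by the
    same fraction on every cycle (assumes no saturation)."""
    return (1 - per_cycle_reduction) ** cycles

def per_cycle_reduction_for_half(cycles):
    """Per-cycle reduction that halves the final infection after `cycles` rounds."""
    return 1 - 0.5 ** (1 / cycles)

# A 10% reduction per cycle over six cycles leaves about 53% of control,
# that is, close to a 50% reduction in the final read-out.
print(relative_final_infection(0.10, 6))
# Conversely, roughly 11% inhibition per cycle gives an apparent IC50 at six cycles.
print(per_cycle_reduction_for_half(6))
```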

By contrast, if the outcome of interest is a reduction in viral growth, then a multi-cycle assay can be used to measure growth directly. This can be done by measuring virus levels at different time points during the assay and estimating growth over time. Although time consuming, this is the only direct way to allow a comparison of the concentration of antibody or serum required to achieve a given level of viral growth inhibition in vitro. Importantly, it is likely that the relationship between neutralization and growth inhibition will vary between culture systems owing to changes in factors such as cellular infectivity and viral burst size, and whether these reflect the dynamics of viral replication and inhibition in human infection is unclear.
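One way to estimate growth from serial measurements is a least-squares line through the log-transformed titres, whose slope is the exponential growth rate. The sketch below assumes that all time points fall within the exponential phase; the time points and titres are hypothetical, and expressing inhibition as one minus the ratio of growth rates is a convention chosen here for illustration.

```python
import math

def exponential_growth_rate(times, titres):
    """Slope of ln(titre) against time by ordinary least squares."""
    logs = [math.log(v) for v in titres]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

def growth_inhibition(rate_treated, rate_untreated):
    """Fractional reduction in growth rate (illustrative definition)."""
    return 1 - rate_treated / rate_untreated

# Hypothetical titres sampled every 8 h during exponential growth.
hours = [0, 8, 16, 24]
untreated = [1e2, 1e3, 1e4, 1e5]
treated = [1e2, 10 ** 2.5, 1e3, 10 ** 3.5]

r_untreated = exponential_growth_rate(hours, untreated)
r_treated = exponential_growth_rate(hours, treated)
print(r_untreated, r_treated, growth_inhibition(r_treated, r_untreated))
```

Repeating this across a dilution series would give the antibody concentration at which growth inhibition reaches a chosen level, which is the comparison the paragraph above describes.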

Plaque neutralization and limiting dilution assays

The multi-cycle assays described above aim to measure the level of infection at the end of the assay. Other approaches aim to directly quantify the degree of neutralization of the virus in the initial SARS-CoV-2 inoculum. In this case, viral replication in culture is only important inasmuch as it allows visualization of infection arising from a single initial virion. The plaque reduction neutralization test, for example, involves incubating virus and antibody and quantifying the number of infectious virions by counting ‘plaques’ of infected cells after immobilization in a gel and incubation. The IC50 is determined as the concentration of antibody that reduces the number of plaques by 50% (Fig. 1c). This approach can be technically challenging as it involves forming a monolayer of cells in gel, plaque counting (which may include an element of operator subjectivity) and may also be affected by antibody persistence during incubation. However, it aims to give a direct read-out of the proportion of the inoculum neutralized by antibodies.

A similar assay involves a limiting dilution approach to measure the amount of infectious virus remaining in the initial inoculum. This requires incubating the virus and antibody and then splitting the virus–antibody mixture into several wells and using the viral cytopathic effect as a read-out of infectivity22. In this case, individual wells are scored at the end of the assay as having a binary outcome of either ‘infection’ or ‘no infection’ (Fig. 1d). The IC50 is then calculated using the Reed–Muench method to determine the antibody concentration where growth is inhibited in 50% of wells10,11. However, the degree of viral neutralization that is required to see ‘no growth’ in a culture well is highly dependent on the initial inoculum in culture. For a typical inoculum of 100 TCID50 (50% tissue culture infectious dose), ‘no infection’ will only be observed when all infectious doses are neutralized. Therefore, infection in 50% of wells (that is, IC50) will only be observed when 99% of the inoculum is neutralized (more generally, the observed IC50 is seen when the proportion of virus neutralized is equal to 0.5^(1/inoculum)).
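The dependence on inoculum size follows from a short calculation. Reading the expression in the text as 0.5 raised to the power 1/inoculum, and assuming each infectious unit is neutralized independently with the same probability p, a well shows no infection with probability p^inoculum, which equals 0.5 when p = 0.5^(1/inoculum). The sketch below evaluates this for a few inoculum sizes; the function name is illustrative.

```python
def neutralized_fraction_at_apparent_ic50(inoculum):
    """Fraction of infectious units that must be neutralized for half of the
    wells to show no infection, assuming independent neutralization."""
    return 0.5 ** (1 / inoculum)

for inoculum in (1, 10, 100):
    fraction = neutralized_fraction_at_apparent_ic50(inoculum)
    print(f"inoculum {inoculum:>3}: {fraction:.1%} neutralized at the apparent IC50")
```

With a single infectious unit per well the apparent IC50 coincides with 50% neutralization, but at 100 units it corresponds to more than 99% neutralization.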

A major advantage of both of these assays is that they aim to measure the number of infectious units neutralized at the start of incubation (even if they require viral growth for the final read-out of infection). That is, the assays rely on the ability of a single infectious unit at the start of the assay either to form a visible plaque or to mediate a widespread cytopathic effect by the end of the assay, both of which involve extensive viral replication. Thus, if the antibody remains present during the assay and is able to inhibit viral replication, it will have the same appearance as complete neutralization of the inoculum. For example, if the initial inoculum is 100 TCID50 and each infected cell produces 10 infectious virions over subsequent rounds of infection, it would be possible to prevent viral growth and cytopathic effects with an antibody that neutralizes just over 9 out of every 10 virions.
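The threshold in this example can be expressed as a one-line calculation: spread fails when each infected cell gives rise, on average, to fewer than one newly infected cell. The sketch below assumes that every virion escaping neutralization infects one cell, which is a simplification made for illustration; the function names are hypothetical.

```python
def secondary_infections(virions_per_cell, neutralized_fraction):
    """Average number of new infected cells per infected cell, assuming every
    non-neutralized virion infects one cell (simplifying assumption)."""
    return virions_per_cell * (1 - neutralized_fraction)

def neutralization_threshold(virions_per_cell):
    """Neutralized fraction above which infection cannot sustain itself."""
    return 1 - 1 / virions_per_cell

# With 10 infectious virions produced per infected cell, spread is blocked
# once slightly more than 9 in 10 virions are neutralized.
print(neutralization_threshold(10))
print(secondary_infections(10, 0.95))  # below 1, so the infection dies out
```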

Comparing neutralization between in vitro assays

The choice of cell line and virus (or pseudovirus) can clearly play a major role in the apparent efficacy of antibodies, as they can affect factors such as viral entry route and burst size during infection. This can be further compounded by batch variation of viral stocks. However, it is clear from the discussion above that even when the cell line and virus are standardized, the apparent IC50 of an antibody in an assay can require between 10% and 99% of virions to be neutralized depending on the assay design. Importantly, this relationship cannot be simply scaled between assays. That is, the antibody with the highest IC50 in one assay does not necessarily rank as having the highest IC50 in a different assay. This is because antibodies can vary greatly in the shape of the dose–response relationship between antibody concentration and ‘proportion of virions inhibited’ (Box 1). Therefore, some degree of harmonization or standardization is urgently required to allow better comparison between both serological levels of immunity and antibody products in development.
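To see why rankings need not be preserved between assays, consider two hypothetical antibodies whose dose-response curves differ in steepness. The sketch below assumes a Hill-type relationship between concentration and the fraction of virions neutralized, which is one common choice rather than something specified here; the antibody parameters are invented for illustration.

```python
def concentration_for_neutralization(fraction, ic50, hill_slope):
    """Concentration giving the requested neutralized fraction under an
    assumed Hill curve: f = c^h / (c^h + ic50^h)."""
    return ic50 * (fraction / (1 - fraction)) ** (1 / hill_slope)

# Hypothetical antibodies: A has the lower IC50 but a shallow curve,
# B has the higher IC50 but a steep curve.
antibody_a = {"ic50": 1.0, "hill_slope": 1.0}
antibody_b = {"ic50": 2.0, "hill_slope": 3.0}

for target in (0.10, 0.50, 0.99):
    conc_a = concentration_for_neutralization(target, **antibody_a)
    conc_b = concentration_for_neutralization(target, **antibody_b)
    better = "A" if conc_a < conc_b else "B"
    print(f"{target:.0%} neutralization: A needs {conc_a:.2f}, B needs {conc_b:.2f} -> {better} looks more potent")
```

Under these assumptions, antibody A appears more potent in an assay whose read-out corresponds to 10% or 50% neutralization, whereas antibody B appears more potent in an assay that requires 99% neutralization.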

Choosing an optimal in vitro assay may depend on the proposed use of the intervention: prophylactic or therapeutic. Single-cycle assays are most suitable for predicting prophylactic efficacy, as they measure ‘protective efficacy’ of antibodies against small inocula such as might be encountered in community transmission (Fig. 2a). By contrast, multi-cycle assays that measure viral growth inhibition can be used to predict efficacy as post-exposure prophylaxis or treatment of established infection. In either case, single-cycle pseudovirus assays are suitable for high-throughput screens owing to the lower level of biocontainment they require. However, the relationship between pseudovirus and live SARS-CoV-2 virus infection assays should be established and the results of pseudovirus assays should be confirmed in live SARS-CoV-2 virus infection assays23, where viral characteristics are more physiological.

Fig. 2: In vivo control of SARS-CoV-2 infection.

a | Goals and challenges of intervention at different stages of infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the potential differences between animal models and human infection. b | Relationship between the level of neutralization or inhibition of the viral inoculum and observed protective efficacy following challenge with different-sized inocula. c | Schematic of how the time from treatment to peak viral load limits the observed effect of treatment on peak viral load. High inocula in animal models shorten the time to peak and limit the impact of therapies that reduce viral growth rate. d | The rate of decline in viral titres after peak is significantly faster in animal models than in human infection (P = 0.0007), suggesting differences in the rate of infected cell death or the degree of ongoing infection after peak. For details of published data used for viral decay analysis, see Supplementary information.

Animal models of COVID-19

The next stage of assessing antibody efficacy typically involves animal testing, where it is hoped that key elements of infection such as viral replication, pathogenesis and immunity may mimic those observed in human infection. Various species have been used as models of SARS-CoV-2 infection24 (Table 2). A major challenge with attempts to recapitulate human infection dynamics is the benign course of SARS-CoV-2 infection in most human subjects and the large variability of outcomes in older individuals, as such variability of outcomes in animal models would require intractably large study sizes to observe statistically significant treatment effects. As a result, animal models tend to be designed either towards eliciting pathological outcomes (aimed to be prevented by treatment) or as studies of virological rather than clinical end points to assess the effects of treatment. Similar to in vitro models, a major question in choosing an animal model is the therapeutic intent of an intervention: does the study aim to measure prophylactic, post-exposure or therapeutic efficacy (Fig. 2)? Depending on the therapeutic goal, numerous factors need to be considered.

Table 2 Animal challenge models for SARS-CoV-2 infection

Assessing prophylactic interventions in animal models

Preventing the establishment of infection in animal models is a key goal of many vaccine or prophylactic treatment studies. In most current models, animals are infected with ~10^4–10^6 TCID50 via different routes into the respiratory tract (Table 2) (although in vivo infectivity may be lower than in vitro infectivity owing to in vitro sequence artefacts25). It is not clear how these doses relate to the minimal infectious dose for a particular species and challenge route. However, current challenge doses may be many orders of magnitude higher than either the minimal dose for infection or the dose received in natural transmission. To completely prevent the establishment of infection requires neutralizing enough of the inoculum to leave less than one infectious dose remaining (Fig. 2b). This may lead to a significant overestimation of the concentration of antibody required to neutralize natural transmission (Supplementary Box 1). In addition, use of high-dose inocula may lead to difficulties differentiating between residual virus from the inoculum and new viral replication, requiring assays to detect sub-genomic mRNA26.
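The consequence of a high challenge dose can be made concrete with a small calculation. The first function follows the statement above that less than one infectious dose must remain. The second adds an assumption not made in the text: that the number of infectious units surviving neutralization is Poisson distributed, so the chance of complete protection is exp(-dose x (1 - neutralized fraction)). Function names and doses are illustrative.

```python
import math

def neutralization_to_leave_one_dose(challenge_dose):
    """Fraction of the inoculum that must be neutralized so that, on average,
    one infectious dose remains (the boundary for blocking infection)."""
    return 1 - 1 / challenge_dose

def protection_probability(challenge_dose, neutralized_fraction):
    """Probability that no infectious unit survives, under an assumed
    Poisson distribution of surviving units."""
    return math.exp(-challenge_dose * (1 - neutralized_fraction))

for dose in (10, 1e4, 1e6):
    needed = neutralization_to_leave_one_dose(dose)
    print(f"dose {dose:g}: more than {needed:.6%} of the inoculum must be neutralized")

# The same 90% neutralization gives very different protection at different doses.
print(protection_probability(10, 0.9), protection_probability(1e4, 0.9))
```

In this sketch, a level of neutralization that protects a substantial fraction of animals against a 10-unit challenge offers essentially no protection against a 10^4-unit challenge.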

The absence of detectable viral growth in an animal does not necessarily indicate complete neutralization of the inoculum. Apparent sterilizing immunity will also be observed if treatment can block the spread of infection from cell to cell (even if the initial inoculum had not been fully neutralized). Thus, if the per-cell production of infectious virus from infected cells in vivo is less than from the initial inoculum, neutralizing cell to cell spread may be an easier mechanism to produce an apparent sterilizing treatment. The degree of viral inhibition required for apparent sterilizing immunity may vary between species because of differences in viral production and spread. As a result, prophylactic treatments may look more effective in models with lower viral replication.

When assessing protective efficacy in animal models, we suggest it may be beneficial to use a low-dose challenge model with sequence-verified virus stocks, in which animals are infected with something approximating the minimal animal infectious dose. Similarly, more physiological transmission could be modelled using nebulized virus or co-housing of infected and uninfected animals to allow direct animal to animal transmission27,28,29, rather than direct instillation of virus in liquid suspension into the airways. A major impediment to this approach is that truly low-dose infection may lead to only a proportion of control animals being infected, which greatly reduces the statistical power in treatment studies. However, modifications such as serial or parallel low-dose challenge have the potential to provide a greater sensitivity to detect the protective efficacy of an intervention (Box 2). Low-dose challenge models for SARS-CoV-2 are yet to be reported, although these have become common in animal models of HIV and tuberculosis30,31,32. Challenge with more physiological inocula, in principle, should be more reflective of the level of immunity needed in vaccination or prophylactic treatment in humans (Supplementary Box 1). Interestingly, human challenge models of SARS-CoV-2 infection have also been proposed33,34. The need to calibrate the dose would be even more important in human studies, as there may be a dose-dependent effect on infection outcome. However, in human studies there may be less temptation towards the use of high challenge doses and the human infectious dose can be determined by dose escalation studies35. This further highlights the need to develop comparable low-dose challenge studies in animal models.

Post-exposure treatment or prophylaxis in animal models

An important use of therapies for COVID-19 is as a post-exposure prophylaxis for close contacts of infected individuals to prevent or control early viral growth, reduce disease and limit forward transmission. This can be modelled in animal studies by treatment after the establishment of infection but before the peak of viraemia. The efficacy of treatment in slowing viral growth depends on the reduction of infectious virus titres in each round of replication, as well as on the number of rounds of replication the virus undergoes (Fig. 2c). For example, if a treatment can reduce viral expansion on each cycle of infection by 50%, over 8 rounds of infection the virus titres will be reduced 256-fold in treated animals. High challenge doses of virus may reduce the time (and number of viral replication cycles) between infection and peak viral loads36 and thus limit the potential impact of ‘growth-reducing’ therapies. The impact of high inoculum size is demonstrated by the earlier peak in viral loads following inoculation in most animal models (2–4 days post infection (Table 2)) compared with animal to animal transmission37 or time to diagnosis in human infection (4–6 days, with the time to severe illness even longer)38,39. If antiviral effects on viral growth are the desired outcome of a study, then clearly the most important measure is a direct comparison of viral growth rates in treated versus untreated infection.
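The interaction between per-cycle efficacy and the number of replication cycles before the peak can be sketched as follows. The fold-reduction function reproduces the 256-fold example above; the second function assumes constant exponential growth up to a fixed peak, and the inoculum sizes, peak level and growth factor are hypothetical values chosen only to show the trend.

```python
import math

def fold_reduction(per_cycle_reduction, cycles):
    """Fold reduction in titre when expansion is cut by the same fraction each cycle."""
    return (1 / (1 - per_cycle_reduction)) ** cycles

def cycles_to_peak(inoculum, peak, fold_growth_per_cycle):
    """Replication cycles needed to grow from the inoculum to the peak,
    assuming constant exponential growth (illustrative assumption)."""
    return math.log(peak / inoculum) / math.log(fold_growth_per_cycle)

print(fold_reduction(0.5, 8))  # the 256-fold example from the text

# Hypothetical comparison: a larger inoculum leaves fewer cycles on which
# a growth-reducing treatment can act before the peak.
for inoculum in (1e2, 1e6):
    n = cycles_to_peak(inoculum, peak=1e8, fold_growth_per_cycle=10)
    print(f"inoculum {inoculum:g}: {n:.0f} cycles to peak, "
          f"{fold_reduction(0.5, n):.0f}-fold reduction at peak")
```

Under these assumptions, the same treatment that lowers the peak 64-fold after a small inoculum lowers it only 4-fold after a large one.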

Therapeutic interventions in animal models

The therapeutic use of antiviral agents has the potential to reduce mortality and/or shorten the disease course in infected individuals40. Studies of immunotherapies in animal models might therefore focus on either a reduction in pathology or changes in viral dynamics as a result of treatment. However, several differences in the infection course observed in animal studies make it challenging to directly predict the effects in humans. Studies in patients infected with SARS-CoV-2 show that virus titres decline from the time of symptom onset, suggesting that initial presentation often occurs in the second week of infection at or after the peak in viral replication41,42. However, clinical progression often occurs while viral loads are declining and may be associated with immunopathology43,44. Different animal models show varying degrees of pathology, but in the majority of cases maximal pathology is observed within the first week of infection, up to a few days after the peak in initial viral levels 26,29,45,46,47 (Table 2). Both the viral peak and peak pathology occur earlier in many animal models than in humans, suggesting potential differences in the underlying pathophysiology. This might occur, in part, because of the altered timing of the viral peak and immune response and the mode of infection. In any event, the differences in pathophysiology raise a question of whether changes in pathological outcomes in animal models will have the same effects for severe COVID-19 in humans.

An alternative approach to measure the efficacy of therapeutic interventions is to directly study their effects on the dynamics of viral clearance. Analysis of patients with COVID-19 suggests that both a higher peak and a slower decay of viral load are seen in more severe infection48,49,50 and that antiviral treatment can improve outcome40. This suggests that inducing a faster decline in virus titre may improve outcome (although this causality has not been established). The decline in virus from the peak in other viral infections is typically thought to reflect the balance of any ongoing new infection of cells and the underlying death or shutdown of virus-producing cells (rather than the clearance of free virus, which is typically rapid)51. Thus, a faster decline in virus titre can be achieved by mechanisms such as increasing the death rate of infected cells, reducing the rate of production of virus from infected cells (through cytokine inhibition of viral production) or blocking any ongoing infection of cells (through virus neutralization or antiviral effects). An important consideration is whether the underlying mechanisms of decline in virus titre are similar between animal models and human infection. For example, the cell types infected may be quite different in some human ACE2 transgenic mouse models (with ubiquitous expression of human ACE2 driven by a constitutive promoter)52, which may lead to differences in cell susceptibility to viral cytopathic effects and/or differences in immune control of infection in different sites. Indeed, analysis of viral decay rates in animal models suggests these may be significantly faster than in human infection (Fig. 2d; Supplementary Methods). It is unclear whether the slower rates of viral decline in human infection reflect long-lived infected cells continually producing virus or ongoing rounds of infection of new cells. 
Again, how alterations in viral clearance translate into clinical outcome is uncertain, as immunopathology rather than virus-mediated destruction of infected cells may be a major factor driving severe illness in patients with COVID-19 (ref.43).

Immunity, immunopathology and immune recall in animal models

The early peak of infection in most animal models also affects the relative timing between viral and immune kinetics (Supplementary Box 2). In primary infection in animal models, the peak in viral infection occurs earlier than in human infection and likely reduces the role of acquired immune responses (which typically take 7–10 days to develop) in the early control of viral replication and decay of virus titres. By contrast, the later peak in virus titres in human infection means acquired immunity may play a larger role in driving viral decay. In addition, as the coexistence of high viral loads and high immune responses may contribute to immunopathology, these differences in timing may also limit immunopathology in animal models.

Altered infection kinetics in animal models may also affect the ability of vaccine-induced recall responses to control peak viral levels, as the activation and expansion of recall responses may be delayed for a few days after challenge53,54. Because the earlier the peak viral level occurs, the less time these responses have to act on it, high inocula may inherently limit the ability of vaccination to control peak viral loads (Supplementary Box 2).

In vivo veritas? Optimizing animal models

It is clear that there are several significant differences between the pathogenesis and kinetics of human infection and animal models, and there is currently no single, simple and optimal animal model for SARS-CoV-2 infection. It is also not clear which is the best outcome metric to study: for example, should an intervention aim to reduce the viral titre, pathology or lethality? The most suitable animal model and outcome measure for a particular application depends on the therapeutic intention, as well as the cost, timing and availability. Control of viral levels in the lower airways is clearly a metric that can be used across different animal models, even if they lack a pathological phenotype. Designing studies aimed at reducing pathology can be difficult, as the pathological outcomes are often quite variable between individuals (requiring potentially large group sizes). Syrian golden hamsters currently provide the most consistent lung disease phenotype among animal models described to date (Table 2). These animals, however, suffer from limited genetic diversity and a limited repertoire of available reagents compared with more widely used animal models. Non-human primates are most physiologically similar to humans but disease in these animals is typically mild, although sporadic fatal lung disease has been reported in aged African green monkeys. One question is whether the relatively benign course of infection seen in otherwise healthy non-human primates accurately models human infection, where disease severity is increased in older people with co-morbidities. Developing similar non-human primate models of COVID-19 in obese, diabetic and/or aged animals will likely be near impossible in the foreseeable future, so alternative methods of simulating disease that causes hospitalization in people are urgently needed.

Concluding remarks

Rapid progress is being made in understanding immunity to SARS-CoV-2, as well as in the development of novel prophylactic and therapeutic interventions. Assays to measure naturally acquired immunity and test the efficacy of immune interventions are key to this progress. The goal of the present work is not to criticize or dismiss particular assays or animal models. Instead, it is to state the importance of identifying what we want to measure and matching these goals to our experimental design. If we want to measure neutralization and prophylaxis, we need to choose assays and models that are optimized to quantify this. On the other hand, if we want to understand the effects of an intervention on viral growth, we need to measure growth directly. In most cases, this can be achieved by modifications to existing methods. For example, measuring the effects of interventions on viral growth rates and viral decay rates, rather than simply the peak viral load or time to viral clearance, should provide a clearer metric for comparison between different models and provide a more direct guide to predict therapeutic efficacy in human infection.

It is important to bear in mind that no matter how precisely we can measure the effects of interventions in vitro and in vivo, these assays and animal models remain imperfect mimics of human infection. Regardless of how sophisticated or internally valid an experimental system may be, it may still mislead us in prioritizing interventions in humans. Until correlates of protection can be established in clinical cohorts, our current approach must rely on assumptions and predictions from examples of other infections. However, a thoughtful approach to the use and interpretation of current systems should, ultimately, greatly enhance our ability to understand and predict the impact of immune interventions on SARS-CoV-2 infection.