Introduction

As of April 2021, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, accounted for more than 143 million infections and more than three million deaths worldwide1. Virus genomic sequences are being generated and shared at an unprecedented rate, with more than one million SARS-CoV-2 sequences available via the Global Initiative on Sharing All Influenza Data (GISAID), permitting near real-time surveillance of the unfolding pandemic2. The use of pathogen genomes on this scale to track the spread of the virus internationally, study local outbreaks and inform public health policy signifies a new age in virus genomic investigations3. Further to understanding epidemiology, sequencing enables identification of emerging SARS-CoV-2 variants and sets of mutations potentially linked to changes in viral properties.

As highly deleterious mutations are rapidly purged, most mutations observed in genomes sampled from circulating SARS-CoV-2 virions are expected to be either neutral or mildly deleterious. This is because although high-effect mutations that contribute to virus adaptation and fitness do occur, they tend to be in the minority compared with tolerated low-effect or no-effect ‘neutral’ amino acid changes4. A small minority of mutations are expected to impact virus phenotype in a way that confers a fitness advantage, in at least some contexts. Such mutations may alter various aspects of virus biology, such as pathogenicity, infectivity, transmissibility and/or antigenicity. Although care has to be taken not to confound mutations being merely present in growing lineages with mutations that change virus biology5, fitness-enhancing mutations were first detected to have arisen within a few months of the evolution of SARS-CoV-2 within the human population. For example, the spike protein amino acid change D614G was noted to be increasing in frequency in April 2020 and to have emerged several times in the global SARS-CoV-2 population, and the coding sequence exhibits a high dN/dS ratio, suggesting positive selection at the codon position 614 (refs6,7). Subsequent studies indicated that D614G confers a moderate advantage for infectivity8,9 and transmissibility10. Several other spike mutations of note have now arisen and are discussed in this Review, with particular focus on mutations affecting antigenicity.

The extent to which mutations affecting the antigenic phenotype of SARS-CoV-2 will enable variants to circumvent immunity conferred by natural infection or vaccination remains to be determined. However, there is growing evidence that mutations that change the antigenic phenotype of SARS-CoV-2 are circulating and affect immune recognition to a degree that requires immediate attention. The spike protein mediates attachment of the virus to host cell-surface receptors and fusion between virus and cell membranes11 (Box 1). It is also the principal target of neutralizing antibodies generated following infection by SARS-CoV-2 (refs12,13), and is the SARS-CoV-2 component of both mRNA and adenovirus-based vaccines licensed for use and others awaiting regulatory approval14. Consequently, mutations that affect the antigenicity of the spike protein are of particular importance. In this Review, we explore the literature on these mutations and their antigenic consequences, focusing on the spike protein and antibody-mediated immunity, and discuss them in the context of observed mutation frequencies in global sequence datasets.

Spike mutations receiving early attention

The rate of evolution of SARS-CoV-2 from December 2019 to October 2020 was consistent with the virus acquiring approximately two mutations per month in the global population15,16. Although our understanding of the functional consequences of spike mutations is rapidly expanding, much of this knowledge involves the reactive investigation of amino acid changes identified as rapidly increasing in frequency or being associated with unusual epidemiological characteristics. Following the emergence of D614G, an amino acid substitution within the receptor-binding motif (RBM), N439K, was noted as increasing in frequency in Scotland in March 2020. Whereas this first lineage with N439K (designated B.1.141 with the Pango nomenclature system17) quickly became extinct, another lineage that independently acquired N439K (B.1.258) emerged and circulated widely in many European countries18. N439K is noteworthy as it enhances the binding affinity for the ACE2 receptor and reduces the neutralizing activity of some monoclonal antibodies (mAbs) and polyclonal antibodies present in sera from people who have recovered from infection18. Another RBM amino acid change, Y453F — associated with increased ACE2-binding affinity19 — received considerable attention following its identification in sequences associated with infections in humans and mink; most notably one lineage identified in Denmark and initially named ‘cluster 5’ (now B.1.1.298)20. As of 5 November 2020, 214 humans infected with SARS-CoV-2 related to mink were all carrying the mutation Y453F21. The B.1.1.298 lineage also has Δ69–70, an amino-terminal domain (NTD) deletion that has emerged several times across the global SARS-CoV-2 population, including in the second N439K lineage, B.1.258. Δ69–70 is predicted to alter the conformation of an exposed NTD loop and has been reported to be associated with increased infectivity22.

Genomic analyses indicate a change in host environment and signatures of increased selective pressures acting upon immunologically important SARS-CoV-2 genes sampled from around November 2020 (ref.23). This coincided with the emergence of variants with higher numbers of mutations relative to previous circulating variants. Because of their association with increased transmissibility, these lineages were named ‘variants of concern’. They are defined by multiple convergent mutations that are hypothesized to have arisen either in the context of chronic infections or in previously infected individuals24,25,26,27,28,29. In addition to understanding the transmissibility and pathogenicity of these emerging variants, it is crucially important to characterize their antigenicity and the level of cross-protection provided by infection by earlier viruses that are genetically and antigenically similar to the virus that first emerged in December 2019 and which is used in all of the current vaccine preparations. Information on how spike mutations affect antigenic profiles can be derived from structural studies, mutations identified in viruses exposed to mAbs or plasma containing polyclonal antibodies, targeted investigations of variants using site-directed mutagenesis and deep mutational scanning (DMS) experiments that systematically investigate the possibility of mutations arising.

Immunogenic regions of spike

Several studies have probed the antigenicity of the SARS-CoV-2 spike protein by epitope mapping approaches, including solving the structure of the spike protein in complex with the antigen-binding fragment of particular antibodies13,30,31,32. Serological analyses of almost 650 individuals infected with SARS-CoV-2 indicated that ~90% of the plasma or serum neutralizing antibody activity targets the spike receptor-binding domain (RBD)12. A relative lack of glycan shielding may contribute to the immunodominance of the RBD33. One study reported structural, biophysical and bioinformatics analyses of 15 SARS-CoV-2 RBD-binding neutralizing antibodies31. Antibody footprints were defined by structural analysis as the spike residues lying within 4.0 Å of a mAb atom, a distance compatible with potential hydrogen bonds and van der Waals interactions. Structural analyses allowed the categorization of RBD-binding neutralizing antibodies into four classes (Fig. 1a,b): ACE2-blocking antibodies that bind the spike protein in the open conformation (class 1); ACE2-blocking antibodies that bind the RBD in both the open conformation and the closed conformation (class 2); antibodies that do not block ACE2 and bind the RBD in both the open conformation and the closed conformation (class 3); and neutralizing antibodies that bind outside the ACE2 site and only in the open conformation (class 4)31. Within the RBD, RBM epitopes overlapping the ACE2 site are immunodominant, whereas other RBD sites generate lower and variable responses in different individuals12.
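The distance-based footprint definition described above can be illustrated with a minimal sketch. The function name `footprint`, the input layout and the toy coordinates are assumptions made for illustration only; they are not taken from the cited study or from any deposited structure.

```python
import math

def footprint(spike_atoms, mab_atoms, cutoff=4.0):
    """Return sorted spike residue numbers with an atom closer than `cutoff` (in Å) to any mAb atom.

    spike_atoms: iterable of (residue_number, (x, y, z)) tuples
    mab_atoms:   iterable of (x, y, z) tuples
    """
    mab_atoms = list(mab_atoms)
    contacts = set()
    for resnum, xyz in spike_atoms:
        if resnum in contacts:
            continue  # residue already assigned to the footprint
        if any(math.dist(xyz, m) < cutoff for m in mab_atoms):
            contacts.add(resnum)
    return sorted(contacts)

# Hypothetical toy coordinates, not real structural data
spike = [(484, (0.0, 0.0, 0.0)), (484, (1.5, 0.0, 0.0)),
         (490, (3.0, 3.0, 0.0)), (501, (20.0, 0.0, 0.0))]
mab = [(0.0, 3.5, 0.0), (6.0, 3.0, 0.0)]

print(footprint(spike, mab))  # [484, 490]
```

In practice the coordinates would come from a solved spike–Fab complex, and a spatial index would replace the all-pairs loop for full-size structures.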

Fig. 1: Neutralizing antibody classes defined by structural analyses and properties of spike protein residues.

a | Amino acid residues of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein are coloured according to the class of the antibody that binds to an epitope. Receptor-binding domain (RBD) antibody classes 1–4 (ref.31) are shown: green for class 1 (ACE2-blocking antibodies that bind the spike protein in the open conformation), yellow for class 2 (ACE2-blocking antibodies that bind the RBD in both the open conformation and the closed conformation), blue for class 3 (antibodies that do not block ACE2 and bind the RBD in both the open conformation and the closed conformation) and red for class 4 (neutralizing antibodies that bind outside the ACE2 site and only in the open conformation). When residues belong to epitopes of multiple classes, priority colouring is given to antibodies that block ACE2 and bind the closed spike protein. The amino-terminal domain (NTD) supersite30 is coloured in magenta. b | Aligned heat maps showing properties of amino acid residues where substitutions affect binding by antibodies in polyclonal human blood plasma or emerge as antibody escape mutations. The distance in angstroms to the ACE2-contacting residues that form the receptor-binding site (RBS) is shown in shades of orange; each residue is classified as having evidence for mutations affecting neutralization by either monoclonal antibodies (mAbs)40,43,47,48 or polyclonal antibodies in plasma from previously infected individuals (convalescent)39,40,41,43,48 or vaccinated individuals59 (‘mAb effect’ and ‘plasma effect’, respectively). A subset of these residues has mutations described as emerging upon exposure (co-incubation) to mAbs40,47,48 or plasma40,41 in laboratory experiments (‘mAb emerge’ and ‘plasma emerge’, respectively). When an observation includes a deletion, this is indicated by a red cross. Shades of green depict the results of deep mutational scanning (DMS) experiments where yeast cells expressing RBD mutants are incubated with multiple samples of human convalescent plasma39. The escape fraction (that is, a quantitative measure of the extent to which a mutation reduced polyclonal antibody binding) averaged across all amino acid substitutions at a residue (‘plasma average’) and the maximally resistant substitution (‘plasma max’) are indicated. DMS data on ACE2-binding affinity19 are shown in shades of red or blue representing higher or lower ACE2 affinity, respectively. The mean change in binding affinity averaged across all mutations at each site (‘binding average’) and alternatively the maximally binding mutant (‘binding max’) is shown. Scores represent binding constants (Δlog10 KD) relative to the wild-type reference amino acid.

Although the RBD is immunodominant, there is evidence for a substantial role of other spike regions in antigenicity, most notably the NTD13,30,34. Early structural characterization of NTD-specific antibodies 4A8 (ref.32) and 4–8 (ref.13) revealed similar epitope locations towards the upper side of the most prominently protruding area of the NTD. Cryogenic electron microscopy was used to determine the antibody footprint of the neutralizing antibody 4A8, and showed key interactions involving spike residues Y145, H146, K147, K150, W152, R246 and W258 (ref.32). Epitope binning of 41 NTD-specific mAbs led to the identification of six antigenic sites, one of which is recognized by all known NTD-specific neutralizing antibodies and has been termed the ‘NTD supersite’, consisting of residues 14–20, 140–158 and 245–264 (ref.30) (Fig. 1a,b). The mechanism of neutralization by which NTD-specific antibodies act remains to be fully determined, although it may involve the inhibition of conformational changes or proposed interactions with auxiliary receptors such as DC-SIGN or L-SIGN32,35. Relatively little is known of antigenicity in the S2 subunit, with immunogenicity thought to be impeded by extensive glycan shielding36, and although both linear and cross-reactive conformational S2 epitopes have been described37,38, the biological significance of these is not yet known.

Spike RBD mutations and immune escape

Several studies have contributed to the current understanding of how mutations in the SARS-CoV-2 spike protein affect neutralization. These studies include traditional escape mutation work that identifies mutations that emerge in virus populations exposed to either mAbs39 or convalescent plasma containing polyclonal antibodies40,41; targeted characterization of particular mutations18,42; and wider investigations of either large numbers of circulating variants43 or all possible amino acid substitutions in the RBD39,44,45,46. For spike residues where mutations have been shown to influence polyclonal antibody recognition, the observation of an effect on either mAbs or plasma is indicated in Fig. 1b. For a smaller number of residues, escape mutations emerging in virus exposed to mAbs or polyclonal plasma have been described (‘mAb emerge’ and ‘plasma emerge’ in Fig. 1b).

In a DMS study, researchers assessed all possible single amino acid variants using a yeast-display system and detected variants that escape either nine neutralizing SARS-CoV-2 mAbs45 or convalescent plasma from 11 individuals taken at two time points after infection39 (shades of green in Fig. 1b). The resulting heat maps provide rich data on the antigenic consequence of RBD mutations, with the plasma escape mutations being of particular interest given that they impact neutralization by polyclonal antibodies of the kind SARS-CoV-2 encounters in infections, with significant levels of immunity acquired through prior exposure or vaccination. Although significant interperson and intraperson heterogeneity in the impact of mutations on neutralization by polyclonal serum has been described, the mutations that reduce antibody binding the most occur at a relatively small number of RBD residues, indicating substantial immunodominance within the RBD39.
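The site-level summaries used for such heat maps (the ‘plasma average’ and ‘plasma max’ of Fig. 1b) amount to a mean and a maximum of per-mutation escape fractions at each residue. The following is a minimal sketch under stated assumptions: the function name `site_summaries` is hypothetical, and the escape fractions are invented values for illustration, not measurements from the cited DMS studies.

```python
from collections import defaultdict

def site_summaries(escape_by_mutation):
    """Map site -> (mean escape fraction, max escape fraction) across its substitutions.

    escape_by_mutation: dict such as {"E484K": 0.6}, where each key is the
    wild-type letter, the site number and the mutant letter.
    """
    per_site = defaultdict(list)
    for mutation, fraction in escape_by_mutation.items():
        site = int(mutation[1:-1])  # strip the wild-type and mutant letters
        per_site[site].append(fraction)
    return {site: (sum(vals) / len(vals), max(vals))
            for site, vals in per_site.items()}

# Invented escape fractions, for illustration only
toy = {"E484K": 0.6, "E484Q": 0.4, "E484P": 0.5, "S477N": 0.02, "S477G": 0.04}
summary = site_summaries(toy)
for site in sorted(summary):
    mean_escape, max_escape = summary[site]
    print(site, round(mean_escape, 3), round(max_escape, 3))
```

Reporting both values is useful because a site can have a low average yet still harbour a single substitution with a large effect.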

Of all RBD residues for which substitutions affected recognition by convalescent sera, DMS identified E484 as being of principal importance, with amino acid changes to K, Q or P reducing neutralization titres by more than an order of magnitude39. E484K has also been identified as an escape mutation that emerges during exposure to mAbs C121 and C144 (ref.40) and convalescent plasma41, and was the only mutation described in one study as able to reduce the neutralizing ability of a combination of mAbs (REGN10989 and REGN10934) to an unmeasurable level47. In an escape mutation study using 19 mAbs, substitutions at E484 emerged more frequently than at any other residue (in response to four mAbs), and each of the four 484 mutants identified (E484A, E484D, E484G and E484K) subsequently conferred resistance to each of four convalescent sera tested48. No other mAb-selected escape mutants escaped each of the four sera, although the mutations K444E, G446V, L452R and F490S escaped three of the four sera tested48.

Mutations at position 477 of the spike protein (S477G, S477N and S477R) rank prominently among mAb escape mutations identified by one study, and the mutation S477G conferred resistance to two of the four sera tested48. However, substitutions at 477 were not identified as being important in DMS with convalescent plasma39. The mutation N439K increases affinity for ACE2 (ref.19), is predicted to result in an additional salt bridge at the RBM–ACE2 interface and is thought to preferentially reduce the neutralization potential of plasma that already has low neutralizing activity18. However, a DMS study39 did not find that the mutation N439K significantly alters neutralization by polyclonal antibodies in plasma, in contrast to previous studies that found that N439K reduced neutralization by mAbs and convalescent plasma18. One explanation for this inconsistency is that the mechanism of immune escape conferred by N439K is through increased ACE2 affinity rather than by directly affecting antibody epitope recognition and that perhaps the experimental design of the DMS study is less sensitive to detecting immune evasion mutations of this type.

Spike NTD mutations and immune escape

In the NTD, most of the evidence for immune evasion focuses on a region centred at a conformational epitope consisting of residues 140–156 (N3 loop) and 246–260 (N5 loop), which includes the epitope of the antibody 4A832 (Fig. 1, magenta). In studies that identified the emergence of antibody escape mutations in virus populations exposed to convalescent plasma, mutations were roughly evenly distributed between the RBD and the NTD (Fig. 1b). One study described the emergence of escape mutations in viruses exposed to convalescent plasma from two individuals, one of which selected for NTD mutations only (N148S, K150R, K150E, K150T, K150Q and S151P)40. This was despite the plasma being a source of the highly potent RBD-targeting mAb C144 (ref.40). NTD antibody escape mutations were not observed for the other samples of plasma investigated, and furthermore, the 148–151 mutants exhibited only marginal reductions in sensitivity to the plasma tested, indicating individual immune responses may be differentially affected by mutations of RBD and NTD epitopes40.

Deletions in the NTD have been observed repeatedly in the evolution of SARS-CoV-2 and have been described as changing NTD antigenicity30,41,42. One study identified four recurrently deleted regions (RDRs) within the NTD and tested five frequently observed deletions within these: Δ69–70 (RDR1), Δ141–144 and Δ146 (RDR2), Δ210 (RDR3) and Δ243–244 (RDR4)42. Of the four RDRs, RDR1, RDR2 and RDR4 correspond to NTD loops N2, N3 and N5, whereas RDR3 falls between N4 and N5 in another accessible loop (Fig. 2a, asterisk). Both RDR2 deletions, Δ141–144 and Δ146, and Δ243–244 (RDR4) abolished binding of 4A8 (ref.42). Further evidence of the role of RDR2 deletions in immune escape was provided by a study that describes the emergence of Δ140 in SARS-CoV-2 co-incubated with potently neutralizing convalescent plasma, causing a fourfold reduction in neutralization titre41. This Δ140 spike mutant subsequently acquired the E484K mutation, resulting in a further fourfold drop in neutralization titre, and thus a two-residue change across the NTD and the RBD can drastically evade the polyclonal antibody response. The Δ140+E484K double mutant next acquired an 11-residue insertion in the NTD N5 loop between Y248 and L249, completely abolishing neutralization. This insertion, which also introduced a new glycosylation motif in the vicinity of RDR4, is predicted to alter the structure of the antigenic N3 and N5 NTD loops41. This finding further demonstrates the structural plasticity of the NTD and indicates that insertions and the acquisition of additional glycosylation motifs in the NTD are further mechanisms in addition to deletion that lead to immune evasion. Other examples of mutations that impact the epitope–paratope interface indirectly include mutations in the signal peptide region and at cysteine residues 15 and 136, which form a disulfide bond that ‘staples’ the NTD amino terminus against the galectin-like β-sandwich30. Mutations at those sites (for example, C136Y and S12P, which alter the cleavage occurring between residues C15 and V16) have been shown to affect the neutralizing activity of several mAbs, likely disrupting the disulfide bond and therefore dislodging the supersite targeted by several antibodies30.

Fig. 2: Structure-based analysis of conformational epitopes on the spike protein.

a | Structure-based antibody accessibility scores for each spike protein ectodomain residue in the closed form were calculated with BEpro49. Black diamonds at the top and bottom of the plot indicate the positions of ACE2-contacting residues. Accessible amino-terminal domain (NTD) loops N1–N5 are labelled, and a loop falling between these is indicated with an asterisk. b | Two surface colour representations of antibody accessibility scores for the spike protein in the closed conformation according to the colour scheme in part a: a trimer axis vertical view (left) and an orthogonal top-down view along this axis (right). c | The extent to which each spike residue becomes more or less accessible when the spike protein is in its open form is shown. For each spike monomer (upright receptor-binding domain (RBD) (yellow), closed RBD clockwise adjacent (green) and closed RBD anticlockwise adjacent (blue)), the difference relative to the score calculated for the closed form (shown in part a) is shown. d | Two surface colour representations of antibody accessibility scores for the spike protein in the open conformation with a single monomer with an upright RBD are shown: a trimer axis vertical view (left) and an orthogonal top-down view along this axis (right).

Across the spike protein, some mutations that confer escape to neutralizing mAbs have little impact on serum antibody binding39,40,44, possibly because those mAbs are rare in polyclonal sera, targeting subdominant epitopes12,39,44. Escape mutations emerging in viruses exposed to convalescent plasma have been identified in both the NTD (ΔF140, N148S, K150R, K150E, K150T, K150Q and S151P) and the RBD (K444R, K444N, K444Q, V445E and E484K)40,41 (Fig. 1b). Notably, mutations emerging under selective pressure from convalescent plasma may be different from those selected by the most frequent mAb isolated from the same plasma40. Potentially, observed differences arise because mutations selected by convalescent plasma facilitate escape from multiple mAbs. Fewer data on the antigenic effects of S2 mutations exist, though D769H has been described as conferring decreased susceptibility to neutralizing antibodies24. Residue 769 is positioned in a surface-exposed S2 loop, and D769H was found to arise, in linkage with Δ69–70, in an immunocompromised individual treated with convalescent plasma24.

Conformational epitopes in spike

To evaluate potential antigenicity across the spike protein, we analysed the protein using BEpro, a program for the prediction of conformational epitopes based on tertiary structure49. This approach calculates a structure-based epitope score, which approximates antibody accessibility for each amino acid position. For each residue, the calculated score accounts for the local protein structure: half-sphere exposure measures and propensity scores each depend on all atoms within 8–16 Å of the target residue, with weighting towards closer atoms. Due to this aggregation, calculated scores are relatively insensitive to the effects of single amino acid substitutions. Scores were calculated for the spike protein in both the closed conformation and the open conformation (Fig. 2). It has been estimated that ~34% of spike proteins are closed and 27% are open (with the remainder in an intermediate form) following furin cleavage50. Scores rescaled between 0 and 1 are plotted for the closed conformation in Fig. 2a and are represented on the structure in Fig. 2b. A limitation of this approach is that it does not account for glycan shielding of residues and likely overestimates scores at the base of the ectodomain for residues closest to the carboxy terminus.
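The rescaling of scores to the range 0 to 1 can be sketched as follows. The text does not state which rescaling was used, so min-max scaling is an assumption here, as are the function name `rescale_unit` and the raw scores, which are invented for illustration.

```python
def rescale_unit(scores):
    """Min-max rescale a {residue: raw_score} mapping onto the interval [0, 1].

    Assumption: simple min-max scaling; the source says only that scores
    were "rescaled between 0 and 1".
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        raise ValueError("all scores are identical; cannot rescale")
    return {res: (s - lo) / (hi - lo) for res, s in scores.items()}

# Invented raw epitope scores for three residues
raw = {145: -1.0, 146: 0.5, 147: 3.0}
scaled = rescale_unit(raw)
print(scaled)  # {145: 0.0, 146: 0.375, 147: 1.0}
```

Because the result depends on the minimum and maximum of the input set, rescaled scores are comparable only within the structure over which they were computed.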

Comparisons with reporting of antibody footprints and the impact of mutations on antigenicity indicate that residues with mutations described as affecting recognition by mAbs or antibodies in convalescent plasma (Fig. 1b) tend to occur at residues with higher structure-based antibody accessibility scores compared with other residues belonging to epitope footprints and residues not implicated in antigenicity (Supplementary Fig. 1b). Notably, scores for residues with mutations described as affecting plasma antibody recognition are also slightly higher on average compared with those with mutations described as affecting mAbs only. Epitope scores are particularly high for residues with mutations described as emerging during exposure to convalescent plasma40,41 (Supplementary Fig. 1b). Experimental data on the emergence of mutations under selective pressure from polyclonal antibodies are relatively rare, although these trends for higher scores associated with such mutations indicate that information from structural analysis approaches of this kind can contribute to the ranking of residues at which substitutions are likely to impact the polyclonal antibody response.

Within the RBD, the two areas with high structure-based antibody accessibility scores for the closed spike structure (Fig. 2a, peaks with consecutive residues with scores greater than 0.8) are centred at residues 444–447 and residues 498–500. These areas are represented as yellow patches near the centre of the top-down view of the spike structure in Fig. 2b. Figure 2c shows that, in general, residues become more accessible and are likelier to form epitopes when the spike protein is in the open conformation, and this is especially true for the RBD, particularly for the upright RBD (Fig. 2c, yellow). In the open form, residues close to the ACE2-binding site (405, 415, 416, 417 and 468) become much more exposed on both the upright RBD and the clockwise adjacent closed RBD (Fig. 2c, green). The effect of mutations at these positions is likely to be greater for antibodies belonging to RBD class 1. Residues centred at 444–447 and 498–500 maintain high scores on the upright RBD and are joined by residues in areas 413–417 and 458–465. The only RBD residues that become notably less accessible in the open spike structure are residues 476, 477, 478, 486 and 487 of the closed RBD clockwise adjacent to the upright RBD, which become blocked by the upright RBD (Fig. 2c, green). Several RBD-specific antibodies are able to bind only the open spike protein (RBD classes 1 and 4 (ref.31)), and interestingly, it has been observed that D614G makes the spike protein more vulnerable to neutralizing antibodies by increasing the tendency for the open conformation to occur51.
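Identifying peaks as runs of consecutive residues with scores above a threshold can be sketched as below. The function name `high_score_runs`, the minimum run length of two residues and the per-residue scores are assumptions for illustration; the scores are invented, chosen only so that the runs match the 444–447 and 498–500 areas named in the text.

```python
def high_score_runs(scores, threshold=0.8, min_length=2):
    """Return (first, last) residue pairs for runs of consecutive residues scoring above threshold."""
    runs, current = [], []
    for res in sorted(scores):
        above = scores[res] > threshold
        if above and (not current or res == current[-1] + 1):
            current.append(res)  # start or extend a run
        else:
            if len(current) >= min_length:
                runs.append((current[0], current[-1]))
            current = [res] if above else []
    if len(current) >= min_length:
        runs.append((current[0], current[-1]))
    return runs

# Invented rescaled scores, for illustration only
toy_scores = {443: 0.50, 444: 0.85, 445: 0.90, 446: 0.88, 447: 0.81, 448: 0.60,
              498: 0.82, 499: 0.90, 500: 0.83, 505: 0.95}
print(high_score_runs(toy_scores))  # [(444, 447), (498, 500)]
```

The isolated high-scoring residue 505 in the toy data is not reported because a single residue does not form a run under the assumed minimum length.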

Within the NTD, the highest-scoring spike residues in the closed form belong to a loop centred at residues 147–150, which each have scores greater than 0.9 (Fig. 2a, yellow patch to the extreme right of the structure viewed from the side in Fig. 2b). This loop, known as the N3 loop, is described as forming key interactions with the neutralizing antibody 4A8 (ref.32). One study described the structure of five previously unmodelled, protruding NTD loops, denoting them N1–N5. In addition to N3, high-scoring residues (greater than 0.7) are found at positions 22–26 (N1), 70 (N2), 173–187 (N4), 207–213 (Fig. 2a, asterisk) and 247–253 (N5). Structural analysis indicates NTD-binding antibodies are likely able to bind epitopes when the spike protein is in either the closed conformation or the open conformation (Fig. 2c). Outside the NTD and the RBD, the highest-scoring residues are residues 676 and 689 (which lie on either side of the loop containing the S1–S2 furin cleavage site, which is disordered in both the open conformation and the closed conformation50), 793–794, 808–812, 1,099–1,100 and 1,139–1,146 (Fig. 2a). When the spike protein is in the open conformation, increased accessibility results in substantially higher potential epitope scores for S2 residues centred at 850–854, which become more accessible on all three spike monomers (Fig. 2c), and residues 978–984, which become more accessible on the monomer anticlockwise adjacent to the upright RBD monomer (Fig. 2c, blue).

Structural context of spike mutations

To assess the impact of spike mutations and their immunological role in the global SARS-CoV-2 population, we combined structural analyses with the observed frequency of mutations in circulating variants (Fig. 3). Globally, the highest numbers of amino acid variants, mapped against the Wuhan-Hu-1 reference sequence (MN908947), are recorded at amino acid positions 614, 222 and 18 (Fig. 4a) (among 426,623 high-quality sequences retrieved from the GISAID database on 3 February 2021 and processed using CoV-GLUE). Residues at positions 614 and 222 have relatively low antibody accessibility scores and are positioned ~50 Å from the RBS residues when the spike protein is in the open conformation (Fig. 3a,b). As mentioned earlier, there is evidence indicating that D614G confers a moderate advantage for infectivity8,9 and increases transmissibility10. The spike amino acid substitution with the second highest frequency is A222V, which is present in the 20A.EU1 SARS-CoV-2 cluster (also designated lineage B.1.177). This lineage has spread widely in Europe and is reported to have originated in Spain52. There is no evidence for a notable impact of A222V on virus phenotype (that is, infectivity and transmissibility), and so its increase in frequency is generally presumed to have been fortuitous rather than the result of a selective advantage. The substitution L18F has occurred ~21 times in the global population53 and is associated with escape from multiple NTD-binding mAbs30.

Fig. 3: Structural context of spike amino acid mutations in the global virus population.

Spike amino acid residues are coloured according to the frequency of amino acid substitutions or deletions. Variants (retrieved from CoV-GLUE) are based on 426,623 high-quality sequences downloaded from the Global Initiative on Sharing All Influenza Data (GISAID) database on 3 February 2021. a | Points representing each spike amino acid residue are positioned according to the antibody accessibility score and the distance to the nearest residue in the receptor-binding site. Residues with at least 100 sequences possessing a substitution or deletion are coloured according to the frequency scale shown, with the remainder shaded grey. b | Spike protein in closed form with all residues coloured according to the frequency scale shown; a trimer axis vertical view (left) and an orthogonal top-down view along this axis (right) are shown. c | A close-up view of the receptor-binding domain (RBD) bound to ACE2 (RCSB Protein Data Bank ID 6M0J95), with RBD residues shown as spheres coloured by amino acid variant frequency and ACE2 shown in gold. Amino acid variants are present at high frequency in positions at the RBD–ACE2 interface. d | Spike protein in open form with residues where at least 100 sequences possessing a substitution are highlighted; a trimer axis vertical view (left) and an orthogonal top-down view along this axis (right) are shown.

Fig. 4: Spike protein sequence variability and structure.

a | The domain organization of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein showing amino acid sequence variability. The spike protein is synthesized as a 1,273 amino acid polypeptide, and the frequency of amino acid variants, including both substitutions and deletions, at each of the positions is shown. These variants, relative to the Wuhan-Hu-1 reference sequence, were identified with use of CoV-GLUE96, which filters out Global Initiative on Sharing All Influenza Data (GISAID) sequences97 identified as being of low quality or from non-human hosts (sequences retrieved from the GISAID database on 3 February 2021). Among 426,623 genomes after filtering, 5,106 different amino acid replacements or substitutions across 1,267 spike positions were identified, of which 320 at 259 positions were observed in at least 100 sequences. In addition to substitutions, several deletions have been observed, particularly within the amino-terminal domain (NTD). The most frequently detected NTD deletion is the two-residue deletion at positions 69 and 70 (Δ69–70), present in 45,898 sequences. The S1–S2 boundary is at amino acid position 685. b | Spike protein monomer displaying an upright receptor-binding domain (RBD). c | Spike protein structure in the closed conformation overlaid with surface representations shown with a trimer axis vertical view (left) and an orthogonal top-down view along this axis (right). Domains are coloured as in part a. The RCSB Protein Data Bank IDs for the SARS-CoV-2 spike protein structures are 6ZGG and 6ZGE50. The magenta spheres represent glycans, and the magenta triangles represent potential N-linked glycosylation sites. The scissors represent the S1–S2 boundary at amino acid position 685. CD, connecting domain; CT, cytoplasmic tail; FP, fusion peptide; RBM, receptor-binding motif; TM, transmembrane domain.

Among the 5,106 independent substitutions observed in the spike protein (Box 1), 161 are described as affecting recognition by mAbs or polyclonal antibodies in sera, of which 22 are present in more than 100 sequences. On average, variant frequency is higher at amino acid positions where mutations are described as affecting antibody recognition than at positions with no described substitutions of antigenic importance (Supplementary Fig. 1a), and high levels of amino acid substitutions are observed at some amino acid positions where mutations are described as affecting recognition by antibodies in convalescent plasma, including positions 439 and 484. This indicates that, generally, the amino acid positions at which antibody escape mutations have been detected in vitro tolerate mutations at least to some degree in vivo.
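The counts quoted in the Fig. 4 legend translate into frequencies with simple arithmetic. The following Python sketch is illustrative only: the function names and the dictionary layout are assumptions and are not part of CoV-GLUE or GISAID tooling. The only figures used are those stated in the legend (426,623 filtered genomes, the 100-sequence threshold and 45,898 sequences carrying Δ69–70, written here as `del69-70`).

```python
TOTAL_GENOMES = 426_623   # filtered genomes, as stated in the Fig. 4 legend
MIN_SEQUENCES = 100       # reporting threshold used in the text


def variant_frequency(n_sequences, total=TOTAL_GENOMES):
    """Return the fraction of genomes that carry a given variant."""
    if total <= 0 or not 0 <= n_sequences <= total:
        raise ValueError("counts must satisfy 0 <= n_sequences <= total")
    return n_sequences / total


def above_threshold(counts, minimum=MIN_SEQUENCES):
    """Keep only variants observed in at least `minimum` sequences."""
    return {variant: n for variant, n in counts.items() if n >= minimum}


if __name__ == "__main__":
    # Only the count given in the legend is used; the key name is our notation.
    counts = {"del69-70": 45_898}
    for variant, n in above_threshold(counts).items():
        print(f"{variant}: {variant_frequency(n):.2%} of filtered genomes")
```

Expressed this way, the 100-sequence threshold corresponds to a frequency of roughly 0.02% of the filtered genomes, which gives a sense of how rare most of the 5,106 observed substitutions are.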

Within the RBD, the positions at which amino acid substitutions are present at the highest frequency are located close to the RBD–ACE2 interface (Fig. 3). Of the three RBD amino acid substitutions present in several thousand sequences, N439K and N501Y were described earlier, and N501Y is discussed in more detail in the next section in the context of variants of concern. The other substitution, S477N, is estimated to have emerged at least seven times in the global SARS-CoV-2 population and has persisted at a frequency of between 4% and 7% of sequences globally since mid-June 2020 (ref.53). One study described multiple mAbs that selected for the emergence of S477N and found this mutant to be resistant to neutralization by the entire panel of RBD-targeting mAbs that were tested. By contrast, when tested with convalescent serum, neutralization of the S477N mutant was similar to that of the wild type48. In common with N439K and N501Y, S477N results in increased affinity for the ACE2 receptor, although to a lesser extent19,54. As described in Box 2, substitutions may facilitate immune escape by increasing receptor-binding affinity independently of any effect that they may have on antibody recognition of epitopes; therefore, it is possible that such a mechanism contributes to the impact of S477N on neutralization. Variant frequency is also moderately high at RBD–ACE2 interface amino acid positions 417, 453 and 446. Of these positions, 446 occurs in a location in the spike structure that is predicted to be highly antigenic, and substitutions at this site are described as affecting neutralization by both mAbs and antibodies present in polyclonal serum39,43,46,48. Substitutions at amino acid positions 417 and 453 are described in the next section in the context of variants of concern.

Variants of interest or concern

In addition to single mutations of note, more heavily mutated SARS-CoV-2 lineages have emerged. Arguably the first variant of interest defined by the presence of several spike mutations, and referred to as B.1.1.298 (cluster 5), was detected in Denmark spreading among farmed mink and a small number of people20. This lineage is characterized by four amino acid differences, ΔH69–V70, Y453F, I692V and M1229I (Fig. 5). Of these, the Y453F substitution occurs at a residue within the ACE2 footprint and has been shown by DMS to increase ACE2 affinity19. In addition, Y453F has been described as reducing neutralization by mAbs47. In late 2020 and early 2021, the emergence and sustained transmission of lineages with mutations that affect the characteristics of the virus received much attention, most notably lineages B.1.1.7, B.1.351 and P.1 (also known as 501Y.V1, 501Y.V2 and 501Y.V3, respectively). The locations of the spike mutations in the B.1.1.298, B.1.1.7, B.1.351 and P.1 lineages are annotated in Fig. 5a, and information on the structural context and consequences of mutations for antibody recognition and ACE2 binding are shown in Fig. 5b.

Fig. 5: Amino acid mutations that characterize variants of concern.

a | Spike homotrimer in the open conformation overlaid with the surface representation (RCSB Protein Data Bank ID 6ZGG50). The locations of amino acid substitutions and deletions that define variants of concern are highlighted as red spheres. For B.1.1.7, scissors mark the approximate position of substitution P681H within the furin cleavage site, which is absent from the structural model. b | Aligned heat maps showing properties of amino acid residues or of the specific amino acid substitution, as appropriate. Epitope residues are coloured to indicate the amino-terminal domain (NTD) or the receptor-binding domain (RBD) class30. Structure-based antibody access scores for the spike protein in the closed and open conformations are shown. For RBD residues, the results of deep mutational scanning (DMS) studies show the escape fraction (that is, a quantitative measure of the extent to which a mutation reduced polyclonal antibody binding) for each mutant averaged across plasma (‘plasma average’) and for the most sensitive plasma (‘plasma max’)39. Each mutation is classified according to whether there is evidence that it affects neutralization by monoclonal antibodies (mAbs), by antibodies in convalescent plasma39 or by antibodies from vaccinated individuals59, and whether it has emerged in selection experiments using mAbs40,47,48 or post-infection serum40,47,48. The distance to the ACE2-contacting residues that form the receptor-binding site (RBS) is shown (for residue 681, this is estimated with use of the nearest residues present in published structures). DMS data on ACE2-binding affinity19 are shown both as the average score across all mutants at a residue and as the score of the maximally binding mutant. Scores represent the binding constant (Δlog10 KD) relative to the wild-type reference amino acid. Mutations that are present in a variant but that are also widespread in the virus population in which a variant emerged, or exhibit high diversity within a lineage, are marked with a dagger.

Of the lineages summarized in Fig. 5, several amino acid substitutions are convergent, having arisen independently in different lineages: N501Y, which is present in lineages B.1.1.7, B.1.351 and P.1; E484K, which is present in lineages B.1.351 and P.1 and has been detected as emerging within the B.1.1.7 lineage55; and ΔH69–V70 in lineages B.1.1.298 and B.1.1.7. Additionally, lineages B.1.351 and P.1 possess alternative amino acid substitutions K417N and K417T, respectively. Further lineages with these mutations have also been identified; for example, an independent emergence of N501Y in the B.1.1.70 lineage, which is largely circulating in Wales. Residue 501 is at the RBD–ACE2 interface (Fig. 2c), and N501Y has been shown experimentally to result in one of the highest increases in ACE2 affinity conferred by a single RBD mutation19. E484 has been identified as an immunodominant spike protein residue, with various substitutions, including E484K, facilitating escape from several mAbs40,47,48 as well as antibodies in convalescent plasma39,40,41,48. E484K is estimated to have emerged repeatedly in the global SARS-CoV-2 population53 and has been described in two other lineages originating from lineage B.1.1.28 in addition to lineage P.1 reported to be spreading in the state of Rio de Janeiro in Brazil (lineage P.2)56 and in the Philippines (lineage P.3)57. Whereas K417 is described in the epitopes of RBD class 1 and class 2 antibodies31, alterations to K417 tend to affect class 1 antibody binding and are therefore somewhat less important for the polyclonal antibody response to the RBD, which is dominated by class 2 antibody responses, which are more susceptible to substitutions such as E484K44,58,59. In addition to their antigenic effect, both K417N and K417T are expected to moderately decrease ACE2-binding affinity19 (Fig. 5b). 
The ΔH69–V70 deletion has been identified in variants associated with immune escape in immunocompromised individuals treated with convalescent plasma24. Experiments have shown that ΔH69–V70 does not reduce neutralization by a panel of convalescent sera; however, it may compensate for infectivity deficits associated with affinity-boosting RBM mutations that may emerge due to immune-mediated selection22.

The first genomes belonging to the B.1.1.7 lineage were sequenced in the south of England in September 2020. Lineage B.1.1.7 is defined by the presence of 23 nucleotide mutations across the genome that map to a single branch of the phylogenetic tree3. Of these 23 mutations, 14 encode amino acid changes and three are deletions, including six amino acid substitutions in the spike protein (N501Y, A570D, P681H, T716I, S982A and D1118H) and two NTD deletions (ΔH69–V70 and ΔY144)3. The lineage has been associated with a rapidly increasing proportion of reported SARS-CoV-2 cases, and phylogenetic analyses indicate that this lineage was associated with a growth rate estimated to be 40–70% higher than that of other lineages60,61. There is also evidence that this lineage may be associated with a higher viral load62. In addition to N501Y, for which there is some evidence that it reduces neutralization by a small proportion of RBD antibodies63, there is evidence for an antigenic effect of ΔY144 (Fig. 5b). This deletion is expected to alter the conformation of the N3 NTD loop (amino acid positions 140–156) and has been demonstrated to abolish neutralization by a range of neutralizing antibodies30. The B.1.1.7 spike mutations have been shown to diminish neutralization of a higher proportion of NTD-specific neutralizing antibodies (9 of 10; 90%) than RBD-specific neutralizing antibodies (5 of 31; 16%)63. Given the immunodominance of the RBD, this could explain the modest reductions in neutralizing activity of convalescent sera against authentic B.1.1.7 or pseudoviruses carrying the B.1.1.7 spike mutations64,65. The co-occurrence of ΔY144 and E484K is concerning with respect to the polyclonal antibody response as the N3 loop, which ΔY144 changes, is predicted to be among the most immunogenic regions of the spike protein (Fig. 2a), and amino acid substitutions at position 484 diminish neutralization by a range of RBD-targeting antibodies.

Reports of lineages with N501Y circulating in the UK were followed by reports of another lineage possessing N501Y circulating in South Africa (lineage B.1.351), which has been rapidly expanding in frequency since December 2020 (ref.66). In addition to N501Y, lineage B.1.351 is defined by the presence of five further spike amino acid substitutions (D80A, D215G, K417N, E484K and A701V) and a deletion in the NTD, Δ242–244. High numbers of B.1.351 viruses also have the spike amino acid substitutions L18F, R246I and D614G. A similar NTD deletion, Δ243–244, abolishes binding by the antibody 4A8 (ref.42), and L18F and R246I also occur within the NTD supersite and likely affect antibody binding as well30. The co-occurrence of K417N, E484K and these NTD substitutions suggests that lineage B.1.351 may overcome the polyclonal antibody response by reducing neutralization by class 1 and class 2 RBD-specific antibodies and NTD-specific antibodies (Fig. 5b). Data reported in one study showed that nearly half of examined convalescent plasma samples (21 of 44; 48%) had no detectable neutralization activity against the B.1.351 variant58. Other experiments with pseudotyped viruses showed that the B.1.351 variant was also resistant to the neutralizing activity of some mAbs (12 of 17; 70%)67. Similarly, a study showed that the neutralizing effect of convalescent plasma collected from 14 individuals was strongly reduced against a live (authentic) B.1.351 virus (with IC50 reduced by 3.2-fold to 41.9-fold relative to the first-wave virus)68.

The P.1 lineage, a sublineage of B.1.1.28, was first detected in Brazil69 and in travellers from Brazil to Japan70. Lineage P.1 is characterized by the presence of several amino acid substitutions in the spike protein: L18F, T20N, P26S, D138Y, R190S, K417T, E484K, N501Y, H655Y and T1027I69. In addition to substitutions at positions 417, 484 and 501 discussed above, the P.1 lineage has a cluster of substitutions close to the described antigenic regions of the NTD, including L18F, which is known to reduce neutralization by some antibodies30. The substitutions T20N and P26S also occur in or near the NTD supersite30 at residues with high antibody accessibility scores (Fig. 5b), and T20N introduces a potential glycosylation site that could result in glycan shielding (Box 2) of part of the supersite. The P.1 lineage has also been associated with a reinfection case in Manaus, Brazil27, suggesting that it can circumvent what might otherwise have been an effective immune response. Analyses integrating genomic and mortality data estimate that P.1 may be 1.7-fold to 2.4-fold more transmissible and that previous infection by non-P.1 SARS-CoV-2 provides 54–79% of the protection against P.1 infection compared with non-P.1 lineages71. More details of the frequency and geographic distribution of the P.1 lineage can be found at the Pango lineages website72.
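The convergence described above can be made explicit by comparing the spike mutation sets listed in the text for B.1.1.298, B.1.1.7, B.1.351 and P.1. The Python sketch below is a minimal illustration under stated assumptions: the dictionary contains only the lineage-defining spike changes named in this section (not the additional substitutions seen in some B.1.351 viruses), deletions are written in an ad hoc ASCII notation (`del69-70`, `del144`, `del242-244`), and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Lineage-defining spike changes as listed in the text; notation is ours.
LINEAGE_SPIKE_MUTATIONS = {
    "B.1.1.298": {"del69-70", "Y453F", "I692V", "M1229I"},
    "B.1.1.7": {"del69-70", "del144", "N501Y", "A570D", "P681H",
                "T716I", "S982A", "D1118H"},
    "B.1.351": {"D80A", "D215G", "del242-244", "K417N", "E484K",
                "N501Y", "A701V"},
    "P.1": {"L18F", "T20N", "P26S", "D138Y", "R190S", "K417T",
            "E484K", "N501Y", "H655Y", "T1027I"},
}


def shared_mutations(lineages, min_lineages=2):
    """Map each mutation found in >= min_lineages lineages to those lineages."""
    carriers = defaultdict(set)
    for lineage, mutations in lineages.items():
        for mutation in mutations:
            carriers[mutation].add(lineage)
    return {m: sorted(ls) for m, ls in carriers.items()
            if len(ls) >= min_lineages}


def substitution_position(mutation):
    """Return the residue number of a substitution such as 'K417N', else None."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
    return int(match.group(1)) if match else None


def positions_with_alternative_substitutions(lineages):
    """Find positions at which different lineages carry different substitutions."""
    by_position = defaultdict(set)
    for mutations in lineages.values():
        for mutation in mutations:
            position = substitution_position(mutation)
            if position is not None:
                by_position[position].add(mutation)
    return {p: sorted(ms) for p, ms in by_position.items() if len(ms) > 1}


if __name__ == "__main__":
    print(shared_mutations(LINEAGE_SPIKE_MUTATIONS))
    print(positions_with_alternative_substitutions(LINEAGE_SPIKE_MUTATIONS))
```

With these inputs, the shared changes are N501Y, E484K and the 69–70 deletion, and position 417 is the only site carrying alternative substitutions (K417N and K417T), matching the pattern described in the text. Sharing a mutation in such a table does not by itself show independent origin; that inference rests on the phylogenetic evidence cited.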

Increasingly, lineages possessing independent occurrences of mutations in common with the variants of concern B.1.1.7, B.1.351 and P.1 are being detected, demonstrating convergent evolution. For example, viruses of lineage B.1.525, which has been observed in several countries, albeit at low frequency to date, have NTD deletions ΔH69–V70 and ΔY144 in common with viruses of the B.1.1.7 lineage; E484K in common with the B.1.351 and P.1 lineages; and spike amino acid substitutions Q52R, Q677H and F888L73. Repeated amino acid substitutions at position 677 and the independent emergence of Q677H in several lineages in the USA provide strong evidence of adaptation, potentially through an effect of this mutation on the proximal polybasic furin cleavage site, although further experiments are required to determine its impact74. Other novel variants have been identified spreading in California and New York, USA (B.1.427 and B.1.429, and B.1.526, respectively). The B.1.427 and B.1.429 variants carry an antigenically noteworthy substitution, L452R75, which has been shown to reduce neutralization by several mAbs43,45,48,59 and convalescent plasma43. L452R independently appeared in several other lineages around the globe between December 2020 and February 2021, indicating that this amino acid substitution is probably the result of viral adaptation due to increasing immunity in the population75. L452R is also present in the A.27 lineage associated with a cluster of cases identified on the island of Mayotte76. The lineage B.1.526 has been found to carry either S477N or E484K, among other lineage-defining mutations77,78, both of which were described as antigenically important above. A new variant carrying E484K currently designated A.VOI.V2 was recently identified in Angola from cases involving travel from Tanzania79. This variant carries several amino acid substitutions in the spike protein and three deletions in the NTD, some of which are within the antigenic supersite79.
Another variant within the A lineage, the prevalence of which is rising in Uganda (A.23.1), shares with the B.1.1.7 lineage a substitution at position 681 within the furin cleavage site (P681R has been found in the A lineage, whereas P681H has been found in the B.1.1.7 lineage), and additionally has the amino acid substitutions R102I, F157L, V367F and Q613H. Q613H is speculated to be important as it occurs at a position neighbouring D614G80. Amino acid position 157 has been identified as an epitope residue, with F157A reducing neutralization by the mAb 2489 (ref.34).

New variants will continue to emerge, and although it is important to understand the phenotypes of emerging variants in terms of infectivity, transmissibility, virulence and antigenicity, it is also important to quantify the phenotypic impacts of specific mutations present in variants, both individually and in combination with other mutations. As new variants with unforeseen combinations of mutations continue to emerge, such insights will allow predictions of virus phenotype. For example, recently detected viruses of lineage B.1.617.1 were anticipated to show altered antigenicity due to the presence of the substitutions L452R and E484Q, which have been described as affecting antibody recognition39,43,45,48,81. Moving forwards, the experimental characterization of SARS-CoV-2 spike mutations to date will continue to provide extremely useful information on individual mutations or combinations of mutations that may not yet have been seen in circulating viruses.

Vaccine efficacy against new variants

To date, vaccines have been licensed and rolled out very successfully in several countries, but the number of individuals vaccinated still represents a small fraction of the global population (Supplementary Table 1). To assess the impacts of mutations on vaccine efficacy, authentic viruses and pseudoviruses possessing particular spike mutations (either individually or in combination) and larger sets of mutations representing variants of concern and other circulating spike mutations have been assessed by neutralization assays with postvaccination sera (Supplementary Table 1). Typically, studies report a fold change in variant virus, or pseudovirus, neutralization relative to wild-type virus (the serum concentration at which 50% neutralization (IC50) is achieved with the variant divided by the average IC50 for the wild-type virus).
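The fold-change definition given above amounts to a single ratio. The following Python sketch shows one way to compute it; the function name and the example values are hypothetical and purely illustrative, and the only thing taken from the text is the definition itself (variant IC50 divided by the mean wild-type IC50, with IC50 expressed as a serum concentration).

```python
from statistics import mean


def ic50_fold_change(variant_ic50, wild_type_ic50s):
    """Fold change in IC50 of a variant relative to the mean wild-type IC50.

    IC50 is taken here as the serum concentration giving 50% neutralization,
    so a value above 1 means more serum is needed to neutralize the variant.
    """
    if not wild_type_ic50s:
        raise ValueError("at least one wild-type IC50 measurement is required")
    wild_type_mean = mean(wild_type_ic50s)
    if variant_ic50 <= 0 or wild_type_mean <= 0:
        raise ValueError("IC50 values must be positive")
    return variant_ic50 / wild_type_mean


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from any cited study.
    replicate_wild_type_ic50s = [0.5, 1.5]
    print(ic50_fold_change(4.0, replicate_wild_type_ic50s))
```

Studies that report neutralization as a titre (a reciprocal serum dilution) rather than a concentration generally express the same comparison as the wild-type titre divided by the variant titre, so the direction of the ratio should be checked before fold changes from different studies are compared.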

Postvaccination sera from a cohort of 20 volunteers immunized with the mRNA vaccine mRNA-1273 (Moderna) or BNT162b2 (Pfizer–BioNTech) showed high binding titres for anti-SARS-CoV-2 spike IgM and IgG with plasma neutralizing activity and relative numbers of RBD-specific antibodies equivalent to those in natural infection59. Furthermore, epitope mapping of mAbs isolated from postvaccination sera showed they targeted a range of RBD epitopes similar to those isolated from naturally infected individuals59. The plasma neutralizing activity and the numbers of RBD-specific memory B cells were found to be equivalent to those of plasma from individuals who had recovered from natural SARS-CoV-2 infection59. Investigations with pseudoviruses possessing RBD mutations carried by variants of concern demonstrated that the neutralizing activity of plasma from vaccinated individuals showed a small but significant decrease of onefold to threefold against E484K, N501Y or the K417N + E484K + N501Y triple mutant59. Other data indicate that the effect of N501Y alone on neutralization is relatively modest: a study using sera from 20 participants in a trial of the BNT162b2 vaccine showed equivalent neutralizing titres against pseudoviruses carrying N501 or Y501 in the spike protein82. Other investigations with recombinant viruses carrying N501Y, ΔH69–V70 + N501Y + D614G or E484K + N501Y + D614G demonstrated that, compared with the Wuhan-Hu-1 reference virus, only E484K + N501Y + D614G resulted in a small reduction in neutralization by postvaccination sera elicited by two BNT162b2 doses, and differences in neutralization were modest overall83.

As stated earlier, convalescent plasma from individuals infected with pre-B.1.1.7 viruses (that is, viruses that circulated before the emergence of the B.1.1.7 lineage) shows only a modest reduction in neutralization activity against B.1.1.7 or pseudovirus possessing B.1.1.7 spike mutations63,78, and results obtained with postvaccination sera are broadly consistent with this. Pseudoviruses carrying the set of B.1.1.7 spike mutations evaluated with postvaccination serum from individuals who received the BNT162b2 vaccine (two doses)63,78,84 or mRNA-1273 vaccine (two doses)63 exhibited only a modest reduction in neutralization titres (less than threefold). However, assays using pseudovirus carrying B.1.1.7 spike mutations with the addition of E484K, a combination that has been observed in sequencing of circulating isolates, showed a larger, more significant drop (6.7-fold) in neutralization with postvaccination sera isolated from individuals who received the BNT162b2 vaccine85. In a live-virus neutralization assay, neutralizing titres of ChAdOx1 nCoV-19 (Oxford–AstraZeneca) postvaccination sera were nine times lower against the B.1.1.7 lineage than against a canonical non-B.1.1.7 lineage (Wuhan-Hu-1 with the S247R spike mutation)86. Similarly, neutralizing activity of sera elicited by the inactivated vaccine Covaxin (Bharat Biotech) against B.1.1.7 viruses was largely preserved87. Pseudovirus and live-virus neutralization assays showed that the neutralizing activity of sera from individuals after two doses of the ChAdOx1 nCoV-19 vaccine against the B.1.351 variant was reduced or abrogated86. Postvaccination sera from individuals who received two doses of mRNA-1273 (28 days apart) showed reduced neutralization of the B.1.351 variant (6.4-fold reduction)88. By contrast, neutralizing activity of sera elicited by the inactivated vaccine BBIBP-CorV (Sinopharm) against the authentic virus B.1.351 showed only a slight reduction (less than twofold)89.

Comparison of the differing extents to which variants affect neutralization by postvaccination serum is complicated by the different methods used in various studies. However, one study tested eight SARS-CoV-2 variants of interest or concern, including B.1.1.298, B.1.1.7 and P.1, as well as three B.1.351 variants, distinguished by their combination of NTD mutations, representing sequence diversity in circulating viruses of this lineage. Pseudoviruses were generated by the same system and tested with postvaccination sera from individuals who received two doses of either the BNT162b2 vaccine (n = 30) or the mRNA-1273 vaccine (n = 35)90. Compared with wild type, pseudoviruses with D614G or the mutations defining lineages B.1.1.7, B.1.1.298 and B.1.429 each showed non-statistically significant decreases in neutralization90. Lineages P.1 and P.2 each showed significant decreases, with both BNT162b2 (6.7-fold and 5.8-fold, respectively) and mRNA-1273 (4.5-fold and 2.9-fold, respectively) postvaccination sera90. The three B.1.351 variants investigated, representing the majority of deposited B.1.351 sequences, showed much larger decreases in neutralization activity, ranging from 34-fold to 42-fold (BNT162b2) and from 19.2-fold to 27.7-fold (mRNA-1273). Taken together, these data indicate that E484K is the primary determinant of the decreases in neutralization titres, which distinguish P.1, P.2 and the three B.1.351 variants from the other pseudoviruses tested. In addition to E484K, further mutations that are shared by each of the three B.1.351 variants, but are not possessed by the P.1 and P.2 lineages, are D80A, Δ242–244, K417N (though K417T is present in P.1) and A701V.

To complement the experimental data provided by neutralization assays, there is emerging evidence from clinical trials on the impact of variants on vaccine efficacy. Early indications suggest that these are broadly consistent with the laboratory results, with the B.1.351 variant showing greater signs of vaccine escape. The ChAdOx1 nCoV-19 vaccine showed clinical efficacy against the B.1.1.7 variant but failed to provide protection against mild to moderate disease caused by the B.1.351 variant, with vaccine efficacy against the variant estimated at 10.4% (95% confidence interval −76.8 to 54.8)85,86,91. Preliminary data from clinical trials reported that the NVX-CoV2373 (Novavax) protein-based vaccine provides 95.6% efficacy against the wild-type virus and that this is moderately lower for the B.1.1.7 variant (85.6%) and is further reduced for the B.1.351 variant (60.0%)91. Similarly, the single-dose vaccine JNJ-78436735 (Johnson & Johnson/Janssen) has been shown to provide 72% protection against moderate to severe SARS-CoV-2 infections in the USA, but the proportion significantly decreased to 57% in South Africa (at a time when the B.1.351 variant was widespread)92. These data indicate that NVX-CoV2373 and JNJ-78436735 are clinically efficacious against the B.1.1.7 variant and variants circulating in the USA, and are consistent in that the B.1.351 variant is associated with a larger reduction in vaccine efficacy.

In addition to evaluation of vaccine efficacy against SARS-CoV-2 variants and mutations, the effects of mutations on some mAbs used as therapeutics have been described (Supplementary Table 2). Single mAb treatment can exert a selective pressure that potentially increases the possibility of mutational escape of the targeted antigen. The risk is likely to be reduced with the use of cocktails of two or more mAbs targeting non-overlapping epitopes. REGN-COV2 (Regeneron) (included in the RECOVERY trial in the UK) and AZD7442 (AstraZeneca) are two examples of mAb cocktails that have been developed93. Importantly, some mutations in the RBM have already been identified in variants which are circulating in the UK (for example, N439K, T478I and V483I) and are likely to impact antigenicity.

Conclusions

There is now clear evidence of the changing antigenicity of the SARS-CoV-2 spike protein and of the amino acid changes that affect antibody neutralization. Spike amino acid substitutions and deletions that impact neutralizing antibodies are present at significant frequencies in the global virus population, and there is emerging evidence of variants exhibiting resistance to antibody-mediated immunity elicited by vaccines. Greater understanding of the correlates of immune protection is required to provide a context for the results of studies reporting changes in neutralization. A comprehensive understanding of the consequences of spike mutations for antigenicity will encompass both T cell-mediated immunity and non-spike epitopes recognized by antibodies. To monitor vaccine efficacy and to better understand the implications of antigenic variation for vaccine effectiveness, it will be important to collect information on vaccine status and viral genome sequence data from individuals infected with SARS-CoV-2. More generally, a broader understanding of the phenotypic impacts of mutations across the SARS-CoV-2 genome and their consequences for variant fitness will help elucidate drivers of transmission and evolutionary success.

Recent studies have shown the potential selective pressure exerted by convalescent plasma and mAb treatments on SARS-CoV-2 evolution in immunocompromised individuals24,25,26. Such circumstances, involving long-term virus shedders, may have contributed to the sporadic emergence of the more heavily mutated variants (for example, seen in the B.1.1.7 and B.1.351 lineages). Given that therapeutics (vaccines and antibody-based therapies) target mainly the SARS-CoV-2 spike protein, the selection pressures that favour the emergence of new variants carrying immune escape mutations generated in chronic infections24,25,26 will be similar to those selecting for mutations that allow reinfections within the wider population27,28,29. Therefore, sequencing of viruses associated with prolonged infections will provide useful information on mutations that could contribute to increased transmissibility or escape from vaccine-mediated immunity.

The collective data on the effect of mutations on vaccines and convalescent serum efficacy show that the polyclonal antibody response is focused on a few immunodominant regions, indicating the high probability of future mutation-mediated escape from host immunity. As antigenically different variants are continuing to emerge, it will become necessary to routinely collect serum samples from vaccinated individuals and from individuals who have been infected with circulating variants of known sequence. Cross-reactive immunity between circulating lineages can be assessed by measuring the ability of sera to neutralize panels of circulating viruses. The systematic surveillance of antigenic SARS-CoV-2 variants will be enhanced by the establishment of a network similar to the WHO-coordinated Global Influenza Surveillance and Response System (GISRS), a collaborative global effort responsible for tracking the antigenic evolution of human influenza viruses and making recommendations on vaccine composition. Modelling approaches to predict the evolutionary trajectories of emerging variants based on an understanding of the phenotypic effects of mutations will assist this process, as is the case for influenza virus94.

Prediction of the mutational pathways by which a virus such as SARS-CoV-2 will evolve is extremely challenging. Nonetheless, there is a rapidly expanding knowledge base regarding the effect of SARS-CoV-2 spike mutations on antigenicity and other aspects of virus biology. The integration of these data and emerging SARS-CoV-2 sequences has the potential to facilitate the automated detection of potential variants of concern at low frequency (that is, before they are spreading widely). Tracking the emergence of these viruses flagged as potential antigenically significant variants will help to guide the implementation of targeted control measures and further laboratory characterization. An important part of this process will be the preparation of updated vaccines tailored to emerging antigenic variants that are maximally cross-reactive against all circulating variants. All of these processes will benefit from close international collaboration and the rapid and open sharing of data.
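As a rough illustration of what automated flagging of this kind might involve, the Python sketch below checks a sequence's spike mutations against a watchlist. Everything in it is an assumption made for illustration: the watchlist is a small hand-picked subset of changes that this Review describes as affecting antibody recognition, the threshold of two hits is arbitrary, deletions use an ad hoc ASCII notation, and no real surveillance pipeline is being described.

```python
# Hypothetical watchlist drawn from changes discussed in this Review as
# affecting antibody recognition. It is not an official or complete list.
ANTIGENIC_WATCHLIST = frozenset({
    "L18F", "del144", "N439K", "L452R", "Y453F", "S477N", "E484K", "E484Q",
})


def flag_sequence(spike_mutations, watchlist=ANTIGENIC_WATCHLIST, min_hits=2):
    """Return (flagged, hits) for one sequence.

    `hits` lists the watchlist mutations carried by the sequence, and
    `flagged` is True when at least `min_hits` of them are present.
    The default threshold is an arbitrary choice for this sketch.
    """
    hits = sorted(set(spike_mutations) & watchlist)
    return len(hits) >= min_hits, hits


if __name__ == "__main__":
    # Example input: B.1.1.7-like deletions plus E484K, as discussed above.
    flagged, hits = flag_sequence(["del69-70", "del144", "E484K", "N501Y"])
    print(flagged, hits)
```

A practical system would need curated, regularly updated evidence for each mutation, weighting by effect size, and consideration of combinations of mutations, since the text emphasizes that phenotypic impacts must be quantified both individually and in combination.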