Introduction

Geminiviruses are circular single-stranded DNA viruses with one or two components to their genomes. They are transmitted by insects and infect either monocots or dicots [10]. The names of geminiviruses have been standardised, and a set of rules for deriving names for newly identified species was laid down several years ago [6]. In 2003, following guidelines established by the International Committee on Taxonomy of Viruses (ICTV) [11], we published a comprehensive list of species and isolates of geminiviruses [7]. One major development outlined at the time was the application of an arbitrary threshold value with which to demarcate distinct geminivirus species. This threshold was determined by analysing a large number of DNA-A sequences (n = 217) of members of the genus Begomovirus, from which it became clear that 89% nucleotide sequence identity represented an appropriate working value [4]. This allowed us to identify 102 distinct begomovirus species. This number increased to 147 by 2004. Since then, the number of complete DNA-A sequences has risen to 592, necessitating another review of the list of species in the context of the criteria established in 2003. This review provides the opportunity to update the list of species and isolate names and to correct many of the errors present in the sequence database entries according to the established guidelines. In addition, we propose guidelines for incorporating strain and variant demarcation criteria and descriptors into the virus names, so that the rapidly increasing number of geminivirus sequences can be identified more precisely.

There is no formally accepted definition for any taxon below the species level, and no standardized approach has been established to deal with this issue. Certainly, the mandate of the ICTV does not extend below the species level, and, hence, the decision has been left to the initiative of specialty groups such as the Geminiviridae Study Group. With the exponential increase in DNA sequencing, and because biologists are encountering new isolates for which the biological properties are being determined and/or are of importance in breeding programs for disease resistance, establishing a geminivirus nomenclature system below the species level has become timely and essential. In order to classify viruses and to avoid further confusion, we published in 2005 a paper [5] describing the nomenclature used by virologists below the species level, and we proposed, for the time being, to restrict the number of categories to “strains” and “variants”. It is de facto accepted by virologists that there is no homogeneity in the demarcation criteria, nomenclature and classification below the species level, and each specialty group is establishing an appropriate system for its respective family. However, newly proposed classification systems, such as that proposed herein for geminiviruses, add value to the science of virus taxonomy because they set a useful precedent.

Molecular genomic diversity below the species level

For pairwise comparisons of the full-length sequences of the genomes (or DNA-A genomic components) of 672 geminivirus isolates (225,456 comparisons), at least two peaks can be distinguished in the range of 85–100% identity (Fig. 1a). The application of an arbitrary demarcation value of approximately 93% in the matrix of comparisons discriminated two populations that we have called “strains” and “variants”. These populations were then plotted separately to illustrate the distribution of percentage identities, shown in Figs. 1b and c, respectively. The “strains” peak ranges from 85 to 96%, while the “variants” peak ranges from 92 to 100%. There is an overlap between these two categories, just as there is an overlap between the peaks of the species and “strains” categories. Nevertheless, in the pairwise comparison matrix, it is straightforward to demarcate these categories. The first peak includes all begomoviruses that are clearly distinguishable as strains within the species level and can often be associated with a specific phenotype, host range or geographical distribution, while the second peak includes variants for which no clear unifying genotypic or phenotypic feature is apparent. There is also a “shoulder” at 99–100%, which may be attributable to either random point mutations or PCR/sequencing errors.
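The number of comparisons and the split at the demarcation value can be illustrated with a short sketch. It assumes that within-species percentage identities are already available as a plain list; the function names and the toy values are hypothetical, and the 93% default is only the approximate working value quoted above, not a sharp boundary.

```python
from math import comb


def n_pairwise(n_isolates: int) -> int:
    """Number of unordered pairs among n isolates, i.e. n(n-1)/2."""
    return comb(n_isolates, 2)


def split_at_cutoff(identities, cutoff=93.0):
    """Split within-species identities (%) into two populations.

    Values below the cut-off go to 'strains', the rest to 'variants'.
    The 93% default is the approximate working value quoted in the text.
    """
    strains = [x for x in identities if x < cutoff]
    variants = [x for x in identities if x >= cutoff]
    return strains, variants


print(n_pairwise(672))  # 225456

# Hypothetical within-species identities, for illustration only.
toy = [90.5, 91.3, 92.9, 95.0, 99.6]
strains, variants = split_at_cutoff(toy)
print(len(strains), len(variants))  # 3 2
```

Because the two peaks overlap, a single cut-off of this kind can only approximate the populations plotted in Fig. 1b and c.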

Fig. 1

Distribution of pairwise sequence comparison (PASC) identity percentages between DNA-A sequences for 672 geminivirus isolates, below the species level: (a) all isolates, (b) members of the strain level, and (c) variants

Virus strains

Although there is no official definition for a strain, the strain concept is widely used, and a de facto definition states that “strains are best represented by viruses belonging to the same species and having stable and heritable biological, serological, and/or molecular differences”. This definition seems broad enough to accommodate many different situations; however, the demarcation of strains and variants according to the threshold defined in the previous section does not fit with some accepted strain descriptors for geminiviruses presently in use, such as:

  • East African cassava mosaic virus—Uganda2 Mild

  • East African cassava mosaic virus—Uganda2 Severe

  • Tomato golden mosaic virus—Common

  • Tomato golden mosaic virus—Yellow vein

The obvious reason for this discrepancy is that very subtle differences, possibly only a few nucleotides [1], can cause major phenotypic differences and thus fall outside the previously determined demarcation. A difference of 8% in pairwise comparisons, corresponding to the peak of the strain level, accounts for approximately 200 nts/comparison (100/geminivirus genome). This is much more than the number of mutations known to change an isolate phenotype from severe to mild [2, 3]. Chatterji et al. [2, 3] demonstrated that among the 127 nts that differed between the severe and mild DNA-A components of tomato leaf curl New Delhi virus (ToLCNDV), the phenotypic difference was in fact due to one mutation in the N-terminus of the Rep protein and a point mutation in one iteron in the common region. Although the visible phenotype (severe or mild) was de facto associated with these isolates, the label was therefore a misnomer, and by extension such phenotypic differences need not be associated with an 8% difference in sequence.

Virus variants

The definition of a variant is “something that differs slightly from the norm”, but with respect to viruses it means a slightly different genome, symptom, or mode of transmission. The term was recently proposed for use with geminiviruses with very small differences, and this definition would therefore apply to isolates exhibiting phenotypic differences that could be explained by a few nucleotide differences. A difference of 2–3% in pairwise comparisons corresponds to 50–80 nucleotide differences (25–40 nucleotides per geminivirus genome).
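The conversion between a percentage difference and a count of nucleotide differences per comparison can be checked with a minimal sketch. It assumes a genomic component of 2,700 nt, an illustrative round figure that is not stated in the text; the function name is hypothetical.

```python
def nucleotide_differences(percent_difference: float, genome_length: int = 2700) -> int:
    """Approximate number of differing positions in one pairwise comparison.

    genome_length is an assumed round figure for one genomic component.
    """
    return round(genome_length * percent_difference / 100)


for pct in (2, 3, 8):
    print(f"{pct}% difference ~ {nucleotide_differences(pct)} nt")
```

Under this assumption, 2–3% corresponds to roughly 54–81 positions and 8% to roughly 216, which is consistent with the approximate figures of 50–80 and 200 quoted in the text.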

Need for descriptors and classification guidelines under the species level

Due to the steadily increasing number of available geminivirus sequences, it is becoming increasingly important to provide a rational system for assigning a newly characterized isolate to an existing strain, to a new strain, or to leave the isolate as a variant at the species level. Strain descriptors under the species level and guidelines to determine where a new isolate would best be classified are therefore needed. This can be achieved in two ways: first, by attempting to define quantitatively what constitutes a strain within a species, and second, by adopting descriptive identifiers to indicate a virus at the strain level. For the time being, variants could simply be defined by the absence of a descriptor and would correspond to all isolates that are not included in a specific strain. For strain designation, discriminating symptoms (mild, severe, yellow vein, stunting, etc.) and differential hosts (cowpea, soybean, mungbean, tobacco, tomato, etc.) are privileged descriptors and should be used more often when appropriate. When used at the strain level, host and symptom descriptors imply some level of host/symptom adaptation, as in the case of TYLCV isolates. In the case of unavailable distinguishing descriptors, letters A, B, C... would be used to designate the different strains.

Guidelines to demarcate strain designation

The matrix of the distances of pairwise sequence comparisons of all virus isolates can cluster them from the most closely related to the least related. The use of a percentage identity figure, as defined above, will allow grouping of virus isolates into strains (85–93%) and variants (94–100%) of strains or species. However, in some instances, due to extensive recombination, some isolates are highly related to several strains within a species, or even to isolates belonging to different species, making their classification contentious. We have investigated different methods of demarcation, and a quantitative evaluation of the relationship of a contentious isolate to all the isolates of a specific species seems the most appropriate method for resolving this classification dilemma.
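One possible reading of such a quantitative evaluation is sketched here. The text does not specify which statistic is used, so the choice of the mean identity to all isolates of each candidate species is an assumption, as are the function name, the group labels and the identity values.

```python
from statistics import mean


def rank_candidate_groups(identities_by_group):
    """Rank candidate species (or strains) for a contentious isolate.

    identities_by_group maps a group label to the list of percentage
    identities between the query isolate and every isolate of that group.
    Groups are ranked by mean identity (an assumed summary statistic).
    """
    scores = {
        name: mean(values)
        for name, values in identities_by_group.items()
        if values
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical identities of one recombinant isolate to two species.
query = {
    "species X": [92.0, 91.0, 90.0],
    "species Y": [94.0, 86.0],
}
ranking = rank_candidate_groups(query)
print(ranking[0])  # ('species X', 91.0)
```

In this toy case the single best match (94%) lies in species Y, yet the isolate is, on average, closer to the isolates of species X; this is the kind of conflict that a comparison against all isolates of a species is meant to resolve.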

Homogeneous classification of geminivirus isolates into strains and species

Of 252 isolates, representing 209 species, 102 cluster in more than one strain per species, but only 37 of those present some degree of heterogeneity at the species level worth considering in this paper. The other 65 isolates comply with the 89% rule, showing an intra-species pairwise nucleotide identity of 91%. The remaining 37 isolates, currently belonging to 17 species, can be divided into two categories. In the first category, 17 isolates, belonging to 5 species, have intra-species pairwise comparisons that are below the species threshold level. In the second category, 20 isolates, belonging to 14 species, have pairwise comparisons above the species threshold (Fig. 2). This heterogeneity reflects in part the history of geminivirus taxonomy and in part the difficulty in some instances to assign a virus isolate to the correct species, or the lack of precise guidelines for assigning an isolate to a specific species. This paper proposes to correct the heterogeneity of geminivirus isolates at the strain level by including in the same species a number of isolates previously belonging to different species (Figs. 3, 4).

Fig. 2

Matrix of distances (% identity) of PASC identity percentages between DNA-A sequences of 47 geminivirus isolates belonging to 21 virus species. To avoid confusion with the new nomenclature now established, the old nomenclature has been used in this matrix and may not be the one finally chosen and listed in Table 1. The grey and light grey cells identify variant, strain and species relationships, respectively. The thick cell borders represent proposed new species. At the lower left end side of the species boxes is indicated the intra-species pairwise percentage identity, while the inter-species pairwise percentage identity is indicated between two species boxes

Fig. 3

PASC identity percentages between DNA-A sequences for 672 geminivirus isolates. Genus, species and isolate levels are identified

Fig. 4

Phylogenetic tree representing 200 geminivirus representative isolates of 200 species. Chicken anemia virus (CAV) has been used as an outgroup individual. The tree was calculated and designed with the software MegAlign of DNAStar (Lasergene) using the Clustal V algorithm. Virus names for the abbreviation in the tree can be found in Table 1. The accession number used for each virus is listed after the virus abbreviation. For convenience, the tree has been truncated into two separate clusters. The genera in the family Geminiviridae are indicated. The genus Begomovirus has been separated into clusters, one each from the Old and New World, respectively

Table 1 Updated list of geminivirus species and isolate names with strain and variant descriptors

In the first category of strains that have intra-species pairwise comparisons below the species level, it is clear that recombination between different isolates has led to higher levels of identity between them, constituting a set of viruses that is best kept together as a single species. An example of this situation is the TYLCV cluster, comprising five strains with pairwise percentages from 92 to 85% (Fig. 2).

The second category corresponds to viruses belonging to different species for which intermediates have been found or for which, with hindsight, anomalous decisions have been made over the years. A good example is the cluster including TbLCJV-[JR;3] and HYVKgV-[JR;TobKG5]. For these isolates the species threshold was set at 90%. At an 89% threshold, these five viruses would be classified as three species. Similarly, PYMTV, but not PYMPV, would be clustered with PYMV. Another example, where intermediates have been found, is the AYVCNV/AYVV cluster. It is now clear that this cluster resembles the TYLCV cluster and therefore should be treated similarly. The ToLCIRV/ToLCKV and CLCuMV/CLCuRV clusters are of the same category and should also be reconsidered as single species (Fig. 2).

If the clusters of the second category are reclassified in single species, the intra-species pairwise percentages for the 21 clusters vary between 92 and 88%, and the inter-species pairwise percentages vary between 62 and 86% (Fig. 2).

On the basis of this proposal, the following viruses would be incorporated into a single species.

Current name                      Proposed name

Ageratum yellow vein virus
AYVV-A[ID;Tom].AB100305           AYVV-A[ID;Tom].AB100305
AYVV-B[TW;Tao3;05].DQ866134       AYVV-B[TW;Tao3;05].DQ866134
AYVTV-[TW;Tai;99].AF307861        AYVV-C[TW;Tai;99].AF307861
AYVCNV-A[CN;Gx68;03].AJ849916     AYVV-D[CN;Gx68;03].AJ849916
AYVCNV-B[CN;Hn2.19;01].AJ564744   AYVV-E[CN;Hn2.19;01].AJ564744

Cotton leaf curl Multan virus
CLCuMV-A[PK;Y62;95].AJ002447      CLCuMV-A[PK;Y62;95].AJ002447
CLCuMV-B[PK;Mul].AJ496461         CLCuMV-B[PK;Mul].AJ496461
CLCuMV-C[IN;Bha;05].DQ191160      CLCuMV-C[IN;Bha;05].DQ191160
CLCuRV-[IN;Abo;03].AY795606       CLCuMV-D[IN;Abo;03].AY795606

Honeysuckle yellow vein mosaic virus
HYVMV-A[JR;FK1].AB178945          HYVMV-A[JR;FK1].AB178945
HYVKgV-[JR;TobKG5].AB178949       HYVMV-D[JR;TobKG5].AB178949

Honeysuckle yellow vein virus
HYVV-UK[UK;Nor1;99].AJ542540      HYVV-A[UK;Nor1;99].AJ542540
HYVKoV-[JR;HY12;00].AB178946      HYVV-C[JR;HY12;00].AB178946
TbLCKoV-[JR;KK;Tom].AB055009      HYVV-D[JR;KK;Tom].AB055009

Potato yellow mosaic virus
PYMV-Po[VE].D00940                PYMV-Po[VE].D00940
PYMV-To[GP;Tom].AY120882          PYMV-To[GP;Tom].AY120882
PYMTV-[TT;Tom].AF039031           PYMV-TT[TT;Tom].AF039031

Tomato leaf curl Karnataka virus
ToLCKV-A[IN;Jan;05].AY754812      ToLCKV-A[IN;Jan;05].AY754812
ToLCKV-B[IN;Ban;93].U38239        ToLCKV-B[IN;Ban;93].U38239
ToLCIRV-[IR;Ira].AY297924         ToLCKV-C[IR;Ira].AY297924

Guidelines for the classification of geminivirus isolates in variants, strains and species

In order to classify all geminivirus isolates in a similar manner, and therefore obtain a homogeneous classification, the following guidelines are proposed:

  1. Compare a new geminivirus isolate sequence to all known sequences representative of species:

    • if the pairwise sequence identity is <88%, it belongs to a new species;

    • if the pairwise sequence identity is 88–89%, it belongs tentatively to the closest species;

    • if the pairwise sequence identity is >89%, it belongs definitively to that species.

  2. Compare a new geminivirus isolate sequence to all known sequences representative of strains and variants in the identified species:

    • if the pairwise sequence identity is <93% to all known members, it is a member of a new strain in that species;

    • if the pairwise sequence identity is >94% to an existing isolate, it is a variant of that strain in that species.

Pairwise sequence comparisons are performed with the Clustal V algorithm, and a subset of species-representative sequences will be available online at http://www.danforthcenter.org/iltab/geminivirus.
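The two-step guidelines above can be expressed as a small decision procedure. This is a minimal sketch, assuming the percentage identities have already been computed; the function names and return strings are hypothetical. The guidelines do not state what to do with a best identity between 93 and 94%, so the sketch reports that case as undetermined rather than guessing.

```python
def classify_species(best_identity: float) -> str:
    """Step 1: best identity (%) to the species-representative sequences."""
    if best_identity < 88.0:
        return "new species"
    if best_identity <= 89.0:
        return "tentative member of the closest species"
    return "member of the closest species"


def classify_below_species(identities_to_members) -> str:
    """Step 2: identities (%) to all known strains and variants of the species."""
    if not identities_to_members:
        raise ValueError("at least one comparison is required")
    best = max(identities_to_members)
    if best < 93.0:
        return "new strain"
    if best > 94.0:
        return "variant of the closest strain"
    # The 93-94% interval is not covered by the guidelines as written.
    return "undetermined"


print(classify_species(87.5))                 # new species
print(classify_species(91.2))                 # member of the closest species
print(classify_below_species([90.1, 92.5]))   # new strain
print(classify_below_species([90.1, 96.3]))   # variant of the closest strain
```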

Nomenclature of virus isolate descriptors

In addition to the descriptor information becoming part of the virus name, GenBank has been asked to request systematically from authors a minimum of information with each deposited sequence, including the date and exact GPS location of the site from which the isolate was obtained. Although this has not been implemented yet, there are good reasons to believe that it will be very soon, as this information is increasingly important for epidemiological and evolutionary studies. It might even be possible to retrieve such information for the hundreds of isolates already recorded.

The Geminiviridae Study Group previously accepted that the first isolate of a species to be described did not require a distinguishing descriptor (for example TYLCV, TYLCSV, ToLCV) and did not always include this information in the species list, primarily to provide a concise name. However, because of the perceived need for distinguishing and informative descriptors, it is advisable to reconsider this decision and add an appropriate descriptor in all cases.

List of isolates that could be promoted to strain status

It is apparent that a stable genetic change in a virus leading to a distinctive phenotype can be as small as an alteration to a single nucleotide. However, our statistical analysis indicates a peak corresponding to approximately 90–91% identity, representing about 300 nucleotide changes between genome (genomic component) sequences for these isolates. Because most of the recognized begomovirus strains cluster within the peak, we propose to define all such isolates as strains. On this basis, reviewing geminivirus information compiled in sequence databases and the last update of geminivirus isolates that we have done [5], the following begomoviruses would gain the status of strain:

Begomovirus       Accession number
AYVV              X74516
AYVV-[Tom]        AB100305
BYVMV-[Mad]       AF241479
BYVMV-[301]       AJ002453
CLCuGV-[Hl/Cai]   AJ542539
EACMV-[TZ]        Z83256
EACMV-[KEK2B]     AJ006458
EpYVV-[MNS2]      AJ438936
EpYVV-[Yam]       AB079766
HYVMV-[Yam]       AB079765
HYVV-[SP1]        AB182261
MCLCuV-[GT]       AF325497
MCLCuV-[CR]       AY064391
PaLCuCNV-[G10]    AJ558125
PaLCuV-[Cot]      AJ436992
PaLCuV            Y15934
PepGMV-[Tam]      U57457
PepGMV-[CR]       AF149227
PepGMV-[Di]       AY928512
PepGMV-[Mo]       AY928516
PepGMV-[Ser]      AY928514
SiMoV-[BR]        AY090555
SiMoV-[A1B3]      AJ557450
ToChLPV-[BCS]     AY339619
ToLCBV            Z48182
ToLCJV            AB100304
ToLCJV-[Age]      AB162141
ToLCV-[AU]        S53251
ToSLCV-[NI1]      AJ508784
ToSLCV-[NI2]      AJ508785
TYLCCNV-[Y43]     AJ781302
TYLCTHV-[SaNa]    AY514632

The following viruses probably should be grouped within the mild strain of TYLCV on the basis of the phenotype of the virus that originally described that cluster:

Abbreviation    Accession number    New abbreviation
TYLCV-[Atu]     AB116633            TYLCV-Mld[Atu]
TYLCV-[Kis]     AB116634            TYLCV-Mld[Kis]
TYLCV-[SzD]     AB116635            TYLCV-Mld[SzD]
TYLCV-[SzOs]    AB116636            TYLCV-Mld[SzOs]
TYLCV-[SzY]     AB116632            TYLCV-Mld[SzY]
TYLCV-[Sz]      AB110218            TYLCV-Mld[Sz]

The following two pairs of viruses have pairwise sequence identities of about 91% with other isolates of the same virus species, and therefore one member of the pair deserves the status of strain:

First virus      Accession number    Second virus     Accession number
ToLCSDV-[Gez]    AY044137            ToLCSDV-[Sha]    AY044139
TYLCSV-[Sic]     Z28390              TYLCSV-[Tun]     AY736854

Based on the pairwise sequence comparison score, the following four isolates require a strain descriptor:

Begomovirus       Accession number
TYLCSV-[ES2]      L27708
TYLCSV-[U83-8]    AJ519675
TYLCSV-[ES1]      Z25751
TYLCSV-[MA]       AY702650

Using the same criteria, a single curtovirus could be considered a strain:

Curtovirus      Accession number
BCTV-Cal[Log]   AF379637

This virus already has a strain descriptor in the published list (BCTV-Cal[Log]) along with BCTV-Cal. They were both originally assigned as California strains before other curtovirus species were recognized and have retained this unnecessary strain descriptor since then. Hence, the viruses should be referred to as BCTV-[Cal] and BCTV-Log[Cal].

Examples of nomenclature for descriptors under the species level

Virus names should adopt the nomenclature structure:

Species name, strain descriptor (symptoms, host, location, if appropriate or a letter such as A, B, C) [variant descriptor (country: location: [host]: year)]

The following case studies are used to illustrate name derivation:

Species/virus name                                                  Abbreviation

East African cassava mosaic virus
East African cassava mosaic virus, Tanzania [Tanzania:Yellow vein]  EACMV-TZ[TZ:YV]
East African cassava mosaic virus, Kenya [Uganda:1997]              EACMV-KE[UG:97]
East African cassava mosaic virus, Uganda [Tanzania:10]             EACMV-UG[TZ:10]
East African cassava mosaic virus, Uganda [Uganda:Severe2:1997]     EACMV-UG[UG:Sev2:97]
East African cassava mosaic virus, Uganda [Kenya:Wote:K282:2002]    EACMV-UG[KE:Wot:K282:02]

The original virus isolate for the strain that induces very severe symptoms on cassava was found in Uganda, hence the descriptor “Uganda”. This was the second EACMV isolate from Uganda, hence the use of [Severe 2] as variant descriptor. Because recombination within the capsid protein sequence is associated with this phenotype, “Uganda Severe” becomes a label for this genotype. The severe strains found in Kenya and Tanzania were the first to be described in these countries. Because it is highly likely that many more isolates will be described in the future, it is advisable to use a more specific location rather than the country name to distinguish variants, such as “Wote” in the example above.
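The name structure given above lends itself to mechanical assembly. The following is a minimal sketch of how a full name and its abbreviation could be built from their parts; the function names are hypothetical, and the abbreviated codes (such as “Wot” for “Wote”) are supplied by the caller because the text gives no rule for deriving them.

```python
def isolate_name(species_name, strain="", variant_fields=()):
    """Full name: 'Species name, strain descriptor [variant descriptor]'."""
    head = f"{species_name}, {strain}" if strain else species_name
    variant = ":".join(str(field) for field in variant_fields if field)
    return f"{head} [{variant}]"


def isolate_abbreviation(species_abbr, strain="", variant_fields=()):
    """Abbreviation: 'ABBR-strain[variant]'; the strain part may be empty."""
    variant = ":".join(str(field) for field in variant_fields if field)
    return f"{species_abbr}-{strain}[{variant}]"


print(isolate_name("East African cassava mosaic virus", "Uganda",
                   ("Kenya", "Wote", "K282", "2002")))
# East African cassava mosaic virus, Uganda [Kenya:Wote:K282:2002]

print(isolate_abbreviation("EACMV", "UG", ("KE", "Wot", "K282", "02")))
# EACMV-UG[KE:Wot:K282:02]
```

When no strain descriptor is given, the abbreviation keeps the hyphen directly before the bracket, so a variant at the species level is recognisable by the absence of a descriptor.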

Species/virus name                                            Abbreviation

Mungbean yellow mosaic India virus
Mungbean yellow mosaic India virus [India:Varanasi:Dolichos]  MYMIV-[IN:Var:Dol]
Mungbean yellow mosaic India virus [Nepal:Lalitpur]           MYMIV-[NP:Lal]
Mungbean yellow mosaic India virus [Pakistan:106]             MYMIV-[PK:106]
Mungbean yellow mosaic India virus [Pakistan:130.12]          MYMIV-[PK:130.12]
Mungbean yellow mosaic India virus [Pakistan:130.7]           MYMIV-[PK:130.7]
Mungbean yellow mosaic India virus [Pakistan:14]              MYMIV-[PK:14]
Mungbean yellow mosaic India virus [Pakistan:Cowpea:2000]     MYMIV-[PK:Cp:00]
Mungbean yellow mosaic India virus [Pakistan:Islamabad:2000]  MYMIV-[PK:Isl:00]

As all of these MYMIV isolates exhibit approximately 95% identity, they should be considered variants of the same species, and consequently there is no need for a strain descriptor. Some of them originate from a different host than the original isolate, and induce very severe and recognizable symptoms in this host, hence the descriptor “Cowpea” and “Dolichos” for these isolates. They have been found in different places in Pakistan, Nepal and India; hence the host name has been qualified by the inclusion of country of origin to provide useful information, and an arbitrary distinguishing sample number has been added in some cases (130.12, 130.7, 14, etc.).

TYLCV was originally isolated in Israel, therefore the variant descriptor should be “Israel” or a more precise location. Because the other isolates listed here cluster with the so-called mild isolate (TYLCV-Mld[IL]) that also originated from Israel, they could adopt the “Mild” strain descriptor. Many of these isolates are from Japan and were distinguished either by a single location or by providing two locations when more than one isolate originated from the same district. This is commendable, and should set a precedent for naming TYLCV variants from Spain and Portugal.

Species/virus name                                              Abbreviation

Tomato yellow leaf curl virus
Tomato yellow leaf curl virus, Israel [Israel:Rehovot:1986]     TYLCV-IL[IL:Reo:86]
Tomato yellow leaf curl virus, Israel [Italy:Sicily:2004]       TYLCV-IL[IT:Sic:04]
Tomato yellow leaf curl virus, Israel [Japan:Haruno:2005]       TYLCV-IL[JR:Han:05]
Tomato yellow leaf curl virus, Israel [Japan:Misumi:Stellaria]  TYLCV-IL[JR:Mis:Ste]
Tomato yellow leaf curl virus, Mild [Israel:1993]               TYLCV-Mld[IL:93]
Tomato yellow leaf curl virus, Mild [Japan:Yaizu]               TYLCV-Mld[JR:Yai]
Tomato yellow leaf curl virus, Mild [Jordan:Cucumber:2005]      TYLCV-Mld[JO:Cuc:05]
Tomato yellow leaf curl virus, Mild [Jordan:Homra:2003]         TYLCV-Mld[JO:Hom:03]
Tomato yellow leaf curl virus, Mild [Jordan:Tomato:2005]        TYLCV-Mld[JO:Tom:05]
Tomato yellow leaf curl virus, Mild [Lebanon;LBA44:2005]        TYLCV-Mld[LB;LBA44:05]
Tomato yellow leaf curl virus, Mild [Portugal:2:1995]           TYLCV-Mld[PT:2:95]
Tomato yellow leaf curl virus, Mild [Reunion:2002]              TYLCV-Mld[RE:02]
Tomato yellow leaf curl virus, Mild [Spain:72:1997]             TYLCV-Mld[ES:72:97]
Tomato yellow leaf curl virus, Mild [Spain:Almeria:1999]        TYLCV-Mld[ES:Alm:99]

Future sample denomination

At the current rate of begomovirus isolation and determination of their complete genomic sequences (230 new isolates appeared during the year starting December 2005), we can predict the addition of hundreds of new virus isolates to the present list in the coming years. As a consequence, there is a growing need to establish a standardized and informative set of isolate descriptors. One possibility is to associate a sample with four descriptors: the original host, the original symptoms, the date of sampling and the GPS coordinates of the plant from which the sample was taken. With this basic information, one can precisely position the virus sample in space and time, and isolates could be mapped automatically. The date of the original sample is important for evolutionary and epidemiological purposes, and so far this is not recorded in sequence databases. Geographic Information Systems (GIS) are now routinely used for automated mapping, and many scientists have embraced this technology. Virologists should be encouraged to do the same, and we expect that both the date and the GPS descriptors will eventually be adopted by NCBI and the other databases.
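A sample record carrying the four descriptors could look like the following sketch. The class name, field names and example values are hypothetical and do not correspond to any existing database schema; coordinates are assumed to be decimal degrees.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SampleRecord:
    """Four descriptors proposed for a virus sample (hypothetical layout)."""
    host: str
    symptoms: str
    sampled_on: date
    latitude: float   # decimal degrees, -90 to 90
    longitude: float  # decimal degrees, -180 to 180

    def __post_init__(self):
        # Reject coordinates that cannot be placed on a map.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude out of range")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude out of range")

    def year_code(self) -> str:
        """Two-digit year as used in variant descriptors, e.g. '02'."""
        return f"{self.sampled_on.year % 100:02d}"


# Invented values, for illustration only.
record = SampleRecord("cassava", "severe mosaic", date(2002, 5, 14), -1.78, 37.63)
print(record.year_code())  # 02
```

Storing the full date and coordinates, and deriving short codes from them on demand, keeps the information needed for mapping and for evolutionary studies even when the name itself carries only an abbreviated year.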

Conclusion

Virus taxonomy and nomenclature are scientific tools created by scientists to simplify the work of describing and discussing biological entities like viruses. One must not forget that these tools do not exist in nature, and scientists have developed them in the knowledge that they are the best descriptive tools available at any one time. During the past five years, virologists have improved immensely both the taxonomy and the nomenclature for geminiviruses. This is attested by the fact that similar abbreviations of names are largely clustered in the same groups of isolates in a phylogenetic tree built from complete sequences of their genomic components. From a total of 672 isolates, only two clusters show some slight overlap between the 200 demarcated species (TYLCV and HYVMV), a phenomenon that is readily explained by the presence of large recombinant fragments within the genomic components. This is a remarkable correlation in view of the huge number of recombination events that have apparently occurred between many geminiviruses. However, to progress further and cope with a steadily increasing number of virus isolates, we need to derive simple guidelines to enable a more uniform, coherent and informative set of descriptors to be established for strains and variants of geminiviruses. This will complement data of phylogenetic trees and distributions of percentages of pairwise comparisons based on full-length genomic sequences that remain excellent tools for strain and variant demarcation.