1 INTRODUCTION

Six human coronaviruses had been identified before the emergence of SARS-CoV-2 (Raza et al., 2020; Wang et al., 2020) SARS-CoV-2 caused COVID-19 infection which is a more pathogenic form in comparison to the earlier identified SARS-CoV (2002) and Middle East respiratory syndrome coronavirus (MERS-CoV, 2013) (Naqvi et al., 2020). CoVs is enveloped, single strand and positive-sense RNA virus, have the largest RNA viral genome, (Lokman et al., 2020; Lu et al., 2020; Naqvi et al., 2020). The genome size of the SARS-CoV-2 varies from 29.8kb to 29.9 kb and followed the same pattern of gene characteristics of known CoVs (Khailany et al., 2020). The first study of full length genomic sequence of SARS-CoV-2 was done in China by Yongzhen Zhang team (Wu et al., 2020); it revealed that SARS-CoV-2 encodes 27 proteins from 14 ORFs including 15 non-structural, 4 major structural and 8 accessory protein. Spike glycoprotein (S), membrane (M), envelope (E) and nucleocapsid (N) are the four major structural proteins of SARS-CoV-2 (Lokman et al., 2020;Naqvi et al., 2020; Wang et al., 2020), the products of these structural genes play important roles in viral pathogenicity (Lokman et al., 2020). The accessory proteins are encoded by ORF8, ORF7a, ORF7b, ORF6 and ORF3a genes (Khailany et al., 2020).

However, RNA viruses tend to harbor error prone RNA dependent RNA polymerases which make the occurrence of mutations and recombination events rather frequent, this might play a role in the evolution of SARS-CoV-2. A recent study using phylogenetic network analysis has shown that the virus appears to be evolving into three distinct clusters; A and C being found mostly in Europe and America along while B being most common type in East Asia (Uddin et al., 2020). The study on genomic variation of SARS-CoV-2 is very important to analyze the disease course, pathogenesis, diagnosis, treatment and prevention (Wang et al., 2020), moreover it gives insights into the pattern of spread, genetic diversity and the dynamics of evolution (Khailany et al., 2020).). The present study investigated the molecular variations of Telangana CoVID-19 variant MT577016.

2 METHODS

This study is an in Silico case report study; it is part of a project aimed to study the genetic variations of SARS-CoV-2 isolated in India; the study covered published sequences over a period of six months from March 2020 to September 2020. The genome sequences of SARS-CoV-2 were obtained from National Center for Biotechnology Information (NCBI) Virus Variation Resource repository (https://www.ncbi.nlm.nih.gov/ genbank/sars-cov-2-seqs/) and these strains had been determined according to the time of isolation and their origin. The sequence of SARS-CoV-2 accession number NC_045512 which represented the original Wuhan strain was used as the standard sequence. Genome sequences of SARS-CoV-2 was aligned against the Wuhan strain. The study was done using NCBI Nucleo-BLAST (NCBI) and variations of the nucleotides and proteins were stated. During collection of the data for this project; the sequence of SARS-CoV-2 accession number MT577016 showed very low homology when compared with Wuhan reference strain, other variants isolated from the same geographical area (India) and global variants therefore the changes in the nucleotides and mutations of this genome was determined.

Evolutionary relationships: the evolutionary relationship was inferred using the Neighbor-Joining method (Saitou and Nei, 1987). The optimal tree is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) is shown next to the branches (Felsenstein, 1985). The evolutionary distances were computed using the Maximum Composite Likelihood method (Tamura et al., 2004) and are in the units of the number of base substitutions per site. This analysis involved 30 nucleotide sequences of SARS-2. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There were a total of 29903 positions in the final dataset. Evolutionary analyses were conducted in MEGA X (Kumar et al., 2018).

3 RESULTS

The SARS-CoV-2 accession number MT577016 which is the subject of this study was submitted by Yadav et al., 2020 on 14 March 2020 to Maximum Containment Laboratory, National Institute of Virology, Pashan, Pune, Maharashtra (India) and it got the accession number on 21 September 2020 (https://www.ncbi.nlm.nih.gov/ nuccore/MT577016.1?report=GenBank). This isolate was named Telangana CoVID-19 variant in this study, as it was isolated in Telangana (India).

All the sequences of SARS-CoV-2 isolated in India showed high similarity (99.95–99.99%) to Wuhan strain NC_045512 except Telangana CoVID-19 variant which showed 98.75% homology. The Telangana CoVID-19 variant showed 98.75% homology compared to the Wuhan strain NC_045512. The analysis identified 301 nucleotide changes; most frequent nucleotide changes were T: A 120(39.9%), followed by C:A 62(20.6%) and T: G 53(17.6%) whereas the less frequent nucleotide changes showed in G:T and C:T (Fig. 1). These nucleotide change correspond to 258 different mutations; most of them 80.6% (208/258) were missense point mutations (Figs. 2a, 2b), followed by 17.1% (44/258) silent point mutations. The majority of these mutations 76.4% (197/258) occurred in the open reading frame 1 a/b (ORF1 a/b) while the ORF3a showed 4.7% (12/258) and few mutations 1.6% (4/258) occurred in 3′UTR terminal loop. The critical mutations occurred in viral structural genes; 16.7% (43/258) mutations reported in S gene; 8 were silent and 34 missense. One missense mutation was observed in E gene, whereas no mutation was detected in membrane (M) gene and nucleocapside (N). Table 1 showed the mutations associated with the Telangana variant MT577016. As presented in Figs. 2a, 2b, the majority of the missense mutations that resulted in the change in codons are associated with the ORF1ab and relate to the changes in the corresponding amino acids encoded for several proteins such as leader protein, 3C-like proteinase, RNA-dependent RNA polymerase, helicase, 3'-5' exonulease, endoRNAse, 2'-O-ribose methyl transferase.

Fig. 1.
figure 1

Showed the frequency and types of nucleotide changes.

Fig. 2.
figure 2

(a) Showed the landscape (ORF1ab) of Telangana CoVID-19 (MT577016) genome representing amino acid changes. The missense mutations (*) resulting into corresponding amino acid changes (**) in respective protein as marked. (b) Showed the landscape (S gene, ORF3a and E gene) of Telangana CoVID-19 (MT577016) genome representing amino acid changes. The missense mutations (*) resulting into corresponding amino acid changes (**) in respective protein as marked.

Table 1.   Distribution and site of mutations among Telangana variant MT577016

The Phylogenetic tree showed that Telangana CoVID-19 variant was located in separate node whereas all other SARS-CoV-2 variants clustered together closely in comparative analysis with Indian (Fig. 3a) and global variants (Fig. 3b), the accession number of each sequence was mentioned. Number at nodes indicates percent bootstrap value above 50 supported by more than 1000 replicates. The bar indicates the Jukes-Cantor evolutionary distance.

Fig. 3.
figure 3

(a) Showed the phylogenetic affiliation of Telangana CoVID-19 (MT577016) variant and sequences of closest phylogenetic neighbors which were retrieved from India. (b) Showed the phylogenetic affiliation of Telangana CoVID-19 (MT577016) variant and sequences of closest phylogenetic neighbors which were retrieved from different countries.

4 DISCUSSION

SARS-CoV-2, like other coronaviruses, contains a nonstructural gene with proofreading activity (Deng et al., 2020). As a typical RNA virus the average evolutionary rate for Co-Vs could be 10-4 substitute per bp per year during each replication cycle (Ahmed-Abakura, 2020; Wang et al., 2020)However, knowledge of genomic interindividual variability and genetic variations could explain the discrepancies of spread, severity, and mortality of COVID-19 (Ahmed-Abakur and Alnour, 2020; Wang et al., 2020). As reported the present study determined the comparative genomic variations associated with the Telangana CoVID-19 variant.

The genomic sequences of SARS-CoV-2 had been released by the worldwide scientific community in the last months to understand the molecular characteristics and evolutionary origin of this virus. Several authors determined the genetic variations of SARS- Co-19, and contrary to our findings all of them reported high homology. The Telangana CoVID-19 variant showed the lowest homology 98.75% compared to the published data and revealed high incidence mutations (302 nucleotides were changed and 258 different mutations were detected). Khailany et al., 2020 analyzed 95 SARS-CoV-2 complete genomes and reported only 116 mutations. While among 66 SARS-CoV-2 variants Ahmed-Abakur and Alnour 2020 reported 143 mutations and showed one variant as hyper mutated variant as it displayed 28 different point mutations. Further, Lu et al., 2020 studied 10 genome sequences of 2019-nCoV and showed more than 99·98% homology whereas Wang et al., 2020 analyzed 95 full-length genomic sequences of SARS-CoV-2 and reported that the homology among different isolates was extremely high (99.91%-100%) at the nucleotide level. In addition, Ceraolo and Giorgi, 2020 studied 56 SARS-CoV-2 genomes and showed high level of conservation (>99% sequence identity) and pointed only two core positions of high variability, one a silent variant in the ORF1ab and the other in ORF8 which resulted in two variants, ORF8-L and ORF8-S. Recently new variant of SARS-CoV-2 (VUI 202012/01) characterized with multiple spike protein mutations has been identified in United Kingdom that lead to huge increase in COVID-19 cases (with an estimated rise in reproductive number (R) by 0.4) (European Centre for Disease Prevention and Control, 2020). Thus the Telangana Covid 19 variant could be considered unique variant of SARS-2 based on sequence identity, frequency and occurrence of mutations, particularly as it displayed high incidence of mutations on structural genes which may affect the structural of the structural proteins and subsequently the pathogenicity. However, the emergence of the new variant could be attributed to; prolonged infection which may lead to accumulation of immune escape mutations at high rate, virus adaptation processes that occurs in susceptible animal species and then could transmitted back to humans (European Centre for Disease Prevention and Control, 2020).

Our study indicated that most of mutation occurred on ORF1 followed by S gene; these findings were in alignment with many reports in term of site of mutations (Ahmed-Abakur and Alnour, 2020; Khailany et al., 2020). However spike glycoprotein play important role during the entry of coronaviruses into host cells (Chang et al., 2012; Lokman et al., 2020; van Pesch et al., 2020; Shu and Gong, 2016), therefore any mutation in this gene might change the pattern of pathogenicity. It is understood that the hotspot mutations have the ability to cause changes in the amino acid sequences (Wang et al., 2020), resulting in significant changes in the stability, favoring various interactions, and conformational diversity (40. Our finding agreed with Khailany et al. 2020 where they did not detected mutations in N and M genes and disagreed with Wang et al., 2020 who stated that SARS-COV-2 is relatively conserved, especially in the E gene.

5 CONCLUSIONS

This study presented evidence of a reportedly existing SARS-CoV-2 variant, which has shown an extraordinary ability to mutate even at the structural genes. Further studies are required to elucidate the exact role of genetic variations in SARS-CoV-2 that has challenged the human race, unprecedented in the recent history.