Review
High-throughput sequencing (HTS) for the analysis of viral populations
Introduction
Viruses are probably the most abundant organisms on Earth and major drivers of evolution at all levels of organization and time-scales (Zhang et al., 2018). They also represent the most diverse types of genome organization, which are used to establish the seven higher categories of the current classification of viruses (http://www.ictvonline.org/virustaxonomy.asp). This variety of genomes reflects not only their different evolutionary origins but also the ways they interact with their hosts. Being obligate parasites of cellular organisms, they have adapted to many different life-styles and survival strategies, often leading to very specific changes and traits. However, despite their variety, viruses usually attract the interest of researchers, clinicians, public health officials and the general population because of the serious problems they can cause when they infect a new species, or when a new viral strain spreads among individuals of an existing host, overcomes their intrinsic or previously effective defenses, and causes disease. In fact, viruses are the most common agents responsible for emerging and re-emerging infectious diseases.
Despite their apparent genomic simplicity, viruses encode all the features necessary to successfully complete their life-cycle. Many of these depend on interactions with their hosts, and all are based on genetic differences that can be passed on to the usually large offspring produced by a single virus. Their large population sizes, along with their mutation rates (Sanjuan et al., 2010), which are usually several orders of magnitude higher than those of cellular organisms, create the ideal conditions for making viruses exceptional “evolutionary machines”, capable of exploiting every minute genetic variation.
The relevance of analyzing the genetic variability of viruses was evident even before the advent of sequencing techniques. The first genome sequenced was that of bacteriophage ΦX174 (Sanger et al., 1977), almost two decades before that of a cellular organism, the bacterium Haemophilus influenzae (Fleischmann et al., 1995). Virologists, molecular biologists, infectious disease specialists, and many other researchers have used a wide range of molecular methodologies to learn about the genetic differences and properties of infecting viral populations. One frequent problem they have had to face is the extraordinary variability found in many of those populations, especially for viruses with a single-stranded RNA genome (Moya et al., 2004). The detailed analysis of the genetic variation in a viral population was out of reach for most studies, and usually only “average” or “consensus” sequences were obtained.
The recent development of high-throughput sequencing (HTS) methods (Loman et al., 2015), along with their increased precision and lower costs, is shifting the standard technique for obtaining viral sequences from Sanger sequencing to HTS (Goodwin et al., 2016). As we will address below, this also means that new concepts and analytical methods have to be applied to accommodate the differences between the results produced by the new and the previous techniques and how they inform us about the genetic composition of the analyzed populations. However, current HTS technologies still have relatively high sequencing error rates, from about 0.1% in Illumina (Goodwin et al., 2016) up to 12.7% in MinION ONT (Bowden et al., 2019). Although such rates may be acceptable in some disciplines, they may not provide the accuracy required in others, such as the design of antiviral therapies (Del Campo et al., 2018) or the quality standards required in forensic genetics (Arenas et al., 2017; Budowle et al., 2014).
Here, we provide a general overview of techniques, concepts, analytical methods and several applications of HTS to study viral populations. We will focus more on presenting the range of tools and concepts than on detailing their underlying theoretical or algorithmic bases. More detailed reviews of the major topics covered in this work have been published already (Table 1), and the interested reader is referred to those and the references cited herein for additional details on these topics.
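To make the practical weight of these error rates concrete, the following minimal sketch (not taken from any of the works cited here) asks how many reads must support a variant at a given sequencing depth before sequencing error alone becomes an unlikely explanation. It assumes that errors are independent across reads and occur at a single uniform per-base rate, which real platforms do not strictly satisfy. The function names, the depth of 1000 reads and the significance level of 0.01 are illustrative choices, not values from the text.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_comb + k * math.log(p) + (n - k) * math.log1p(-p))


def min_supporting_reads(depth, error_rate, alpha=0.01):
    """Smallest k such that P(at least k erroneous reads at one site) <= alpha.

    Simplifying assumptions (hypothetical model, not a published caller):
    errors are independent between reads, and every error is counted as
    supporting the same variant base, which makes the threshold conservative.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    cdf = 0.0  # running value of P(X <= k - 1)
    for k in range(depth + 1):
        if 1.0 - cdf <= alpha:  # tail probability P(X >= k)
            return k
        cdf += binom_pmf(k, depth, error_rate)
    return depth + 1  # even unanimous support would not reach alpha


if __name__ == "__main__":
    depth = 1000  # illustrative per-site coverage
    for platform, rate in (("Illumina, ~0.1%", 0.001), ("MinION ONT, 12.7%", 0.127)):
        k = min_supporting_reads(depth, rate)
        print(f"{platform}: at least {k} of {depth} reads "
              f"(frequency >= {k / depth:.3f})")
```

Under this simple model, the minimum detectable variant frequency always sits above the per-base error rate, so at an error rate near 12.7% low-frequency variants cannot be separated from noise without some form of error correction, whereas at about 0.1% variants present in a fraction of a percent of reads may be distinguishable at sufficient depth.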
Section snippets
HTS technologies and their application to the analyses of viral populations
The development of DNA sequencing by chain termination, commonly referred to as Sanger sequencing, revolutionized biological research (Sanger et al., 1977). This methodology, coupled with the development of automated DNA sequencers in the mid-1990s, allowed labs to sequence large numbers of genes and whole genomes, culminating with the sequencing of the human genome and the beginning of the sequencing era (Liesegang, 2001; Consortium, International Human Genome Sequencing, and International Human
Overview of major methods for the analysis of HTS data
There are four major types of analysis of HTS results from viral populations: assembly of new genomes, mapping of reads to a reference sequence (or resequencing), the study of viral sequences included in metagenomics data, and the analysis, usually detection, of specific or rare variants. There are many programs for the general analysis of these different applications that can be applied to HTS data of virus samples. However, most of the algorithms implemented in these tools were originally
Virus population genetics and genomics with HTS data
Most HTS technologies provide large amounts of information about the within-sample variability. For viruses, especially RNA viruses, this opens the possibility of characterizing the viral population not just by consensus or master sequences - which can be reconstructed from the raw data - but in terms of the actual variability present in the sample. Naturally, this requires the use of concepts and methods derived from population genetics, which summarize the relevant parameters governing and
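As an illustration of what characterizing within-sample variability can look like in practice, the sketch below computes two widely used per-site summaries from base counts: Shannon entropy and nucleotide diversity (the fraction of read pairs that differ at a site). It is a minimal example under stated assumptions, not the method of any specific tool. The input format (one dictionary of base counts per genome position, as might be derived from a read pileup) and all function names are hypothetical, and the counts are assumed to have been filtered for sequencing errors already.

```python
import math


def site_entropy(counts):
    """Shannon entropy (natural log) of base frequencies at one site."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in counts.values() if c > 0)


def site_diversity(counts):
    """Nucleotide diversity at one site.

    Computed as the fraction of pairs of distinct reads that carry
    different bases: (n^2 - sum(c_i^2)) / (n * (n - 1)).
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    differing_pairs = n * n - sum(c * c for c in counts.values())
    return differing_pairs / (n * (n - 1))


def summarize(pileup):
    """Mean entropy and mean diversity over all sites of a pileup."""
    if not pileup:
        return 0.0, 0.0
    mean_h = sum(site_entropy(site) for site in pileup) / len(pileup)
    mean_pi = sum(site_diversity(site) for site in pileup) / len(pileup)
    return mean_h, mean_pi


if __name__ == "__main__":
    # Hypothetical base counts for three genome positions.
    pileup = [
        {"A": 100},            # invariant site
        {"A": 90, "G": 10},    # minority variant at 10%
        {"C": 50, "T": 50},    # two variants at equal frequency
    ]
    mean_h, mean_pi = summarize(pileup)
    print(f"mean entropy = {mean_h:.4f}, mean diversity = {mean_pi:.4f}")
```

Both summaries are zero for an invariant site and increase as minority variants become more frequent, which is the kind of information a consensus sequence alone discards.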
Molecular evolutionary analysis from HTS data
In this section, we describe the application of HTS data to the evolutionary analysis at the molecular level, including the estimation of substitution and recombination rates, signatures of selection, and genome-wide evolutionary histories. Clearly, genetic analyses based on large amounts of data (i.e., genome-wide) can provide high statistical confidence. However, systematic biases in the methods applied to analyze such large datasets can provide precise but inaccurate results (S. Kumar et
Analysis of environmental and engineered samples. Metaviromes
After consideration of the methodological and analytical issues and tools for the study of viral populations, the next sections deal with some of the most frequent applications of HTS in this kind of study. Viral metagenomics, or viromics, comprises the study of viral genome sequences (Angly et al., 2006; Edwards and Rohwer, 2005), or metavirome, from resident ecological communities adapted to a specific ecosystem type or biome. Ecosystems and their derived biosamples are classified
Major applications of HTS in the viral analysis of clinical samples
HTS has opened new opportunities in different areas of clinical virology (Capobianchi et al., 2013). The introduction of NGS represented a great boost in four main fields at the interface between the clinical setting and applied research. These are (i) the diagnosis of new infectious agents involved in clinical syndromes caused by multiple agents or a combination of them; (ii) the impact of minority variants involved in antiviral resistance; (iii) the study of intra-host evolutionary dynamics, and
The application of HTS in the analysis of virus transmission clusters and outbreaks
The fast evolutionary rate of RNA viruses and the serious health issues for people infected by many of them have enabled the establishment of a new epidemiological approach, the analysis of transmission clusters and chains through the comparison of sequence information obtained from viruses infecting patients and their potential sources. This strategy is becoming a new standard for surveillance of general populations and specific groups (Agoti et al., 2019; Alamil et al., 2019), and it has also
Challenges and future directions
Over the last few years we have witnessed a transition from Sanger sequencing to High-Throughput Sequencing in the field of viral genomics. HTS can capture much more viral diversity than any other technique and generate large amounts of sequences for population analyses at affordable costs. But HTS also comes with some technological caveats (e.g., high sequencing errors, heterogeneous coverage) and bioinformatic challenges (e.g., genome assembly) that still need to be solved. Given the limited
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank Michel Tibayrenc, Editor-in-Chief of the journal, for the invitation to contribute with this article. MPL was partially funded by the Milken Institute School of Public Health Pilot Fund Program, the Margaret Q. Landenberger Research Foundation and the Fundação para a Ciência e a Tecnologia (T495756868-00032862). MA is funded by grants “RYC-2015-18241” from MICIU (Spanish Government) and “ED431F 2018/08” from the Xunta de Galicia. FGC, NGG, MAB and JH are funded by projects BFU2017-89594R
References (351)
- et al. Genomic characterization of hepatitis C virus transmitted founder variants with deep sequencing. Infect. Genet. Evol. (2019)
- et al. Multiplexed next-generation sequencing and de novo assembly to obtain near full-length HIV-1 genome from plasma virus. J. Virol. Methods (2016)
- et al. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Mol. Phylogenet. Evol. (2016)
- et al. Mutation and recombination in pathogen evolution: relevance, methods and controversies. Infect. Genet. Evol. (2018)
- et al. Current Next Generation Sequencing technology may not meet forensic standards. Forensic Sci. Int. Genet. (2012)
- et al. Intra-host viral variability in children clinically infected with H1N1 (2009) pandemic influenza. Infect. Genet. Evol. (2015)
- et al. Next-generation sequencing technology in clinical virology. Clin. Microbiol. Infect. (2013)
- et al. Phylogenetic analysis of an epidemic outbreak of acute hepatitis C in HIV-infected patients by ultra-deep pyrosequencing. J. Clin. Virol. (2017)
- et al. Heterogeneous recombination among hepatitis B virus genotypes. Infect. Genet. Evol. (2017)
- et al. From clinical sample to complete genome: comparing methods for the extraction of HIV-1 RNA for high-throughput deep sequencing. Virus Res. (2017)
- Metagenomics sheds light on the ecology of marine microbes and their viruses. Trends Microbiol.
- Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol. Evol.
- Hepatitis C virus deep sequencing for sub-genotype identification in mixed infections: a real-life experience. Int. J. Infect. Dis.
- Application of next generation sequencing in clinical microbiology and infection prevention. J. Biotechnol.
- Detection of quasispecies variants predicted to use CXCR4 by ultra-deep pyrosequencing during early HIV infection. AIDS
- Phylogenetic analysis as a forensic tool in HIV transmission investigations. AIDS
- Metagenomic analysis of the viral community in Namib Desert hypoliths. Environ. Microbiol.
- Environmental drivers of viral community composition in Antarctic soils identified by viromics. Microbiome
- Transmission patterns and evolution of respiratory syncytial virus in a community outbreak identified by genomic analysis. Virus Evol.
- Genomic analysis of respiratory syncytial virus infections in households and utility in inferring who infects the infant. Sci. Rep.
- HapCompass: a fast cycle basis algorithm for accurate haplotype assembly of sequence data. J. Comput. Biol.
- Haplotype assembly in polyploid genomes and identical by descent shared tracts. Bioinformatics
- Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci.
- RNA and DNA Sanger sequencing versus next-generation sequencing for HIV-1 drug resistance testing in treatment-naive patients. J. Antimicrob. Chemother.
- Stochastic interplay between mutation and recombination during the acquisition of drug resistance mutations in human immunodeficiency virus type 1. J. Virol.
- Evolutionary strategies of viruses, bacteria and archaea in hydrothermal vent ecosystems revealed through metagenomics. PLoS One
- Virus population dynamics and acquired virus resistance in natural microbial communities. Science
- The marine viromes of four oceanic regions. PLoS Biol.
- Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics
- State-of-the-art methodologies dictate new standards for phylogenetic analysis. BMC Evol. Biol.
- PipeCraft: flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data. Mol. Ecol. Resour.
- Genome-wide heterogeneity of nucleotide substitution model fit. Genome Biol. Evol.
- Identifying the important HIV-1 recombination breakpoints. PLoS Comput. Biol.
- The importance and application of the ancestral recombination graph. Front. Genet.
- Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate Bayesian computation. J. Mol. Evol.
- Trends in substitution models of molecular evolution. Front. Genet.
- Applications of the coalescent for the evolutionary analysis of genetic data. Encycl. Bioinforma. Comput. Biol.
- The effect of recombination on the reconstruction of ancestral sequences. Genetics
- The influence of recombination on the estimation of selection from coding sequence alignments. Nat. Sel.
- Forensic genetics and genomics: much more than just a human affair. PLoS Genet.
- Hospital outbreak of Middle East respiratory syndrome coronavirus. N. Engl. J. Med.
- De novo assembly of viral quasispecies using overlap graphs. Genome Res.
- Emerging concepts of data integration in pathogen phylodynamics. Syst. Biol.
- Evolutionary dynamics of local pandemic H1N1/2009 influenza virus lineages revealed by whole-genome analysis. J. Virol.
- SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol.
- Detection of HIV transmission clusters from phylogenetic trees using a multi-state birth-death model. J. R. Soc. Interface
- The Bayesian revolution in genetics. Nat. Rev. Genet.
- Challenges and opportunities in estimating viral genetic diversity from next-generation sequencing data. Front. Microbiol.
- Metagenomic characterization of Chesapeake Bay virioplankton. Appl. Environ. Microbiol.
- A pan-HIV strategy for complete genome sequencing. J. Clin. Microbiol.