DEWE: A novel tool for executing differential expression RNA-Seq workflows in biomedical research
Introduction
Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provides information on which genes are being expressed in precise experimental settings, differentiation or disease conditions. Such profiling is essential to understand how changes in gene expression relate to functional changes in the organism, as well as to provide insights into transcriptional regulation, signalling pathways and gene network organisation [1]. Traditional transcriptomic approaches were based on microarray cDNA-DNA hybridisation, but high-throughput sequencing of mRNA (also called RNA-Seq) offers many advantages over hybridisation-based studies. Deep sequencing allows the identification and quantification of virtually all transcripts of a cell type at a specific condition, including small RNAs and other non-coding RNAs such as micro-RNAs, with an accuracy that depends on the sequencing depth. The increase of sequencing coverage in new platforms and the introduction of depletion techniques have enabled dual RNA-Seq, i.e. simultaneous transcriptomic studies of interacting organisms. For instance, it is now possible to characterise host-pathogen interactions in a single experiment [2]. Moreover, RNA-Seq can identify de novo transcripts, as it does not depend on previous probe design and synthesis [3].
The many advantages of RNA-Seq are possible partly because of the enormous number of raw sequencing reads generated, typically tens of millions for a standard experiment, which capture even low-abundance transcripts. Consequently, the analysis of RNA-Seq data requires software specifically designed to handle huge amounts of data.
Over recent years, a number of data analysis methods and software tools have been developed to support the different tasks generally included in RNA-Seq data analysis [4]. Typically, the main stages of a differential expression (DE) workflow include: (i) quality control, read trimming and adapter clipping (e.g. using FastQC [5] or Trimmomatic [6]); (ii) read alignment (e.g. using Bowtie2 [7] or HISAT2 [8]); (iii) transcript assembly and quantification (e.g. with StringTie [9], Cufflinks [10] or iReckon [11]); and (iv) the DE analysis itself (e.g. supported by Ballgown [12], edgeR [13], DESeq [14], baySeq [15], or Cuffdiff [10]).
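The stages listed above can be sketched as a sequence of command lines. The following Python snippet is a minimal illustration only, assuming paired-end reads and the HISAT2, samtools and StringTie command-line tools; it covers stages (ii) and (iii) and leaves out trimming and the DE analysis itself, which is usually run in R. The function name, file names and output layout are hypothetical, and the snippet does not reproduce the commands that any particular tool, DEWE included, runs internally.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def build_workflow_commands(
    genome_fasta: str,
    annotation_gtf: str,
    sample: str,
    reads_1: str,
    reads_2: str,
    out_dir: str = "results",  # hypothetical output layout
    threads: int = 4,
) -> List[List[str]]:
    """Build (but do not run) the commands for alignment and quantification."""
    out = PurePosixPath(out_dir)
    index_prefix = str(out / "genome_index")
    sam = str(out / f"{sample}.sam")
    bam = str(out / f"{sample}.sorted.bam")
    gtf = str(out / sample / f"{sample}.gtf")
    return [
        # Build the reference index for the genome of interest.
        ["hisat2-build", genome_fasta, index_prefix],
        # Align paired-end reads; --dta tailors the output for transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # Sort the alignments by coordinate and write them as BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Quantify against the annotation (-e) and write Ballgown tables (-B).
        ["stringtie", bam, "-p", str(threads), "-G", annotation_gtf,
         "-e", "-B", "-o", gtf],
    ]


if __name__ == "__main__":
    # Hypothetical input files, for illustration only.
    for cmd in build_workflow_commands(
        "genome.fa", "genes.gtf", "sample1", "sample1_1.fastq", "sample1_2.fastq"
    ):
        print(shlex.join(cmd))
```

Returning argument lists rather than running the tools keeps the sketch free of side effects; each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed and the output directories exist.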
Existing software varies greatly in terms of the stages of analysis covered. Notably, some software combines several of the previous tools in order to implement complete workflows [16], [17], [18]. Moreover, since the installation, configuration and use of these tools are not always trivial, a variety of interfaces exist to help non-proficient end-users [19]. Examples include easyRNASeq [20], Nextpresso [21], Galaxy for RNA-Seq [22], RNASeqGUI [23], RobiNA [24], RSeqFlow [25], and SePIA [26].
Despite these efforts, RNA-Seq interfaces are still affected by several technical difficulties [19]. Therefore, this work presents DEWE (Differential Expression Workflow Executor), a new RNA-Seq DE analysis tool that enables the execution of complete workflows by non-proficient users as well as analysis customisation by experienced bioinformaticians. DEWE runs inside a Docker container to expedite installation and configuration in the main operating systems, i.e. Windows, Mac OS X and Linux [27], [28], [29]. Likewise, the DEWE interface was designed to minimise the software learning curve. Ultimately, the aim of DEWE is to allow less experienced users (in particular, biomedical and health researchers) to use known analysis workflows as a black box, while enabling more advanced users to customise existing workflows, or even build their own pipelines, according to particular needs and interests.
Section snippets
DEWE differential expression analysis workflows
DEWE offers built-in, easy-to-configure and well-consolidated workflows to conduct differential expression analyses, and also enables the execution of individual analysis steps. In particular, DEWE workflows entail the following steps (Fig. 1): (i) the creation of a reference index for the genome of interest, (ii) the alignment of reads to the reference index, (iii) transcript assembly and quantification, and (iv) the differential expression analysis itself. Notably, DEWE workflows do not
Results
The motivation of DEWE is to equip users less proficient in bioinformatics with the means to execute differential expression analyses while enabling GUI-supported advanced customisation if desired. Therefore, DEWE's main contributions include the out-of-the-box use of well-established and varied analysis tools, including the customised execution of individual tools as well as complete workflows, and the user-friendly management and visualisation of a large number of
Discussion
An initial comparison of DEWE's design premises with those of similar-purpose tools enabled the identification of key requirements in terms of software installation, configuration, usability and documentation. The tools analysed were ArrayExpressHTS [39], easyRNASeq [20], Galaxy [22], PRADA [40], RNASeqGUI [23], and RobiNA [24].
Most of the DE tools are platform-dependent, except for RobiNA. To overcome or minimise installation issues, DEWE provides all-in-one installers, i.e. the automatic
Conclusion
DEWE is a new RNA-Seq analysis tool specifically designed to allow users less proficient in bioinformatics to conduct differential expression analyses on their own, while enabling analysis customisation and software extension by more advanced users. DEWE offers out-of-the-box, easy-to-configure, and well-established analysis tools, including individual DE steps as well as complete workflows. DEWE's interface enables the user-friendly management of differential expression results, including
Summary
Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provides information on the genes expressed in specific experimental settings, differentiation or disease conditions. RNA-Seq technology is becoming the standard approach for such studies, requiring specific analysis software that facilitates the work of laboratory scientists. Available tools are often hard to install, configure and use by users without
Authors’ contribution
ABM, AL, BS, and HLF conceived and designed the tool. ABM and HLF built and tested the tool. ABM, AL, BS, FFR and HLF drafted the manuscript. All authors read and approved the final version of the manuscript.
Conflicts of interest
Borja Sánchez is on the scientific board and is a co-founder of Microviable Therapeutics SL. The other authors do not have competing interests.
Acknowledgment
The authors are thankful to Noé Vázquez for his guidance on how to set up Xpra in the Docker image. The SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. This work was supported by the Spanish “Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad” (grant AGL2013-44039R); the Asociación Española Contra el Cáncer (“Obtención de péptidos bioactivos contra el Cáncer Colo-Rectal
References (44)
- et al. Transcriptional and post-transcriptional gene regulation by long non-coding RNA. Genomics, Proteomics, Bioinf. (2017)
- et al. AIBench: a rapid application development framework for translational research in biomedicine. Comput. Methods Progr. Biomed. (2010)
- et al. BioAnnote: a software platform for annotating biomedical documents with application in medical learning environments. Comput. Methods Progr. Biomed. (2013)
- et al. Enabling systematic, harmonised and large-scale biofilms data computation: the biofilms experiment workbench. Comput. Methods Progr. Biomed. (2015)
- et al. S2P: a software tool to quickly carry out reproducible biomedical research projects involving 2D-gel and MALDI-TOF MS protein data. Comput. Methods Progr. Biomed. (2018)
- et al. RNA sequencing and transcriptomal analysis of human monocyte to macrophage differentiation. Gene (2013)
- et al. Temporal biological variability in dendritic cells and regulatory T cells in peripheral blood of healthy adults. J. Immunol. Methods (2016)
- et al. Resolving host–pathogen interactions by dual RNA-seq. PLoS Pathog. (2017)
- et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. (2010)
- et al. A survey of best practices for RNA-seq data analysis. Genome Biol. (2016)
- Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics
- Fast gapped-read alignment with Bowtie 2. Nat. Methods
- HISAT: a fast spliced aligner with low memory requirements. Nat. Methods
- StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol.
- Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat. Biotechnol.
- iReckon: simultaneous isoform discovery and abundance estimation from RNA-seq data. Genome Res.
- Ballgown bridges the gap between transcriptome assembly and expression analysis. Nat. Biotechnol.
- edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics
- Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol.
- Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology. Bioinformatics
- Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc.
Cited by (9)
METTL3 suppresses invasion of lung cancer via SH3BP5 m6A modification
2024, Archives of Biochemistry and Biophysics
Human Cytomegalovirus-IE2 Affects Embryonic Liver Development and Survival in Transgenic Mouse
2022, Cellular and Molecular Gastroenterology and Hepatology
Citation Excerpt: Then, the sequence quality was verified by FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). The reads were mapped to the reference genome of mouse (mm10) by hisat2 software (http://daehwankimlab.github.io/hisat2/) for similarity analysis.35 Quantitative analysis of the gene expression level was performed by feature count software and the DEGs were analyzed by edgeR software.
In silico and functional analyses of immunomodulatory peptides encrypted in the human gut metaproteome
2020, Journal of Functional Foods
Citation Excerpt: Roughly this represented about 2.5 × 10^9 clean bases after application of quality filtering suggested by Illumina. Filtered RNA data was exported in FASTq format and was used as input for DEWE (http://www.sing-group.org/dewe/) (López-Fernández, Blanco-Míguez, Fdez-Riverola, Sánchez, & Lourenço, 2019). Files corresponding to this study are available at the European Nucleotide Archive under accession PRJEB33568.
Genome Data Resources and Tools for Sequence Analysis
2023, Advances in Bioinformatics and Big Data Analytics
Gene Expression Tools from a Technical Perspective: Current Approaches and Alternative Solutions for the KnowSeq Suite
2022, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
RNAdetector: a free user-friendly stand-alone and cloud-based system for RNA-Seq data analysis
2021, BMC Bioinformatics