DEWE: A novel tool for executing differential expression RNA-Seq workflows in biomedical research

https://doi.org/10.1016/j.compbiomed.2019.02.021

Highlights

  • DEWE is a multiplatform tool developed for RNA-Seq differential expression analyses.

  • DEWE is particularly designed for researchers without advanced bioinformatics skills.

  • DEWE offers built-in workflows to execute complete differential expression analyses.

  • The DEWE interface allows the advanced management and visualisation of the results.

  • DEWE reduces the learning curve required to run a differential expression analysis.

Abstract

Background

Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provide information on the genes expressed in specific experimental settings, differentiation or disease conditions. RNA-Seq technology is becoming the standard approach for such studies, but available analysis tools are often hard for users without advanced bioinformatics skills to install, configure and use.

Methods

Within reason, DEWE aims to make RNA-Seq analysis as easy for non-proficient users as for experienced bioinformaticians. DEWE supports two well-established and widely used differential expression analysis workflows, which use either Bowtie2 or HISAT2 for sequence alignment and both apply StringTie for quantification, followed by Ballgown and edgeR for differential expression analysis. It also enables the tailored execution of individual tools and helps with the management and visualisation of differential expression results.

Results

DEWE provides a user-friendly interface designed to reduce the learning curve of less knowledgeable users while enabling analysis customisation and software extension by advanced users. Docker technology helps overcome installation and configuration hurdles. In addition, DEWE produces high-quality, publication-ready outputs in the form of tab-delimited files and figures, and helps researchers with further analyses, such as pathway enrichment analysis.

Conclusions

The abilities of DEWE are exemplified here by practical application to a comparative analysis of monocytes and monocyte-derived dendritic cells, a study of clinical relevance. DEWE installers and documentation are freely available at https://www.sing-group.org/dewe.

Introduction

Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provides information on which genes are being expressed in precise experimental settings, differentiation or disease conditions. Such profiling is essential to understand how changes in gene expression relate to functional changes in the organism, as well as to provide insights into transcriptional regulation, signalling pathways and gene network organisation [1]. Traditional transcriptomic approaches were based on microarray cDNA-DNA hybridisation, but high-throughput sequencing of mRNA (also called RNA-Seq) offers many advantages over hybridisation-based studies. Deep sequencing allows the identification and quantification of virtually all RNA in the samples of the experiment, including small RNAs and other non-coding RNAs, such as micro-RNAs. Accuracy is potentially high and depends on the sequencing depth achieved for a cell type at a specific condition. The increase of sequencing coverage in new platforms and the introduction of depletion techniques have enabled dual RNA-Seq, i.e. simultaneous transcriptomic studies of interacting organisms. For instance, it is now possible to characterise host-pathogen interactions in a single experiment [2]. Moreover, RNA-Seq can identify de novo transcripts as it is not dependent on previous probe design and synthesis [3].

The many advantages of RNA-Seq are possible partly because of the enormous number of raw sequencing reads generated, typically tens of millions for a standard experiment, which capture even low-abundance transcripts. Consequently, the analysis of RNA-Seq data requires software specifically designed to handle huge amounts of data.

Over recent years, a number of data analysis methods and software tools have been developed to support the different tasks generally included in RNA-Seq data analysis [4]. Typically, the main stages of a differential expression (DE) workflow include: (i) quality control, read trimming and adapter clipping (e.g. using FastQC [5] or Trimmomatic [6]); (ii) read alignment (e.g. using Bowtie2 [7] or HISAT2 [8]); (iii) transcript assembly and quantification (e.g. with StringTie [9], Cufflinks [10] or iReckon [11]); and (iv) the DE analysis itself (e.g. supported by Ballgown [12], edgeR [13], DESeq [14], baySeq [15], or Cuffdiff [10]).
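To make stages (ii) and (iii) above more concrete, the following minimal Python sketch assembles the command lines that a workflow of this kind would typically issue for index building, paired-end alignment, sorting and StringTie quantification. It is an illustration under stated assumptions, not a description of how DEWE invokes these tools internally: the function name, the file names and the output layout are hypothetical, and the use of samtools for sorting is an assumption, as is the choice of options. Read trimming (stage i) and the DE analysis in R (stage iv) are deliberately left out. Nothing is executed; the function only returns argument lists.

```python
from pathlib import PurePosixPath


def build_workflow_commands(genome_fasta, annotation_gtf, samples, out_dir,
                            aligner="hisat2"):
    """Return argument lists for index building, alignment and quantification.

    `samples` maps a sample name to a (reads_1, reads_2) pair of FASTQ paths.
    All names and the output layout are hypothetical; nothing is executed.
    """
    if aligner not in ("hisat2", "bowtie2"):
        raise ValueError(f"unsupported aligner: {aligner}")

    out = PurePosixPath(out_dir)
    index = str(out / "index" / "genome")

    # Build the reference index once (hisat2-build or bowtie2-build).
    commands = [[f"{aligner}-build", str(genome_fasta), index]]

    # HISAT2's --dta option tailors alignments for transcript assemblers.
    extra = ["--dta"] if aligner == "hisat2" else []

    for name, (reads_1, reads_2) in samples.items():
        sam = str(out / f"{name}.sam")
        bam = str(out / f"{name}.bam")
        gtf = str(out / name / f"{name}.gtf")

        # Align the paired-end reads against the index, writing SAM output.
        commands.append([aligner, *extra, "-x", index,
                         "-1", str(reads_1), "-2", str(reads_2), "-S", sam])

        # Sort the alignments by coordinate (assumed samtools step).
        commands.append(["samtools", "sort", "-o", bam, sam])

        # Quantify against the annotation: -e limits estimation to the
        # reference transcripts, -B writes the table files read by Ballgown.
        commands.append(["stringtie", bam, "-G", str(annotation_gtf),
                         "-e", "-B", "-o", gtf])

    return commands


if __name__ == "__main__":
    plan = build_workflow_commands(
        "genome.fa", "genes.gtf", {"sample1": ("s1_1.fq", "s1_2.fq")}, "out")
    for command in plan:
        print(" ".join(command))
```

Each returned list could be passed to subprocess.run() once the tools are installed. Keeping command construction separate from execution makes such a plan easy to inspect or log before any long-running step starts.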

Existing software varies greatly in terms of the stages of analysis covered. Notably, some software combines several of the previous tools in order to implement complete workflows [[16], [17], [18]]. Moreover, since the installation, configuration and use of these tools are not always trivial, a variety of interfaces exists to help non-proficient end-users [19]. Examples include easyRNASeq [20], Nextpresso [21], Galaxy for RNA-Seq [22], RNASeqGUI [23], RobiNA [24], RSeqFlow [25], and SePIA [26].

Despite these efforts, RNA-Seq interfaces are still affected by several technical difficulties [19]. Therefore, this work presents DEWE (Differential Expression Workflow Executor), a new RNA-Seq DE analysis tool that enables the execution of complete workflows by non-proficient users as well as analysis customisation by experienced bioinformaticians. DEWE runs inside a Docker container to expedite installation and configuration in the main operating systems, i.e. Windows, Mac OS X and Linux [[27], [28], [29]]. Likewise, the DEWE interface was designed to minimise the software learning curve. Ultimately, the aim of DEWE is to allow less experienced users (in particular, biomedical and health researchers) to use known analysis workflows as a black box, while enabling more advanced users to customise existing workflows, or even build their own pipelines, according to particular needs and interests.

Section snippets

DEWE differential expression analysis workflows

DEWE offers built-in, easy-to-configure and well-consolidated workflows to conduct differential expression analyses, and also enables the execution of individual analysis steps. In particular, DEWE workflows entail the following steps (Fig. 1): (i) the creation of a reference index for the genome of interest, (ii) the alignment of reads to the reference index, (iii) transcript assembly and quantification, and (iv) the differential expression analysis itself. Notably, DEWE workflows do not

Results

The motivation of DEWE is to equip users less proficient in bioinformatics with the means to execute differential expression analyses while enabling GUI-supported advanced customisation if desired. Accordingly, DEWE's main contributions include the out-of-the-box use of well-established and varied analysis tools, including the customised execution of individual tools as well as complete workflows, and the user-friendly management and visualisation of a large number of

Discussion

A first comparison of DEWE's design premises with those of similar-purpose tools enabled the identification of key requirements in terms of software installation, configuration, usability and documentation. The tools analysed were ArrayExpressHTS [39], easyRNASeq [20], Galaxy [22], PRADA [40], RNASeqGUI [23], and RobiNA [24].

With the exception of RobiNA, most of the DE tools are platform dependent. To overcome or minimise installation issues, DEWE provides all-in-one installers, i.e. the automatic

Conclusion

DEWE is a new RNA-Seq analysis tool specifically designed to allow users less proficient in bioinformatics to conduct differential expression analyses on their own, while enabling analysis customisation and software extension by more advanced users. DEWE offers out-of-the-box, easy-to-configure, and well-established analysis tools, including individual DE steps as well as complete workflows. DEWE's interface enables the user-friendly management of differential expression results, including

Summary

Transcriptomics profiling aims to identify and quantify all transcripts present within a cell type or tissue at a particular state, and thus provide information on the genes expressed in specific experimental settings, differentiation or disease conditions. RNA-Seq technology is becoming the standard approach for such studies, requiring specific analysis software that facilitates the work of laboratory scientists. Available tools are often hard to install, configure and use by users without

Authors’ contribution

ABM, AL, BS, and HLF conceived and designed the tool. ABM and HLF built and tested the tool. ABM, AL, BS, FFR and HLF drafted the manuscript. All authors read and approved the final version of the manuscript.

Conflicts of interest

Borja Sánchez is on the scientific board and is a co-founder of Microviable Therapeutics SL. The other authors do not have competing interests.

Acknowledgment

The authors are thankful to Noé Vázquez for his guidance on how to set up Xpra in the Docker image. SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. This work was supported by the Spanish “Programa Estatal de Investigación, Desarrollo e Innovación Orientada a los Retos de la Sociedad” (grant AGL2013-44039R); the Asociación Española Contra el Cáncer (“Obtención de péptidos bioactivos contra el Cáncer Colo-Rectal

References (44)

  • S. Andrews, FastQC: a quality control tool for high throughput sequence data, (n.d.)....
  • A.M. Bolger et al., Trimmomatic: a flexible trimmer for Illumina sequence data, Bioinformatics (2014)
  • B. Langmead et al., Fast gapped-read alignment with Bowtie 2, Nat. Methods (2012)
  • D. Kim et al., HISAT: a fast spliced aligner with low memory requirements, Nat. Methods (2015)
  • M. Pertea et al., StringTie enables improved reconstruction of a transcriptome from RNA-seq reads, Nat. Biotechnol. (2015)
  • C. Trapnell et al., Differential analysis of gene regulation at transcript resolution with RNA-seq, Nat. Biotechnol. (2012)
  • A.M. Mezlini et al., iReckon: simultaneous isoform discovery and abundance estimation from RNA-seq data, Genome Res. (2013)
  • A.C. Frazee et al., Ballgown bridges the gap between transcriptome assembly and expression analysis, Nat. Biotechnol. (2015)
  • M.D. Robinson et al., edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics (2010)
  • M.I. Love et al., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2, Genome Biol. (2014)
  • T.J. Hardcastle, Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology, Bioinformatics (2016)
  • M. Pertea et al., Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown, Nat. Protoc. (2016)