S2P: A software tool to quickly carry out reproducible biomedical research projects involving 2D-gel and MALDI-TOF MS protein data

https://doi.org/10.1016/j.cmpb.2017.11.024

Highlights

  • S2P, a multiplatform application for the integrated analysis of 2D-gel and MALDI-based mass spectrometry protein data, is presented.

  • Our software provides different functionalities to process 2D-gel and MALDI-mass spectrometry protein identification-based data in a computer-aided, reproducible manner.

  • S2P incorporates a user-friendly, intuitive graphical user interface (GUI), designed for biomedical researchers without advanced bioinformatics skills.

  • S2P reduces the time that researchers need to invest in order to prepare data for subsequent analysis, such as discovering potential biomarkers.

  • Different case studies demonstrate the usefulness of S2P.

Abstract

Background and objective

2D-gel electrophoresis is widely used in combination with MALDI-TOF mass spectrometry to analyze the proteome of biological samples. For instance, it can be used to discover proteins that are differentially expressed between two groups (e.g. two disease conditions or case vs. control), thus yielding a set of potential biomarkers. This procedure requires a great deal of data processing to prepare data for analysis or to merge and integrate data from different sources. This work is usually done manually (e.g. by copying and pasting data into spreadsheet files), which is highly time-consuming and distracts the researcher from other core tasks. Moreover, carrying out a repetitive process by hand rather than in an automated way is error-prone, which threatens reliability and reproducibility. The objective of this paper is to present S2P, an open source software tool designed to overcome these drawbacks.

Methods

S2P is implemented in Java on top of the AIBench framework, and relies on well-established open source libraries to accomplish different tasks.

Results

S2P is an AIBench-based, multiplatform desktop application designed specifically to process 2D-gel and MALDI-mass spectrometry protein identification-based data in a computer-aided, reproducible manner. Different case studies are presented to show the usefulness of S2P.

Conclusions

S2P is open source and free to all users at http://www.sing-group.org/s2p. Through its user-friendly GUI, S2P dramatically reduces the time that researchers need to invest in preparing data for analysis.

Introduction

2D-gel electrophoresis and mass spectrometry using matrix-assisted laser desorption ionization coupled to time-of-flight analyzers (MALDI-TOF-MS) are widely used in conjunction to perform proteome analysis [1], [2]. In brief, the comparison of 2D-gels yields a set of differentially expressed spots, while MALDI-TOF-MS identifies the proteins separated in those spots.

The scientific community is particularly interested in the challenging task of finding proteins that can be used to distinguish between health conditions, with the aim of aiding the diagnosis, prognosis and development of new targeted therapies [3], [4], [5]. To find such proteins, known as biomarkers, a typical experimental workflow combining 2D-gel and MALDI-TOF-MS can involve the following steps: (i) separating the proteins present in a complex proteome; (ii) comparing the 2D-gels across samples to obtain the spots that are differentially expressed; (iii) excising those spots and treating them for protein identification; (iv) linking the protein identifications to the 2D-gel spots; and (v) performing different types of data analysis to discover potential biomarkers and extract meaningful biological knowledge. Such a workflow generates a large amount of data, which needs to be processed before it can be properly analyzed. A considerable part of this data processing is usually carried out manually by laboratory researchers (e.g. using text editors and spreadsheet software). However, such a repetitive, non-automated process has important drawbacks: it is time-consuming, it is error-prone, and it tends to lack reliability and reproducibility.
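As an illustration of step (iv), the following minimal Python sketch links protein identifications to 2D-gel spots through a shared spot identifier. It is not S2P's implementation: the column names (`spot_id`, `fold_change`, `p_value`, `protein`, `score`), the CSV layout and the sample values are all hypothetical, chosen only to show the kind of join that is otherwise done by copying and pasting between spreadsheets.

```python
import csv
import io

# Hypothetical inputs: column names and values are illustrative only.
SPOTS_CSV = """spot_id,fold_change,p_value
101,2.4,0.003
102,0.4,0.010
103,1.1,0.620
"""

IDENTIFICATIONS_CSV = """spot_id,protein,score
101,Protein A,212
102,Protein B,98
104,Protein C,150
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_identifications(spots, identifications):
    """Attach the protein identification (if any) to each spot by spot_id."""
    by_spot = {row["spot_id"]: row for row in identifications}
    linked = []
    for spot in spots:
        ident = by_spot.get(spot["spot_id"])
        linked.append({
            "spot_id": spot["spot_id"],
            "fold_change": float(spot["fold_change"]),
            "p_value": float(spot["p_value"]),
            # Spots without an identification are kept, marked with None.
            "protein": ident["protein"] if ident else None,
        })
    return linked


def unmatched_identifications(spots, identifications):
    """Return spot_ids that have an identification but no matching gel spot."""
    spot_ids = {row["spot_id"] for row in spots}
    return [row["spot_id"] for row in identifications
            if row["spot_id"] not in spot_ids]


spots = read_rows(SPOTS_CSV)
identifications = read_rows(IDENTIFICATIONS_CSV)
linked = link_identifications(spots, identifications)
orphans = unmatched_identifications(spots, identifications)
```

Reporting unmatched identifiers explicitly, rather than dropping them silently, is the kind of check that is easy to miss when the join is performed by hand.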

To overcome these drawbacks, we have developed S2P (http://www.sing-group.org/s2p/), a free software application that helps researchers carry out these tedious but necessary data processing steps.

S2P was created with a twofold purpose: to improve reproducibility and to save time. Lack of reproducibility is a growing concern in science [6]. S2P aims to improve reproducibility by avoiding the human errors that arise from manual data processing. This issue has been particularly prominent in genomics, where gene name errors have been shown to be widespread in the scientific literature due to the use of Excel [7], [8]. Through its user-friendly GUI, S2P dramatically reduces the time that researchers need to invest in preparing data for analysis. To the best of our knowledge, there is currently no other application offering similar functionalities.

The usefulness of S2P is illustrated by three case studies. The first is an experiment that aims to establish a biomarker-based method for better diagnosis and monitoring of patients with bladder cancer. The second is a longitudinal study that aims to unravel how the proteome of the peritoneal dialysate evolves over time, so that biomarkers and molecular profiles for diagnosis and prognosis can be obtained. Finally, the third case study, which demonstrates how S2P has been extended to support new types of data, shows how it can be used to determine the relative abundance of serum proteins using Mascot emPAI quantification data.
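For context on the third case study, emPAI (exponentially modified Protein Abundance Index) values are commonly converted to relative abundances by dividing each protein's emPAI by the sum over all identified proteins. The Python sketch below shows that calculation under stated assumptions: the function names and sample values are hypothetical, and whether S2P computes relative abundance in exactly this way is not stated here.

```python
def empai(n_observed, n_observable):
    """emPAI = 10 ** PAI - 1, with PAI = observed / observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def relative_abundance(empai_by_protein):
    """Express each protein's emPAI as a percentage of the summed emPAI."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("the sum of emPAI values must be positive")
    return {protein: 100.0 * value / total
            for protein, value in empai_by_protein.items()}


# Hypothetical emPAI values, as they might be exported from a search report.
sample = {"Protein A": 1.0, "Protein B": 3.0}
abundances = relative_abundance(sample)
```

Because the result is normalized within a sample, the percentages describe the composition of that sample only and always add up to 100.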

Case study 1 dataset

The first case study uses a dataset comprising 14 patients and one group of 6 healthy individuals. Plasma samples were collected from 7 anonymous patients diagnosed with bladder cancer, 7 anonymous patients diagnosed with lower urinary tract symptoms (LUTS) and 6 healthy individuals, following standard procedures. All patients and healthy volunteers were informed about the project and their consent was obtained in written form. The local ethics committee approved the study. This experiment was

Results

With the goal of showing the main features of S2P as well as its usefulness to analyze real data, this section shows how it has been used to process and analyze the three case study datasets presented.

Discussion

We provided an overview of the S2P v1.2 software and presented three real-world use cases to illustrate its capabilities. The two main advantages of S2P over alternative approaches to the same data analysis workflow are time saving and reproducibility. To our knowledge, the only alternative approach to such a workflow is for laboratory researchers to do the processing themselves using text editors and spreadsheet software.

Regarding the time saving, it is important to

Conclusions

S2P (http://www.sing-group.org/s2p/) is freely distributed under the GPLv3 license and provides a friendly graphical user interface designed to save researchers time in data processing tasks related to 2D-gel electrophoresis and MALDI mass spectrometry protein identification-based data. The usefulness of S2P has been demonstrated by its application to real experiments, where it notably speeds up data processing and improves experiment reproducibility and reliability. S2P is open to

Mode of availability

The S2P software is licensed under the GNU GPL 3.0 License (http://www.gnu.org/copyleft/gpl.html). The S2P software, along with full documentation and training tutorials, is free and publicly available at http://www.sing-group.org/s2p.

Authors’ contributions

HL-F, JEA, HMS, and JLC conceived, coordinated and designed S2P. HL-F, DG-P, and MR-J developed S2P. HL-F, JEA, SJ, and HMS tested the application. HL-F created the S2P website. JEA, SJ, and HMS performed the laboratory experiments. HL-F, JEA, HMS, and FF-R wrote the manuscript. FF-R, DG-P, MR-J, and JLC drafted the manuscript critically. All authors read and approved the final version of the manuscript.

Acknowledgements

We are thankful to Doctors Pedro M. Baltazar, Luís Campos Pinheiro, and Fernando Calais da Silva from the Urology Service at Centro Hospital de Lisboa Central (Lisboa, Portugal) for providing case study 1 samples. We are also thankful to Doctors Fernando Teixeira e Costa and Aura Ramos from the Nephrology Department at Hospital Garcia de Orta (Almada, Portugal) for providing case study 2 samples. SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from University of

Funding

This work has been partially funded by: (i) the “Platform of integration of intelligent techniques for analysis of biomedical information” project (TIN2013-47153-C3-3-R) from the Spanish Ministry of Economy and Competitiveness; (ii) the “Discovery of biomarkers for bladder carcinoma diagnosis” project from Nova Medical School (Nova Health Project Pilot 01); (iii) the “Biomarkers for rheumatic inflammatory diseases diagnosis” project from Nova Medical School (Nova Health Project Pilot 02); (iv) the

Conflict of interest statement

The authors have no conflicts of interest to declare.

References (28)

  • C.-T. Hsueh et al. Novel biomarkers for diagnosis, prognosis, targeted therapy and clinical trials. Biomark. Res. (2013)

  • M. Baker. Reproducibility: seek out stronger science. Nature (2016)

  • B.R. Zeeberg et al. Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics. BMC Bioinform. (2004)

  • M. Ziemann et al. Gene name errors are widespread in the scientific literature. Genome Biol. (2016)

1 These authors contributed equally to this work.
