Software Tool Article
Revised

AlignmentViewer: Sequence Analysis of Large Protein Families

[version 2; peer review: 2 approved]
PUBLISHED 15 Oct 2020

This article is included in the Bioinformatics gateway.

Abstract

AlignmentViewer is a web-based tool to view and analyze multiple sequence alignments of protein families. The particular strengths of AlignmentViewer include flexible visualization at different scales as well as analysis of conservation patterns and of the distribution of proteins in sequence space. The tool is directly accessible in web browsers without the need for software installation. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, e.g. via EVcouplings.org.

Keywords

alignment viewer, MSA, JavaScript, protein alignments, web-based, tool

Revised Amendments from Version 1

We have updated the manuscript and the software according to reviewer comments. We have added a new feature to select the coloring scheme of the alignment, addressed cross-browser compatibility issues, and improved the usability. We have also addressed comments from the reviewers meant to improve the clarity of the manuscript.

See the authors' detailed response to the review by Alex Bateman
See the authors' detailed response to the review by Erik Larsson Lekholm

Introduction

Multiple sequence alignment (MSA) analysis (e.g., analysis of sequence patterns, subfamilies, specificity residues and evolutionary couplings) and visualization allow researchers to extract information from protein families and gain a better understanding of them. MSA is a basic step in many protein analysis workflows, including 3D structure prediction (Marks et al., 2011), structure detection in flexible (‘disordered’) domains (Toth-Petroczy et al., 2016), function prediction (Tamames et al., 1998) and intracellular localization (Goldberg et al., 2014).

A number of useful tools exist for the visualization of protein MSAs, such as MView, Wasabi, AliView, MSAViewer and Jalview (Brown et al., 1998; Larsson, 2014; Veidenberg et al., 2016; Waterhouse et al., 2009; Yachdav et al., 2016). MView was one of the first online browser-based MSA viewers, formatting alignments as HTML documents. Wasabi is a web-based tool that is particularly useful for phylogenetic analysis and incorporates phylogeny-aware alignment methods. AliView, a desktop application, supports sorting, viewing, removing, editing and merging sequences in large nucleotide sequence datasets. MSAViewer is an interactive JavaScript MSA visualizer that implements the basic features of viewing, scrolling and motif selection. Jalview is a Java-based desktop tool that could also be embedded in websites as an applet, but the applet technology it relies on is no longer supported by most browsers.

AlignmentViewer complements these MSA tools and provides the following features: (i) in-browser and serverless execution, (ii) visualization of very large MSAs, (iii) visualization of conservation patterns, (iv) sequence filtering, (v) logo display, (vi) pairwise sequence identity map, (vii) sequence space exploration by UMAP dimensionality reduction, and (viii) display of top-ranked evolutionary couplings (Hopf et al., 2019).

An earlier version of this article can be found on bioRxiv (DOI: https://doi.org/10.1101/269720); additional features have been implemented since the earlier version.

Methods

Operation

AlignmentViewer is a web-based tool written in JavaScript with minimal system requirements; it works best in Chrome, regardless of operating system. It is built with the D3 library (d3js.org) to produce dynamic and interactive data visualizations, and speed on large alignments was a major design consideration. The tool is entirely client-based: it runs inside the web browser without any server-side computation.

Implementation

Users can access AlignmentViewer and all its features directly at alignmentviewer.org; because execution is serverless, anyone can also quickly start a local copy for online or offline use. Hyperlinks for lookups in background databases, such as UniProt or Pfam, are made directly from the client. Alignments can also be passed to AlignmentViewer through the url query parameter, which must point to an alignment file served over HTTPS and be properly URL-encoded (e.g., https://alignmentviewer.org/?url=https://alignmentviewer.org/example/1bkr_A.1-108.msa.txt). This enables seamless integration with external web services through a simple link; for example, the EVcouplings web server (evcouplings.org; Hopf et al., 2019) offers visualization of its computed alignments via a link to AlignmentViewer. The tool has been thoroughly tested with many large alignments: an alignment of 50,000 sequences (about 13 MB of memory), for example, loads in the Safari browser within one minute, and further speedups are planned.
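
A link of the form shown above can also be generated programmatically by a linking service. The Python sketch below is illustrative only: the helper name viewer_link is hypothetical, and it assumes that ':' and '/' may stay unescaped (as in the example link) while characters such as spaces, '?', '&', '=' and '#' are percent-encoded, so that they are not mistaken for parts of the AlignmentViewer URL itself.

```python
from urllib.parse import quote

VIEWER = "https://alignmentviewer.org/"


def viewer_link(alignment_url: str) -> str:
    """Build a link that opens the alignment at alignment_url in AlignmentViewer.

    Hypothetical helper: keeps ':' and '/' readable, as in the example link,
    and percent-encodes every other character that is reserved in a query.
    """
    if not alignment_url.startswith("https://"):
        raise ValueError("the alignment file must be served over https")
    return VIEWER + "?url=" + quote(alignment_url, safe=":/")


if __name__ == "__main__":
    print(viewer_link("https://alignmentviewer.org/example/1bkr_A.1-108.msa.txt"))
```

Encoding the nested URL is what lets an alignment address that carries its own query string (for example a download endpoint with parameters) survive intact as the value of a single url parameter.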

Use case

Figure 1 shows the main functionalities of AlignmentViewer, which are explained in more detail in the following subsections. The top panel shows the MSA view, dominated by the sequence logo and the alignment itself; this view lets the user examine the alignment in depth. In the sequence logo, each alignment position is represented by a stack whose height shows the information content of that position, in bits. Below, from left to right, are the pixel view, part of the stats view, and the sequence space (with annotations). The pixel view gives a coarse overview of the entire alignment that aids visualization and pattern identification. The all-versus-all sequence identity map (part of the stats view) allows users to identify possible clusters in the alignment based on sequence identity. The bottom-right panel of Figure 1 displays the sequences clustered by similarity (see section Sequence space), highlighted by user-provided annotations to aid interpretation of the clusters.
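
The stack height in a sequence logo is commonly computed as the information content of a column: log2(20) bits minus the Shannon entropy of the observed amino acid frequencies. The Python sketch below assumes that gaps and non-standard letters are ignored and that no small-sample correction is applied; AlignmentViewer's own computation may differ in these details, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def column_information(column: str) -> float:
    """Information content (bits) of one alignment column.

    Gaps and non-standard letters are skipped; an all-gap column scores 0.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(20) - entropy


def logo_heights(seqs: list[str]) -> list[float]:
    """Stack height for every column of equal-length aligned sequences."""
    return [column_information("".join(col)) for col in zip(*seqs)]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly one bit.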


Figure 1. AlignmentViewer visualization of the beta-lactamase protein domain family.

Bars above the sequence alignment quantify residue conservation. The alignment consensus logo (just below the bar chart) is based on the amino acid frequencies. Lower left: pixel view of the alignment especially useful for large families; lower middle: protein-protein sequence similarity matrix graded by percentage identity; lower right: distribution of sequences in sequence space (UMAP projection), colored by species groups.

MSA view

Alignment details. The MSA view page shows summary information: the number of sequences, conservation and gap counts for each position, a sequence logo, and the residues in one-letter code. By default, columns in which the reference sequence (first row) has a gap are omitted, to keep the visual focus on sequence patterns relative to a protein of interest and to avoid the extremely gapped alignment views typical of many MSA presentations. The amino acids are colored using a conventional scheme based on amino acid properties, adopted from MView.
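
Hiding reference-gap columns amounts to a simple column filter. In this Python sketch the function name and the choice of '-' and '.' as gap characters are assumptions made for illustration.

```python
GAP_CHARS = frozenset("-.")


def focus_on_reference(seqs: list[str]) -> list[str]:
    """Keep only the columns in which the reference (first) sequence has a residue."""
    if not seqs:
        return []
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i, c in enumerate(seqs[0]) if c not in GAP_CHARS]
    return ["".join(s[i] for i in keep) for s in seqs]
```

After filtering, the reference row contains no gaps, so every remaining column corresponds to exactly one residue of the protein of interest.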

Sequence attributes and sorting. Sequences in the alignment can be sorted by one of four methods: (i) the original order provided by the user; (ii–iii) percent sequence identity between each sequence and the reference (top) sequence, normalized either by the length of the reference sequence or by that of the other sequence (gaps not counted); and (iv) user-provided sequence weights or other attributes (upload annotations tab), such as alignment profile scores (e.g., HMM bit scores). Sequences can be filtered by sequence identity relative to the reference sequence or by percentage of gaps.
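
The identity-based sorting and filtering options can be sketched as follows. This Python example is illustrative: the function names and the normalize_by parameter are hypothetical, and it assumes that identities are counted only where the reference has a residue, with the percentage normalized by the ungapped length of either the reference or the other sequence.

```python
GAPS = frozenset("-.")


def percent_identity(ref: str, seq: str, normalize_by: str = "reference") -> float:
    """Percent identical residues, normalized by the ungapped length of ref or seq."""
    if normalize_by not in ("reference", "sequence"):
        raise ValueError("normalize_by must be 'reference' or 'sequence'")
    matches = sum(1 for a, b in zip(ref, seq) if a not in GAPS and a.upper() == b.upper())
    denominator = ref if normalize_by == "reference" else seq
    length = sum(1 for c in denominator if c not in GAPS)
    return 100.0 * matches / length if length else 0.0


def gap_fraction(seq: str) -> float:
    """Fraction of alignment columns in which seq has a gap."""
    return sum(1 for c in seq if c in GAPS) / len(seq) if seq else 0.0


def sort_and_filter(seqs, min_identity=0.0, max_gap_fraction=1.0, normalize_by="reference"):
    """Keep the reference first; filter the rest, then sort by decreasing identity."""
    ref = seqs[0]
    kept = [
        s for s in seqs[1:]
        if percent_identity(ref, s, normalize_by) >= min_identity
        and gap_fraction(s) <= max_gap_fraction
    ]
    kept.sort(key=lambda s: percent_identity(ref, s, normalize_by), reverse=True)
    return [ref] + kept
```

The two normalizations diverge for gapped sequences: a fragment that matches the reference perfectly over its own length scores 100% when normalized by itself, but less when normalized by the full reference length.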

Pixel view (suitable for large families)

The pixel view ("image view" tab on the website) provides an overview of the entire depth and breadth of an MSA. Each residue is drawn as a small rectangle of pixels that retains its residue coloring. The resulting striking visual impression can reveal patterns of conservation and variation, especially in large alignments, and is very useful for gaining an intuitive view of sequence properties, of noise at the uncertain edges of a protein family, and of subfamily distributions. Residues can be colored by (1) amino acid properties, (2) hydrophobicity (red to blue) or (3) mutational difference relative to the reference (first row) sequence, with differing residues shown in stronger color.
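
The mutational-difference coloring can be thought of as classifying every cell of the alignment against the reference row. The Python sketch below is a hypothetical illustration, not the tool's rendering code: it encodes each cell as 0 (same as reference), 1 (different) or -1 (gap), and prints the grid as text characters in place of colored pixels.

```python
GAPS = frozenset("-.")
SYMBOLS = {0: ".", 1: "#", -1: " "}  # text stand-ins for pixel colors


def difference_grid(seqs: list[str]) -> list[list[int]]:
    """Classify each cell: 0 same as reference, 1 different, -1 gap."""
    ref = seqs[0]
    grid = []
    for seq in seqs:
        row = []
        for r, c in zip(ref, seq):
            if c in GAPS:
                row.append(-1)
            elif c.upper() == r.upper():
                row.append(0)
            else:
                row.append(1)  # also covers residues opposite a reference gap
        grid.append(row)
    return grid


def render(grid: list[list[int]]) -> list[str]:
    """One text line per sequence, one character per alignment column."""
    return ["".join(SYMBOLS[v] for v in row) for row in grid]
```

Drawing one small cell per residue, rather than one letter, is what keeps such an overview legible when an alignment has thousands of rows.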

Stats view

The stats view tab shows plots of statistical properties of the sequences in the alignment, including (i) the sequence identity of each sequence relative to the reference sequence; (ii) the minimum, maximum and average of these identities; and (iii) a pairwise sequence identity matrix in which each pixel represents the degree of similarity between two sequences. If the sequences are supplied in a meaningful order, e.g., one derived from a phylogenetic tree, a block-diagonal structure in this matrix indicates distinct subfamilies. Ordering sequences by phylogeny is currently not part of the tool and can be performed with external tools, e.g., Wasabi (Veidenberg et al., 2016).
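
The identity matrix and the summary statistics relative to the reference can be sketched in Python as below. As an assumption for this illustration, identity is computed over columns where both sequences have a residue; other normalizations are possible, and the function names are hypothetical. Note that the matrix grows quadratically with the number of sequences.

```python
from statistics import mean

GAPS = frozenset("-.")


def pair_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x not in GAPS and y not in GAPS]
    if not pairs:
        return 0.0
    return sum(x.upper() == y.upper() for x, y in pairs) / len(pairs)


def identity_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric all-versus-all identity matrix (n x n)."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = pair_identity(seqs[i], seqs[j])
    return matrix


def reference_summary(seqs: list[str]) -> tuple[float, float, float]:
    """Minimum, maximum and mean identity of all other sequences to the reference."""
    if len(seqs) < 2:
        raise ValueError("need a reference and at least one other sequence")
    ids = [pair_identity(seqs[0], s) for s in seqs[1:]]
    return min(ids), max(ids), mean(ids)
```

If the rows are ordered so that related sequences are adjacent, high values cluster into square blocks along the diagonal, which is the block-diagonal pattern described above.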

Annotations and evolutionary couplings

Users can upload custom numerical attributes or labels for the sequences in the MSA (upload annotations) or evolutionary couplings between residue positions (load couplings). These attributes allow users to apply sequence weights, compare different measures of sequence fitness (e.g., bit score, sequence identity, statistical energy) or visualize evolutionary coupling constraints for pairs of positions.

Sequence space

Users can view representations of the MSA sequences in two- or three-dimensional space under the “sequence space” tab. These representations are generated with the Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction algorithm (McInnes et al., 2018), as implemented in JavaScript by the umap-js library (https://github.com/PAIR-code/umap-js). The alignmentviewer.org implementation uses the number of amino acid differences between pairs of sequences (the Hamming distance) as the distance metric. The algorithm then iteratively calculates an embedding in two- or three-dimensional space, which is displayed to the user in real time. UMAP hyperparameters are set to reasonable defaults but can be configured in the settings panel. Sequences can be colored by user-provided annotations ("upload annotations" tab).
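
UMAP starts by finding each point's nearest neighbors under the chosen metric and then optimizes the low-dimensional embedding from that neighbor graph. The Python sketch below shows only this first stage with the Hamming distance. It is an illustration under stated assumptions: the function names are hypothetical, gaps are compared as ordinary characters (so a gap opposite a residue counts as a difference), and the search is brute force and quadratic, whereas UMAP implementations typically use approximate nearest-neighbor search.

```python
import heapq


def hamming(a: str, b: str) -> int:
    """Number of alignment columns at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_graph(seqs: list[str], k: int) -> list[list[int]]:
    """Indices of the k nearest neighbors of each sequence (ties broken by index)."""
    graph = []
    for i, s in enumerate(seqs):
        candidates = ((hamming(s, t), j) for j, t in enumerate(seqs) if j != i)
        graph.append([j for _, j in heapq.nsmallest(k, candidates)])
    return graph
```

The embedding step that follows (not shown) places sequences that are neighbors in this graph close together, which is why subfamilies tend to appear as separate clusters in the sequence space view.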

Conclusion

AlignmentViewer is a lightweight, browser-based viewer for biological multiple sequence alignments that focuses on usability and performance. Written in JavaScript, it can be used in many browsers. Its architecture allows use without software installation and, with a local copy, without an internet connection. The visualization capabilities, analysis features and metrics in AlignmentViewer are useful in many areas of biology, especially evolutionary, structural, synthetic and chemical biology. In the future we plan to add visualization of species diversity, predicted contact maps, and organization by sequence subfamilies with specificity residues. A standalone version of AlignmentViewer is available at alignmentviewer.org and is used by external services including EVcouplings.org. AlignmentViewer is an open-source project hosted on GitHub that welcomes engagement from interested members of the community.

Data availability

All data underlying the results are available as part of the article and no additional source data are required.

Software availability

AlignmentViewer website and demo can be found at: https://alignmentviewer.org/.

Source code available at: https://github.com/sanderlab/alignmentviewer.

Archived source code at time of publication: https://doi.org/10.5281/zenodo.4063551 (Reguant, 2020).

License: MIT license.

How to cite this article
Reguant R, Antipin Y, Sheridan R et al. AlignmentViewer: Sequence Analysis of Large Protein Families [version 2; peer review: 2 approved] F1000Research 2020, 9:213 (https://doi.org/10.12688/f1000research.22242.2)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: fundamental flaws in the paper seriously undermine the findings and conclusions.
Version 2 (revised), published 15 Oct 2020
Reviewer Report 16 Oct 2020
Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK 
Approved
The authors have addressed my concerns fully.

The authors may wish to double check the alignment of the example 1KR:A on the website. It does not look well aligned to my eyes. This suggestion is not required ...
HOW TO CITE THIS REPORT
Bateman A. Reviewer Report For: AlignmentViewer: Sequence Analysis of Large Protein Families [version 2; peer review: 2 approved]. F1000Research 2020, 9:213 (https://doi.org/10.5256/f1000research.30135.r73164)
Version 1, published 27 Mar 2020
Reviewer Report 16 Apr 2020
Alex Bateman, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK 
Approved with Reservations
Overall there is a need for Javascript based alignment viewers which can handle large numbers of sequences. So this software is rather timely. However, I think the current implementation seems to still contain significant bugs and the web page requires ...
HOW TO CITE THIS REPORT
Bateman A. Reviewer Report For: AlignmentViewer: Sequence Analysis of Large Protein Families [version 2; peer review: 2 approved]. F1000Research 2020, 9:213 (https://doi.org/10.5256/f1000research.24533.r61763)
  • Author Response 15 Oct 2020
    Roc Reguant, Department of Cell Biology, Harvard Medical School, Boston, USA
    15 Oct 2020
    Author Response
    Manuscript changes:
    Please fix capitalisation of PFAM to Pfam

    Response: The manuscript has been updated to Pfam.

    Software/Website changes:
    The computing conservation box is annoyingly placed. It does not go away and covers the
    ...
Reviewer Report 09 Apr 2020
Erik Larsson Lekholm, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden 
Approved
This Software Tool article, describing a web-based sequence alignment viewer (AlignmentViewer), is well-written and very clearly presented. The introduction nicely motivates the need for this tool, given the many other available alternatives. The main features of the software are clearly ...
HOW TO CITE THIS REPORT
Lekholm EL. Reviewer Report For: AlignmentViewer: Sequence Analysis of Large Protein Families [version 2; peer review: 2 approved]. F1000Research 2020, 9:213 (https://doi.org/10.5256/f1000research.24533.r61767)
  • Author Response 30 Apr 2020
    Chris Sander, Department of Cell Biology, Harvard Medical School, Boston, USA
    30 Apr 2020
    Author Response
    Thank you for your comments and suggestions - they are spot on.
    We will work on taking these into account on the web site and in the next version of the ...
  • Author Response 15 Oct 2020
    Roc Reguant, Department of Cell Biology, Harvard Medical School, Boston, USA
    15 Oct 2020
    Author Response
    A little bit of detail may be added to the “Sequence space” section. “Two- or three-dimensional sequence space” is at first a bit confusing, as no information has been given ...