ULTRA: A Model Based Tool to Detect Tandem Repeats

Authors:
Daniel Olson, University of Montana, Missoula, MT, USA
Travis Wheeler, University of Montana, Missoula, MT, USA
BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, August 2018, Pages 37–46
https://doi.org/10.1145/3233547.3233604

Published: 15 August 2018

ABSTRACT

In biological sequences, tandem repeats consist of tens to hundreds of residues of a repeated pattern, such as atgatgatgatgatg ('atg' repeated), often the result of replication slippage. Over time, these repeats decay so that the original sharp pattern of repetition is somewhat obscured, but even degenerate repeats pose a problem for sequence annotation: when two sequences both contain shared patterns of similar repetition, the result can be a false signal of sequence homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that shows substantially improved sensitivity to labeling decayed repetitive regions, presents low and reliable false annotation rates across a wide range of sequence composition, and produces scores that follow a stable distribution. On typical genomic sequence, the time and memory requirements of the resulting tool (ULTRA) are competitive with the most heavily used tool for repeat masking (TRF). ULTRA is released under an open source license and lays the groundwork for inclusion of the model in sequence alignment tools and annotation pipelines.
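
To make the notion of a tandem repeat concrete, the following is a minimal illustrative sketch in Python. It is not ULTRA's hidden Markov model: it is a naive exact-match scan, and the function name `find_exact_tandem_repeats` and its parameters `max_period` and `min_copies` are hypothetical names chosen for this example. It reports runs in which every residue equals the residue one period earlier, and it shows why decayed repeats need a more tolerant model: a single substitution is enough to break an exact run.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, end, unit) for exact tandem repeats in seq.

    Naive illustration only (not the ULTRA model): a run is a maximal
    stretch where seq[i] == seq[i - period]. `end` is exclusive.
    A repeat may also be reported at multiples of its true period
    if it is long enough to satisfy min_copies at that period.
    """
    seq = seq.lower()
    hits = []
    for period in range(1, max_period + 1):
        i = period
        while i < len(seq):
            if seq[i] != seq[i - period]:
                i += 1
                continue
            # A run begins one period before the first matching residue.
            start = i - period
            while i < len(seq) and seq[i] == seq[i - period]:
                i += 1
            end = i
            # Keep the run only if it spans enough copies of the unit.
            if end - start >= period * min_copies:
                hits.append((start, end, seq[start:start + period]))
    return hits


if __name__ == "__main__":
    # The abstract's example repeat, embedded in flanking sequence.
    clean = "ccgt" + "atgatgatgatgatg" + "tcag"
    print(find_exact_tandem_repeats(clean))

    # The same repeat with one substitution (t -> a) in the third copy.
    decayed = "atgatgaagatgatg"
    print(find_exact_tandem_repeats(decayed))
```

With these defaults the clean sequence yields a single period-3 hit covering the 'atg' run, while the decayed sequence yields no hit at all, because the substitution splits the run into pieces that are each shorter than three copies. Tolerating that kind of decay is what the abstract's probabilistic model is meant to provide.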


Index Terms

ULTRA: A Model Based Tool to Detect Tandem Repeats
1. Applied computing
  1. Life and medical sciences
    1. Computational biology
      1. Molecular sequence analysis

Published in
BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
August 2018
727 pages
ISBN: 9781450357944
DOI: 10.1145/3233547
General Chairs: Amarda Shehu (George Mason University, USA), Cathy Wu (University of Delaware, USA)
Program Chairs: Christina Boucher (University of Florida, USA), Jing Li (Case Western Reserve University, USA), Hongfang Liu (Mayo Clinic, USA), Mihai Pop (University of Maryland, USA)
Copyright © 2018 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
annotation error
sequence alignment
tandem repeats
Qualifiers
- research-article
Acceptance Rates
BCB '18 Paper Acceptance Rate: 46 of 148 submissions, 31%
Overall Acceptance Rate: 254 of 885 submissions, 29%