DOI: 10.1145/3233547.3233604
research-article
Open Access

ULTRA: A Model Based Tool to Detect Tandem Repeats

Published: 15 August 2018

ABSTRACT

In biological sequences, tandem repeats are stretches of tens to hundreds of residues made up of a repeated pattern, such as atgatgatgatgatg ('atg' repeated); they often arise from replication slippage. Over time, these repeats decay, so that the original sharp pattern of repetition becomes partly obscured. Even degenerate repeats pose a problem for sequence annotation: when two sequences contain similar patterns of repetition, the result can be a false signal of sequence homology. We describe an implementation of a new hidden Markov model for detecting tandem repeats that is substantially more sensitive in labeling decayed repetitive regions, maintains low and reliable false annotation rates across a wide range of sequence compositions, and produces scores that follow a stable distribution. On typical genomic sequence, the time and memory requirements of the resulting tool (ULTRA) are competitive with those of the most widely used tool for repeat masking, Tandem Repeats Finder (TRF). ULTRA is released under an open source license and lays the groundwork for including the model in sequence alignment tools and annotation pipelines.
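The abstract does not spell out ULTRA's model, so what follows is only a minimal sketch of the general technique, not ULTRA's implementation: a hypothetical toy HMM with one background state (label 0) and one repeat state per period p = 1..max_period, in which a repeat state favors emitting the same residue seen p positions earlier. The function name `label_repeats` and all parameter values (`match_prob`, `p_enter`, `p_stay`) are assumptions chosen for illustration, and the sketch omits insertions, deletions and the score calibration the abstract mentions. Viterbi decoding in log space assigns each position its most probable label.

```python
import math

NEG_INF = float("-inf")


def label_repeats(seq, max_period=4, match_prob=0.9, p_enter=0.01, p_stay=0.95):
    """Viterbi-label each position: 0 = background, p = repeat of period p.

    Hypothetical toy model (not ULTRA's): a repeat state of period p prefers
    to emit the residue seen p positions earlier. No indels, no calibration.
    """
    seq = seq.lower()
    n = len(seq)
    if n == 0:
        return []
    states = range(max_period + 1)  # state 0 is background

    # Transition log-probabilities (each row sums to 1 in probability space).
    bg_stay = math.log(1 - max_period * p_enter)
    bg_to_rep = math.log(p_enter)
    rep_stay = math.log(p_stay)
    rep_to_bg = math.log(1 - p_stay)

    def trans(a, b):
        if a == 0:
            return bg_stay if b == 0 else bg_to_rep
        if a == b:
            return rep_stay
        return rep_to_bg if b == 0 else NEG_INF  # no repeat-to-repeat jumps

    def emit(s, i):
        # Background, or too early to look p residues back: uniform over ACGT.
        if s == 0 or i < s:
            return math.log(0.25)
        if seq[i] == seq[i - s]:
            return math.log(match_prob)
        return math.log((1 - match_prob) / 3)

    # score[i][s]: best log-probability of any path ending in state s at i.
    score = [[NEG_INF] * len(states) for _ in range(n)]
    back = [[0] * len(states) for _ in range(n)]
    for s in states:
        score[0][s] = trans(0, s) + emit(s, 0)  # treat the start as background
    for i in range(1, n):
        for s in states:
            prev = max(states, key=lambda r: score[i - 1][r] + trans(r, s))
            score[i][s] = score[i - 1][prev] + trans(prev, s) + emit(s, i)
            back[i][s] = prev

    # Trace back from the best final state.
    s = max(states, key=lambda r: score[n - 1][r])
    path = [s]
    for i in range(n - 1, 0, -1):
        s = back[i][s]
        path.append(s)
    return path[::-1]


if __name__ == "__main__":
    demo = "gcattcga" + "atg" * 8 + "ctagcaat"
    print("".join(str(p) for p in label_repeats(demo)))
```

Run on the abstract's `atg` example placed between two short non-repetitive flanks, the sketch should label the interior of the run with period 3 and the flanks with 0. Because a repeat state only compares each residue with the one p positions back, a copy with a few substitutions can still outscore the background when most positions match, which illustrates in miniature why a model of this kind can tolerate decayed repeats.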


    • Published in

      ACM Conferences
      BCB '18: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
      August 2018
      727 pages
      ISBN:9781450357944
      DOI:10.1145/3233547

      Copyright © 2018 Owner/Author

      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      BCB '18 Paper Acceptance Rate: 46 of 148 submissions, 31%. Overall Acceptance Rate: 254 of 885 submissions, 29%.
