Exact tandem repeats analyzer (E-TRA): A new program for DNA sequence mining

  • Research Article
Journal of Genetics

Abstract

Exact Tandem Repeats Analyzer 1.0 (E-TRA) combines sequence motif searches with keywords such as ‘organs’, ‘tissues’, ‘cell lines’ and ‘development stages’ to find simple exact tandem repeats as well as non-simple repeats. Compared with other repeat finder programs, E-TRA offers several advanced search parameters and options: it accepts GenBank, FASTA and expressed sequence tag (EST) sequence files, and it can analyse multiple files containing multiple sequences. E-TRA finds tandem repeat motifs from one to one thousand nucleotides in length. User-defined parameters let researchers apply different minimum repeat-number criteria to different motif lengths simultaneously. One of the most interesting features of genomes is the presence of relatively short tandem repeats (TRs). These repeated DNA sequences are found in both prokaryotes and eukaryotes, distributed almost at random throughout the genome. Some tandem repeats play important roles in the regulation of gene expression, whereas others have no known biological function as yet. Nevertheless, they have proven very useful in DNA profiling and genetic linkage analysis. To demonstrate the use of E-TRA, we used 5,465,605 human EST sequences derived from 18,814,550 GenBank EST sequences. Our results indicated that 12.44% (679,800) of the human EST sequences contained simple and non-simple repeat string patterns varying from one to 126 nucleotides in length. The results also revealed that human organs, tissues, cell lines and developmental stages differed in the number of repeats as well as in repeat composition, indicating that the distribution of expressed tandem repeats among tissues or organs is not random, and thus differs from that of the un-transcribed repeats found in genomes.
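The idea of applying a different minimum repeat number to each motif length can be illustrated with a short Python sketch. This is not E-TRA's code, and the abstract does not describe its algorithm. The function name `find_exact_tandem_repeats`, the `Repeat` record and the `min_copies` mapping are all hypothetical names chosen for illustration. The sketch covers only simple exact repeats, and it assumes that a motif made of a shorter repeated unit (for example `AA` as a dinucleotide) should be reported under the shorter unit only.

```python
from typing import Dict, List, NamedTuple


class Repeat(NamedTuple):
    start: int   # 0-based position of the first motif copy
    motif: str   # repeated unit, e.g. "CAG"
    copies: int  # number of consecutive exact copies


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    # A string occurs inside its own doubling (ends trimmed) only if it is periodic.
    return motif not in (motif + motif)[1:-1]


def find_exact_tandem_repeats(seq: str, min_copies: Dict[int, int]) -> List[Repeat]:
    """Scan seq for exact tandem repeats.

    min_copies maps a motif length to the minimum number of consecutive
    copies required for that length (hypothetical parameter layout).
    """
    seq = seq.upper()
    hits: List[Repeat] = []
    for motif_len in sorted(min_copies):
        needed = min_copies[motif_len]
        i = 0
        while i + motif_len <= len(seq):
            motif = seq[i:i + motif_len]
            if not is_primitive(motif):
                i += 1
                continue
            # Count how many exact copies of the motif follow back to back.
            copies = 1
            j = i + motif_len
            while seq[j:j + motif_len] == motif:
                copies += 1
                j += motif_len
            if copies >= needed:
                hits.append(Repeat(i, motif, copies))
                i = j  # resume after the repeat tract
            else:
                i += 1
    return hits


if __name__ == "__main__":
    # Toy sequence: an (AC)6 tract and a (CAG)5 tract with flanking bases.
    toy = "TTG" + "AC" * 6 + "GGT" + "CAG" * 5 + "TT"
    # Assumed thresholds: 10 copies for mono-, 5 for di-, 4 for trinucleotide motifs.
    for hit in find_exact_tandem_repeats(toy, {1: 10, 2: 5, 3: 4}):
        print(hit)
```

Keeping the thresholds in a mapping keyed by motif length means a single pass over the input can apply a strict cutoff to short motifs and a looser one to long motifs, which is the behaviour the abstract attributes to the user-defined options.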



Author information


Corresponding author

Correspondence to Mehmet Karaca.


About this article

Cite this article

Karaca, M., Bilgen, M., Onus, A.N. et al. Exact tandem repeats analyzer (E-TRA): A new program for DNA sequence mining. J Genet 84, 49–54 (2005). https://doi.org/10.1007/BF02715889


  • DOI: https://doi.org/10.1007/BF02715889
