DOI: 10.1145/332306.332553

Extracting structured motifs using a suffix tree—algorithms and application to promoter consensus identification

Published: 8 April 2000

ABSTRACT

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. Structured motifs are composed of p ≥ 2 parts separated by constrained spacers. The algorithms use a suffix tree to perform this task. They are efficient enough to extract site consensus, such as promoter sequences, from a whole collection of noncoding sequences extracted from a genome. In particular, their time complexity scales linearly with N²n, where n is the average length of the sequences and N is their number. An application to the identification of promoter consensus sequences in bacterial genomes is shown, with interesting results.
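
To make the notion of a structured motif concrete, the following is a minimal brute-force sketch of the case p = 2: two boxes separated by a spacer whose length must fall in a given range. It is not the suffix-tree method of the paper and has none of its efficiency; it only checks whether one candidate motif occurs in each sequence and counts how many sequences contain it. The function names, the mismatch threshold `e`, the use of Hamming distance, and the example boxes and spacer range are all assumptions chosen for illustration.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq: str, box1: str, box2: str, dmin: int, dmax: int, e: int) -> bool:
    """True if seq contains box1, then a spacer of dmin..dmax bases, then box2,
    with at most e mismatches allowed in each box (hypothetical criterion)."""
    k1, k2 = len(box1), len(box2)
    for i in range(len(seq) - k1 + 1):
        if hamming(seq[i:i + k1], box1) > e:
            continue
        for d in range(dmin, dmax + 1):
            j = i + k1 + d  # start of the second box for this spacer length
            if j + k2 > len(seq):
                break  # longer spacers would run past the end as well
            if hamming(seq[j:j + k2], box2) <= e:
                return True
    return False


def support(seqs: List[str], box1: str, box2: str,
            dmin: int, dmax: int, e: int) -> int:
    """Number of sequences in which the two-part motif occurs at least once."""
    return sum(occurs(s, box1, box2, dmin, dmax, e) for s in seqs)


# Illustrative input: made-up sequences, not data from the paper.
seqs = [
    "CC" + "TTGACA" + "A" * 17 + "TATAAT" + "CC",  # exact boxes, spacer 17
    "TTGACA" + "A" * 10 + "TATAAT",                # spacer too short
    "TTGTCA" + "A" * 16 + "TATACT",                # one mismatch per box
]
exact = support(seqs, "TTGACA", "TATAAT", 16, 18, 0)
relaxed = support(seqs, "TTGACA", "TATAAT", 16, 18, 1)
```

A motif would typically be kept only if its support reaches some chosen quorum of the N input sequences. The sketch tests a single candidate; enumerating all candidates this way would be far more expensive than the approach the abstract describes.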


    • Published in

      RECOMB '00: Proceedings of the fourth annual international conference on Computational molecular biology
      April 2000
      329 pages
      ISBN:1581131860
      DOI:10.1145/332306

      Copyright © 2000 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      Overall Acceptance Rate: 148 of 538 submissions, 28%
