Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2012)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Abstract

Although de novo motifs can be discovered by mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. One way to improve accuracy is to consider additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This paper presents a de novo motif discovery algorithm called SEME, which uses a pure probabilistic mixture model to model the motif’s binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif and its position and sequence rank preferences without asking the user for any prior knowledge. SEME is both efficient and accurate thanks to two important techniques: variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets and 164 ChIP-Seq libraries, we demonstrate the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem: finding the co-regulated TF (co-TF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct co-TF motifs and, at the same time, predicted co-TF motifs that better match the known motifs. Finally, we show that the learned position and sequence rank preferences of each co-TF reveal potential interaction mechanisms between the primary TF and the co-TF within these sites. Some of these findings were further validated by ChIP-Seq experiments on the co-TFs.
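
The abstract does not give SEME's equations, so the following is only a minimal sketch of the general idea of learning a motif and its position preference together with EM. It is not SEME itself. It assumes exactly one site per sequence, equal-length sequences, a fixed uniform background, and a starting seed k-mer, and it omits sequence rank preference, variable motif length extension and importance sampling. The names `em_motif`, `consensus`, `SEQS` and all parameter values are hypothetical.

```python
ALPHABET = "ACGT"

# Toy data (made up for illustration): "ACGTAC" planted near the
# middle of six 20-bp sequences.
SEQS = [
    "GGTGAAGACGTACTGGATAG",
    "CTTGGCAACGTACCATGTGG",
    "ATGCCTGACGTACAATGGCT",
    "TGGCAGACGTACTTGATGGA",
    "GATGGTCAACGTACCTGGAT",
    "CCATGGAACGTACATTGCAG",
]


def em_motif(seqs, seed, n_iter=20, pseudo=0.1):
    """EM for a one-site-per-sequence motif model with a learned
    prior over the site's start position (assumed model, not SEME)."""
    w = len(seed)
    n_pos = len(seqs[0]) - w + 1  # assumes all sequences share one length
    bg = 0.25                     # assumed fixed uniform background
    # Start from a soft version of the seed k-mer.
    pwm = [{c: (0.7 if c == s else 0.1) for c in ALPHABET} for s in seed]
    pos_prior = [1.0 / n_pos] * n_pos

    for _ in range(n_iter):
        # E-step: posterior over the site start position in each sequence,
        # proportional to prior times the motif/background likelihood ratio.
        resp = []
        for seq in seqs:
            scores = []
            for j in range(n_pos):
                score = pos_prior[j]
                for k in range(w):
                    score *= pwm[k][seq[j + k]] / bg
                scores.append(score)
            total = sum(scores)
            resp.append([s / total for s in scores])

        # M-step: re-estimate the motif columns and the position prior
        # from expected counts, smoothed with a small pseudocount.
        counts = [{c: pseudo for c in ALPHABET} for _ in range(w)]
        pos_counts = [pseudo] * n_pos
        for seq, r in zip(seqs, resp):
            for j, r_ij in enumerate(r):
                pos_counts[j] += r_ij
                for k in range(w):
                    counts[k][seq[j + k]] += r_ij
        pwm = [{c: col[c] / sum(col.values()) for c in ALPHABET}
               for col in counts]
        z = sum(pos_counts)
        pos_prior = [p / z for p in pos_counts]

    return pwm, pos_prior


def consensus(pwm):
    """Most probable base in each motif column."""
    return "".join(max(col, key=col.get) for col in pwm)


if __name__ == "__main__":
    # The seed deliberately differs from the planted motif at one position.
    pwm, pos_prior = em_motif(SEQS, seed="ACGTTC")
    best = max(range(len(pos_prior)), key=pos_prior.__getitem__)
    print(consensus(pwm), best)
```

In this sketch the position prior is a free categorical distribution over start offsets; how SEME actually parameterizes position and rank preferences is described in the full paper rather than in the abstract.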



Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Z., Chang, C.W., Hugo, W., Cheung, E., Sung, W.K. (2012). Simultaneously Learning DNA Motif along with Its Position and Sequence Rank Preferences through EM Algorithm. In: Chor, B. (eds) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_37

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer Science, Computer Science (R0)
