A unified model for yeast transcript definition

  Timothy R. Hughes (1, 2, 4)
  1. Department of Molecular Genetics,
  2. Banting and Best Department of Medical Research and Donnelly Centre for Cellular and Biomolecular Research,
  3. Department of Pharmaceutical Sciences, University of Toronto, Toronto, Ontario, M5S 3E1, Canada

    Abstract

    Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derive simple classifiers that use nucleic acid sequence features detectable by the yeast cell to identify where transcription will initiate and terminate, and we integrate them into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling transcription of the genome using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-seq data and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of nongenic transcripts, and provides insight into genome evolution.
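    The abstract describes the UM as a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. The sketch below illustrates only that general idea, under heavy simplifying assumptions that do not come from the paper: a single strand, four hypothetical states (intergenic, initiation site, transcribed, termination site), hand-picked transition probabilities, and per-position classifier outputs read directly as emission probabilities. It shows how Viterbi decoding can turn initiation and termination scores into transcript boundaries; it is not the UM's actual structure or parameterization.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical states for a toy, single-strand model of transcription.
INTERGENIC, INIT, TRANSCRIBED, TERM = range(4)
N_STATES = 4

# Allowed transitions as log probabilities; any transition not listed is
# impossible. The values are illustrative assumptions, not fitted parameters.
TRANS: Dict[int, Dict[int, float]] = {
    INTERGENIC: {INTERGENIC: math.log(0.99), INIT: math.log(0.01)},
    INIT: {TRANSCRIBED: 0.0},
    TRANSCRIBED: {TRANSCRIBED: math.log(0.99), TERM: math.log(0.01)},
    TERM: {INTERGENIC: 0.0},
}


def _safe_log(p: float) -> float:
    # Clamp so classifier outputs of exactly 0 or 1 do not produce log(0).
    return math.log(min(max(p, 1e-9), 1.0 - 1e-9))


def emission(state: int, p_init: float, p_term: float) -> float:
    """Log-score of both classifier outputs at one position, given a state.

    Assumption: the initiation (termination) classifier output is read as the
    probability of the observation under the INIT (TERM) state and its
    complement under every other state.
    """
    init_part = _safe_log(p_init if state == INIT else 1.0 - p_init)
    term_part = _safe_log(p_term if state == TERM else 1.0 - p_term)
    return init_part + term_part


def viterbi(p_init: Sequence[float], p_term: Sequence[float]) -> List[int]:
    """Most likely state path given per-position classifier outputs."""
    if len(p_init) != len(p_term):
        raise ValueError("classifier tracks must have the same length")
    n = len(p_init)
    if n == 0:
        return []
    score = [[-math.inf] * N_STATES for _ in range(n)]
    back = [[0] * N_STATES for _ in range(n)]
    # Assume the window begins outside any transcript.
    score[0][INTERGENIC] = emission(INTERGENIC, p_init[0], p_term[0])
    for t in range(1, n):
        for prev in range(N_STATES):
            if score[t - 1][prev] == -math.inf:
                continue
            for nxt, log_tp in TRANS[prev].items():
                cand = score[t - 1][prev] + log_tp + emission(nxt, p_init[t], p_term[t])
                if cand > score[t][nxt]:
                    score[t][nxt] = cand
                    back[t][nxt] = prev
    # Trace back from the best-scoring final state.
    state = max(range(N_STATES), key=lambda s: score[n - 1][s])
    path = [state]
    for t in range(n - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path


def transcripts(path: List[int]) -> List[Tuple[int, int]]:
    """Turn a decoded path into inclusive (start, end) transcript coordinates."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(path):
        if s == INIT:
            start = i
        elif s == TERM and start is not None:
            spans.append((start, i))
            start = None
    return spans


# Usage: sharp initiation signal at position 2, termination signal at position 7.
p_init = [0.01] * 10
p_term = [0.01] * 10
p_init[2] = 0.999
p_term[7] = 0.999
print(transcripts(viterbi(p_init, p_term)))  # [(2, 7)]
```

    Because entering and leaving a transcript each carry a transition penalty in this toy model, a transcript is called only when the initiation and termination signals are strong enough to outweigh those penalties; this loosely mirrors the abstract's point that decoding the two classifiers jointly can place start sites differently than the initiation classifier alone.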

    Footnotes

    • 4 Corresponding author

      E-mail t.hughes{at}utoronto.ca

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.164327.113.

      Freely available online through the Genome Research Open Access option.

    • Received July 31, 2013.
    • Accepted October 28, 2013.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 3.0 Unported), as described at http://creativecommons.org/licenses/by/3.0.
