DOI: 10.1145/2649387.2660783
Poster

Identification of protein coding regions in RNA transcripts

Published: 20 September 2014

ABSTRACT

Massively parallel sequencing of RNA transcripts with next-generation technology (RNA-Seq) is a powerful way to generate data that are critical for discovering the structure and function of eukaryotic genes. A transcript may or may not carry a protein-coding region. If a protein-coding region is present, it should form a continuous open reading frame in the spliced transcript. Genes in transcripts can be found by statistical (alignment-free) methods as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions, complete or incomplete, in RNA transcripts assembled from RNA-Seq reads. An important feature of GeneMarkS-T is the unsupervised estimation of the algorithm's parameters, which eliminates several conventional steps of gene prediction protocols, most importantly the preparation of manually curated training sets. We demonstrate that (i) GeneMarkS-T self-training is robust to errors in assembled transcripts, and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions, and particularly in predicting gene starts, compares favorably with that of other existing methods.
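To make "complete or incomplete" open reading frames concrete, here is a minimal sketch that only enumerates candidate ORFs in one transcript; it is not GeneMarkS-T's algorithm, which is an ab initio statistical method with self-trained parameters that scores candidates rather than merely listing them. The sketch assumes the standard genetic code (ATG start; TAA, TAG, TGA stops) and scans only the forward strand. The names `Orf` and `find_orfs` are hypothetical. A candidate is treated as 5'-incomplete when no in-frame stop precedes it (so the true start may lie upstream of the assembled transcript), and as 3'-incomplete when it reaches the transcript end without a stop codon.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: standard genetic code, forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


@dataclass
class Orf:
    """Hypothetical ORF record; coordinates are 0-based, end is exclusive."""
    start: int
    end: int          # includes the stop codon when has_stop is True
    frame: int        # 0, 1 or 2
    has_start: bool   # False: 5'-incomplete (no ATG at the ORF start)
    has_stop: bool    # False: 3'-incomplete (runs off the transcript end)

    @property
    def length(self) -> int:
        return self.end - self.start


def _orf_in_segment(seq: str, seg_start: int, seg_end: int, frame: int,
                    open_5prime: bool, has_stop: bool) -> Optional[Orf]:
    """Return the candidate ORF in one stop-free stretch of codons, if any."""
    if open_5prime:
        # No upstream in-frame stop: extend the ORF to the first full codon.
        start = seg_start
    else:
        # Otherwise the ORF begins at the first in-frame ATG of the stretch.
        start = next((p for p in range(seg_start, seg_end, 3)
                      if seq[p:p + 3] == START_CODON), None)
        if start is None:
            return None
    if start >= seg_end:  # no sense codons before the stop
        return None
    end = seg_end + 3 if has_stop else seg_end
    return Orf(start, end, frame, seq[start:start + 3] == START_CODON, has_stop)


def find_orfs(seq: str, min_len: int = 0) -> List[Orf]:
    """List complete and incomplete ORF candidates in all three forward frames."""
    seq = seq.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        seg_start, open_5prime, pos = frame, True, frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                orf = _orf_in_segment(seq, seg_start, pos, frame, open_5prime, True)
                if orf is not None:
                    orfs.append(orf)
                seg_start, open_5prime = pos + 3, False
            pos += 3
        # Codons after the last in-frame stop reach the 3' end without a stop.
        orf = _orf_in_segment(seq, seg_start, pos, frame, open_5prime, False)
        if orf is not None:
            orfs.append(orf)
    return [o for o in orfs if o.length >= min_len]


if __name__ == "__main__":
    for orf in find_orfs("GCATGAAACCCGGGTTTTAAC"):
        print(orf)
```

A caller could then keep, for example, the longest candidate with both `has_start` and `has_stop`. A real gene finder instead ranks candidates with a coding-potential model and must choose among alternative start codons, which is where parameter estimation such as GeneMarkS-T's self-training matters.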

Published in

BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014, 851 pages
ISBN: 9781450328944
DOI: 10.1145/2649387
General Chairs: Pierre Baldi, Wei Wang

Copyright © 2014 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 254 of 885 submissions (29%)