poster

Identification of protein coding regions in RNA transcripts

Authors:
Shiyuyun Tang, Georgia Institute of Technology, Atlanta, GA
Alexandre Lomsadze, Joint Georgia Tech and Emory, Atlanta, GA
Mark Borodovsky, Joint Georgia Tech and Emory, Atlanta, GA
BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, September 2014, Page 588
https://doi.org/10.1145/2649387.2660783

Published: 20 September 2014

ABSTRACT

Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) is a powerful method for generating critically important data for discovering the structure and function of eukaryotic genes. The transcripts may or may not carry protein-coding regions. If a protein-coding region is present, it should appear as a continuous open reading frame, since the introns have already been spliced out of the transcript. Gene finding in transcripts can be done by statistical (alignment-free) methods as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions, complete or incomplete, in RNA transcripts assembled from RNA-Seq reads. An important feature of GeneMarkS-T is unsupervised estimation of the algorithm's parameters, which makes several conventional steps of gene prediction protocols unnecessary, most importantly the manually curated preparation of training sets. We demonstrate that (i) GeneMarkS-T self-training is robust to errors in assembled transcripts, and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions, and particularly in predicting gene starts, compares favorably with that of other existing methods.
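The idea that a coding region in a spliced transcript must be a continuous open reading frame, possibly truncated at either end of the transcript ("complete or incomplete"), can be illustrated with a minimal Python sketch that enumerates candidate ORFs. This is not GeneMarkS-T's algorithm: GeneMarkS-T relies on statistical (alignment-free) models whose parameters it estimates without supervision, whereas this sketch only lists candidates. It assumes the standard genetic code, scans the forward strand only, and uses a hypothetical minimum length threshold (`min_len`, 90 nt by default); the names `Orf`, `find_orfs`, and `_segment_to_orf` are illustrative.

```python
from typing import List, NamedTuple, Optional

# Assumption: standard genetic code (TAA, TAG, TGA are stops; ATG is the start).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


class Orf(NamedTuple):
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive; includes the stop codon when present
    frame: int       # reading frame on the forward strand: 0, 1 or 2
    has_start: bool  # True if the ORF begins with ATG inside the transcript
    has_stop: bool   # True if the ORF ends with a stop codon inside the transcript


def _segment_to_orf(seq: str, seg_start: int, seg_end: int, frame: int,
                    touches_5prime: bool, has_stop: bool) -> Optional[Orf]:
    """Turn one stop-free stretch of codons into a candidate ORF, or None."""
    if seg_end <= seg_start:
        return None
    if touches_5prime:
        # No upstream in-frame stop: the true start may lie before the
        # transcript, so keep the whole stretch and flag whether it opens with ATG.
        return Orf(seg_start, seg_end, frame,
                   seq[seg_start:seg_start + 3] == START_CODON, has_stop)
    # Otherwise the ORF must begin at the first in-frame ATG of the stretch.
    last = seg_end - 3 if has_stop else seg_end  # do not scan the stop codon
    for pos in range(seg_start, last, 3):
        if seq[pos:pos + 3] == START_CODON:
            return Orf(pos, seg_end, frame, True, has_stop)
    return None


def find_orfs(seq: str, min_len: int = 90) -> List[Orf]:
    """List candidate ORFs (complete or incomplete) on the forward strand."""
    seq = seq.upper()
    candidates: List[Optional[Orf]] = []
    for frame in range(3):
        seg_start, touches_5prime = frame, True
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] in STOP_CODONS:
                # Stretch closed by a stop codon: its 3' end is complete.
                candidates.append(_segment_to_orf(seq, seg_start, i + 3, frame,
                                                  touches_5prime, True))
                seg_start, touches_5prime = i + 3, False
            i += 3
        # Trailing stretch runs off the 3' end without a stop (3'-incomplete).
        candidates.append(_segment_to_orf(seq, seg_start, i, frame,
                                          touches_5prime, False))
    return [o for o in candidates
            if o is not None and o.end - o.start >= min_len]
```

For example, `find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_len=9)` returns the complete candidate `Orf(start=2, end=17, frame=2, has_start=True, has_stop=True)` together with a stretch in frame 1 that is open at both ends. Stop-free stretches like the latter occur by chance in every frame, which is why a statistical model is still needed to decide which candidates are actually coding and where the true gene start lies.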

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2014
851 pages
ISBN:9781450328944
DOI:10.1145/2649387
General Chairs: Pierre Baldi (University of California, Irvine), Wei Wang (University of California, Los Angeles)
Copyright © 2014 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
RNA-seq reads
gene prediction
hidden Markov models
transcript assembly
unsupervised training
Qualifiers
- poster

Acceptance Rates
Overall Acceptance Rate: 254 of 885 submissions, 29%