Improved definition of the mouse transcriptome via targeted RNA sequencing

  1. Anton J. Enright1
  1. EMBL, European Bioinformatics Institute, Cambridge, CB10 1SD, United Kingdom;
  2. Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia;
  3. MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom;
  4. St Vincent's Clinical School, UNSW Australia, Sydney, New South Wales 2052, Australia;
  5. Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia;
  6. Comparative Bioinformatics, Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
  Corresponding author: aje{at}ebi.ac.uk
  • 7 Present address: Hub de Bioinformatique et Biostatistique, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, 75724 Paris Cedex 15, France

Abstract

Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage, accurate measurement of gene expression, and quantitative expression data. We applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splicing junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. We assemble more complete transcript isoforms than GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. We describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 GENCODE neighboring genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, we validate our observations using current FANTOM and Mouse ENCODE resources.

Footnotes

  • Received September 17, 2015.
  • Accepted February 23, 2016.

This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
