Genome-wide discovery of human splicing branchpoints

  John S. Mattick 1,2

  1. Garvan Institute of Medical Research, Sydney, New South Wales 2010, Australia
  2. St. Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, New South Wales 2052, Australia
  3. MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
  4. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland 4072, Australia
  5. Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
  6. Illumina, Inc., San Diego, California 92122, USA
  7. School of Medicine and Health Services, Department of Integrated Systems Biology and Department of Pediatrics, George Washington University, Washington DC 20037, USA

  Corresponding author: j.mattick{at}garvan.org.au
  8 These authors contributed equally to this work.

Abstract

During the splicing reaction, the 5′ intron end is joined to the branchpoint nucleotide, selecting the next exon to incorporate into the mature RNA and forming an intron lariat, which is excised. Despite a critical role in gene splicing, the locations and features of human splicing branchpoints are largely unknown. We use exoribonuclease digestion and targeted RNA-sequencing to enrich for sequences that traverse the lariat junction and, by split and inverted alignment, reveal the branchpoint. We identify 59,359 high-confidence human branchpoints in >10,000 genes, providing a first map of splicing branchpoints in the human genome. Branchpoints are predominantly adenosine, highly conserved, and located close to the 3′ splice site. Analysis of human branchpoints reveals numerous novel features, including distinct features of branchpoints for alternatively spliced exons and a family of conserved sequence motifs overlapping branchpoints, which we term B-boxes, that exhibit maximal nucleotide diversity while maintaining interactions with the keto-rich U2 snRNA. Different B-box motifs exhibit divergent usage in vertebrate lineages and associate with other splicing elements and distinct intron–exon architectures, suggesting integration within a broader regulatory splicing code. Lastly, although branchpoints are refractory to common mutational processes and genetic variation, mutations occurring at branchpoint nucleotides are enriched for disease associations.
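
The idea of "split and inverted alignment" can be illustrated with a toy sketch. A read that traverses the lariat junction runs from sequence just upstream of the branchpoint directly into the 5′ end of the intron, so its two halves align to the intron in inverted order, and the last base of the first half marks the branchpoint. The code below is not the authors' pipeline: the function name `find_branchpoint`, the `min_anchor` parameter, the example intron, and the use of exact string matching are all assumptions made for illustration. A real analysis would use a mismatch-tolerant aligner, since sequencing errors are common at the junction.

```python
from typing import Optional

def find_branchpoint(read: str, intron: str, min_anchor: int = 8) -> Optional[int]:
    """Return the 0-based intron offset of the inferred branchpoint, or None.

    Hypothetical helper: tries every split of the read and accepts one whose
    tail matches the 5' end of the intron and whose head matches further
    downstream in the intron (the inverted order expected for a lariat read).
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        head, tail = read[:split], read[split:]
        # The tail must start exactly at the 5' end of the intron.
        if not intron.startswith(tail):
            continue
        # The head must align downstream of the tail (inverted order).
        pos = intron.find(head, len(tail))
        if pos != -1:
            # The last base of the head is the branchpoint nucleotide.
            return pos + len(head) - 1
    return None

# Made-up intron with an adenosine at offset 33, close to the 3' end.
INTRON = "GTAAGTATCGGCTTCCAGTGCATTGACCTACTAACGTTTCCTTTTCTCTAG"
# Simulated lariat-junction read: 12 bases ending at offset 33, then 5' end.
READ = INTRON[22:34] + INTRON[:10]

bp = find_branchpoint(READ, INTRON)
print(bp, INTRON[bp])  # 33 A
```

Requiring a minimum anchor length on both sides of the split is a simple way to reduce spurious matches in this sketch; the threshold of 8 is arbitrary.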

Footnotes

  • Received August 12, 2014.
  • Accepted November 20, 2014.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
