A platform for curated products from novel open reading frames prompts reinterpretation of disease variants

  Sudhakaran Prabakaran (1, 2, 4)
  1 Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
  2 Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra 411008, India
  3 Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff CF14 4XN, United Kingdom
  4 St Edmund's College, University of Cambridge, Cambridge CB3 0BN, United Kingdom
  5 These authors contributed equally to this work.

  • Corresponding author: sp339@cam.ac.uk

  Abstract

    Recent evidence from proteomics and deep massively parallel sequencing studies has revealed that eukaryotic genomes contain substantial numbers of as-yet-uncharacterized open reading frames (ORFs). We define these uncharacterized ORFs as novel ORFs (nORFs). nORFs in humans are mostly under 100 codons and are found in diverse regions of the genome, including long noncoding RNAs, pseudogenes, 3′ UTRs, 5′ UTRs, and alternative reading frames of canonical protein-coding exons. There is therefore a pressing need to evaluate the potential functional importance of these unannotated transcripts and proteins in biological pathways and human disease on a larger scale, rather than one at a time. In this study, we outline the creation of a valuable nORF data set with experimental evidence of translation as a resource for the community, use measures of heritability and selection that reveal signals of functional importance, and show the potential implications for the functional interpretation of genetic variants in nORFs. Our results indicate that some variants previously classified as benign or of uncertain significance may have to be reinterpreted.
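    To make the notion of a short ORF in an alternative reading frame concrete, the sketch below scans the three forward frames of a DNA string for ATG-to-stop ORFs and keeps those under 100 codons. It is a hypothetical illustration only, not the authors' pipeline: the nORF data set described here rests on experimental evidence of translation, whereas this toy scanner assumes ATG-only starts, the forward strand only, and a codon count that includes the stop codon. The names `find_orfs`, `short_orfs`, and `Orf` are invented for this example.

```python
# Hypothetical illustration (not the authors' method): naive ORF scan over the
# three forward reading frames, assuming ATG starts and standard stop codons.
from typing import List, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    frame: int   # reading-frame offset: 0, 1 or 2
    start: int   # 0-based index of the A in ATG
    end: int     # 0-based index one past the stop codon
    codons: int  # codon count, including the stop codon (an assumption)


def find_orfs(seq: str) -> List[Orf]:
    """Return ATG-initiated, stop-terminated ORFs on the forward strand."""
    seq = seq.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        start = None
        # Step codon by codon; i + 3 never runs past the end of seq.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; inner ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                orfs.append(Orf(frame, start, i + 3, (i + 3 - start) // 3))
                start = None  # ORF closed; look for the next ATG in this frame
    return orfs


def short_orfs(seq: str, max_codons: int = 100) -> List[Orf]:
    """Keep only ORFs shorter than max_codons (cf. 'mostly under 100 codons')."""
    return [o for o in find_orfs(seq) if o.codons < max_codons]


if __name__ == "__main__":
    # The only ORF here sits in frame 1, i.e. an alternative reading frame.
    print(find_orfs("CATGAAATAGG"))
```

Running the example prints a single three-codon ORF in frame 1, which a scan of frame 0 alone would miss; real nORF discovery additionally needs evidence of translation, which no sequence scan can supply.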

    Footnotes

    • Received March 5, 2020.
    • Accepted August 26, 2020.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
