Elsevier

Methods

Volumes 118–119, 15 April 2017, Pages 3-15

RNAcompete methodology and application to determine sequence preferences of unconventional RNA-binding proteins

https://doi.org/10.1016/j.ymeth.2016.12.003

Highlights

  • Detailed RNAcompete method for analyzing protein RNA-binding preferences.

  • RNAcompete utilized to analyze unconventional RNA-binding proteins, NUDT21 and CNBP.

  • RNAcompete-based motifs are consistent with previous experimental data.

Abstract

RNA-binding proteins (RBPs) participate in diverse cellular processes and have important roles in human development and disease. The human genome, like those of many other eukaryotes, encodes hundreds of RBPs that contain canonical sequence-specific RNA-binding domains (RBDs), as well as numerous unconventional RNA-binding proteins (ucRBPs). ucRBPs physically associate with RNA but lack common RBDs. The degree to which these proteins bind RNA in a sequence-specific manner is unknown. Here, we provide a detailed description of both the laboratory and data processing methods for RNAcompete, a method we have previously used to analyze the RNA-binding preferences of hundreds of RBD-containing RBPs from diverse eukaryotes. We also determine the RNA-binding preferences of two human ucRBPs, NUDT21 and CNBP, and use this analysis to exemplify the RNAcompete pipeline. The results of our RNAcompete experiments are consistent with independent RNA-binding data for these proteins and demonstrate the utility of RNAcompete for analyzing the growing repertoire of ucRBPs.

Introduction

Hundreds of thousands of predicted RBPs are encoded by the 742 sequenced eukaryotic genomes. The vast majority of these RBPs, including those from well-studied organisms, have unknown RNA-binding preferences. For example, recent evidence indicates that approximately 1200–1500 human proteins associate with RNA, representing approximately 6–8% of the annotated human proteome [1], [2]; however, fewer than half of the human RBPs that contain canonical RBDs (see below) have an established RNA-binding motif [3], [4]. This poor characterization of RBP sequence-binding preferences presents a significant barrier to the analysis of post-transcriptional gene regulation.

Most known sequence-specific RBPs contain canonical RBDs that mediate protein-RNA interactions [5]. The most common sequence-specific eukaryotic RBDs are the RNA recognition motif (RRM; ∼246 in human), the Cys-Cys-Cys-His zinc finger domain (CCCH-zf; ∼60 in human), and the HNRNP K homology (KH) domain (∼38 in human) [6]. Among these, the ∼90-amino-acid RRM domain is the most extensively studied [5]. In RRMs, RNA recognition typically occurs on the β-sheet surface and is often mediated by three exposed aromatic residues [7]. Eukaryotic KH domains span ∼70 amino acids and contain an RNA-binding cleft formed by a conserved GXXG motif flanked by two α-helices, a variable loop, and a β-strand [8]. Lastly, CCCH-zf domains are 12–30 amino acids long and bind RNA through stacking interactions and hydrogen bonding between the protein backbone and the Watson-Crick edges of bases [9]. Individual RBDs tend to bind RNA with affinities in the micromolar range [5]; however, RNA-binding affinity and specificity can be significantly increased by combinations of RBDs or by additional RNA-contacting residues located outside the RBD(s) [5].
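As a rough illustration of why micromolar affinity matters, consider a simple single-site binding model. This model is an assumption not stated above: one binding site per RNA, and protein in excess so that the free protein concentration [P] approximately equals the total. The concentrations below are hypothetical. The fraction of RNA bound, θ, is then:

```latex
\theta = \frac{[P]}{[P] + K_d}
\qquad
[P] = 100\,\text{nM}:\quad
K_d = 1\,\mu\text{M} \;\Rightarrow\; \theta = \frac{0.1}{0.1 + 1} \approx 0.09,
\qquad
K_d = 10\,\text{nM} \;\Rightarrow\; \theta = \frac{100}{100 + 10} \approx 0.91
```

Under these illustrative numbers, a 100-fold gain in affinity (for example, from several RBDs acting together) raises occupancy from roughly 9% to roughly 91% at the same protein concentration.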

Not all sequence-specific RBPs contain a canonical RBD. For example, several well-characterized proteins, including the pre-mRNA 3′ cleavage and polyadenylation specificity factor 5, NUDT21 (Nudix hydrolase domain and N-terminal region: [10]), the histone stem-loop binding protein, SLBP (SLBP domain: [11], [12]), and the Drosophila brain tumor protein, BRAT (NHL domain: [13], [14]), use the unconventional RBDs indicated in parentheses for sequence-specific RNA recognition. In a genome-wide context, proteomic analyses of proteins UV cross-linked to mRNA in human HeLa, HEK293, and HuH7 cell lines have identified over 800 proteins that associate with mRNA but do not contain canonical RBDs [2], [15], [16]. We refer to these potentially sequence-specific RBPs as unconventional RBPs (ucRBPs). A similar analysis in Drosophila identified ∼300 ucRBPs, one of which (CG3800) was shown by CLIP-seq to bind RNA containing specific 5-mer sequences enriched for G and A residues [17]. It was not shown, however, whether CG3800 binds these sequences autonomously.
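To make "5-mer sequences enriched" concrete, here is a minimal sketch that compares 5-mer frequencies in bound (for example, CLIP-derived) sequences with a background set. The function names `kmer_counts` and `kmer_enrichment`, the pseudocount, and the plain frequency ratio are all illustrative assumptions; this is not necessarily the analysis used in [17].

```python
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count overlapping k-mers over A/C/G/U only (DNA 'T' is read as 'U')."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGU"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def kmer_enrichment(bound, background, k=5, pseudocount=1.0):
    """Hypothetical helper: ratio of each k-mer's frequency in bound vs.
    background sequences, with a pseudocount so unseen k-mers never
    cause division by zero."""
    fg = kmer_counts(bound, k)
    bg = kmer_counts(background, k)
    n_kmers = 4 ** k
    fg_total = sum(fg.values()) + pseudocount * n_kmers
    bg_total = sum(bg.values()) + pseudocount * n_kmers
    enrichment = {}
    for letters in product("ACGU", repeat=k):
        kmer = "".join(letters)
        fg_freq = (fg[kmer] + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        enrichment[kmer] = fg_freq / bg_freq
    return enrichment


# Toy usage: GA-rich "bound" reads versus a pyrimidine-rich background.
bound = ["GGAGGAGGA", "AGGAGGAAG"]
background = ["UUCUUCAUC", "CAUCAUUCU"]
scores = kmer_enrichment(bound, background)
```

A ratio above 1 marks a 5-mer as over-represented among bound sequences; in practice, the background choice (for example, shuffled or unbound transcript regions) strongly affects which k-mers appear enriched.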

RNAcompete is an in vitro method (see Fig. 1) that we developed [18] and have applied to hundreds of RBPs [3]. For a diverse set of well-studied RBPs, it recapitulates RNA-binding motifs previously identified in vitro (e.g. SELEX experiments: Systematic Evolution of Ligands by Exponential Enrichment [19]) and in vivo (e.g. CLIP experiments: UV Cross-Linking and Immuno-Precipitation [20]). In RNAcompete experiments, purified epitope-tagged RBPs select RNA sequences from a designed (non-randomized) RNA pool. Bound RNAs are identified by microarray hybridization and analyzed computationally to determine RBP-specific 7-mer RNA-binding profiles. Several in vitro methods have been described since the inception of RNAcompete, including RNA Bind-n-Seq (RBNS) [21], SEQRS [22], RNA-MaP [23], HiTS-RAP [24], and RNA-MITOMI [25]. Of these, only RNA Bind-n-Seq has been used in a large-scale study (unpublished ENCODE online data). The RNAcompete methodology is attractive for several reasons: (i) antibodies are not required; (ii) iterative selection and library preparation are not necessary; (iii) RBP-specific optimizations are not required; (iv) it is relatively inexpensive at scale; (v) it is amenable to large-scale studies [3]; and (vi) it has an established and validated uniform computational analysis pipeline.
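The step from microarray signal to a 7-mer binding profile can be illustrated with a simplified sketch. Several assumptions here are not from this text: each probe has a single normalized intensity, each k-mer is summarized by the median intensity of the probes that contain it, and those medians are then converted to Z-scores. The names `probe_kmers` and `kmer_zscores` are hypothetical. The actual RNAcompete analysis (Section 3.8) involves normalization and pool-design details that are not reproduced here.

```python
import statistics
from collections import defaultdict


def probe_kmers(seq, k=7):
    """Distinct k-mers in one probe, so a probe containing a k-mer twice
    still contributes only one intensity value for that k-mer."""
    seq = seq.upper().replace("T", "U")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_zscores(probes, intensities, k=7):
    """Simplified k-mer scoring (hypothetical, not the published pipeline):
    median intensity of probes containing each k-mer, standardized across
    all observed k-mers."""
    values_by_kmer = defaultdict(list)
    for seq, value in zip(probes, intensities):
        for kmer in probe_kmers(seq, k):
            values_by_kmer[kmer].append(value)

    medians = {km: statistics.median(v) for km, v in values_by_kmer.items()}
    vals = list(medians.values())
    mu = statistics.mean(vals)
    sd = statistics.pstdev(vals)
    if sd == 0:
        raise ValueError("all k-mer medians are identical; Z-scores are undefined")
    return {km: (m - mu) / sd for km, m in medians.items()}


# Toy usage with k=4 so the example stays readable; RNAcompete profiles use 7-mers.
probes = ["UGUAAC", "CUGUAG", "CCCCCC", "GAGAGA"]
intensities = [9.0, 9.0, 1.0, 1.0]
z = kmer_zscores(probes, intensities, k=4)
```

Using the median rather than the mean limits the influence of a few unusually bright probes that happen to contain a given k-mer. Here, "UGUA" (shared by both bright probes) ties for the top score.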

In this report, we present a detailed protocol for the experimental and computational components of the RNAcompete system. We also highlight the utility of RNAcompete through analysis of the RNA-binding preferences of two human ucRBPs, NUDT21 and CNBP.

Section snippets

Materials

Refer to the tables at the end of the manuscript.

RNAcompete protocol

We have organized the procedures comprising the RNAcompete pipeline as follows: custom microarray design and synthesis (Section 3.1); DNA pool generation (Section 3.2); RNA pool generation (Section 3.3); cloning RBPs into E. coli expression vectors (Section 3.4); purification of GST-tagged RBPs (Section 3.5); RNA pulldown assay (Section 3.6); microarray analysis (Section 3.7); and, RNAcompete data analysis (Section 3.8). The following sections detail the experimental and computational methods

Results/discussion

We used RNAcompete to examine the sequence preferences of the human ucRBPs NUDT21 and CNBP. Both proteins were recently identified in proteomic mRNA-binding screens [15], [16], lack canonical RBDs, and have independent RNA-binding data available for comparison [10], [17]. NUDT21 is a 227-amino-acid protein that binds UGUA sequences typically located 40–50 nucleotides upstream of cleavage and polyadenylation signals in pre-mRNA and is an essential component of the 3′ pre-mRNA cleavage and polyadenylation
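The positional preference described for NUDT21 can be checked on a sequence with a short scan. This is a minimal sketch that assumes `cleavage_site` is a known 0-based coordinate and that "40–50 nucleotides upstream" means the distance from the motif's first nucleotide to that coordinate. Both the function name `ugua_upstream` and this distance convention are illustrative assumptions rather than definitions from the text.

```python
import re


def ugua_upstream(seq, cleavage_site, window=(40, 50)):
    """Hypothetical helper: 0-based start positions of UGUA motifs whose
    first nucleotide lies window[0]..window[1] nt (inclusive) upstream
    of cleavage_site."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    near, far = window
    hits = []
    # UGUA cannot overlap itself, so non-overlapping finditer misses nothing.
    for match in re.finditer("UGUA", seq):
        distance = cleavage_site - match.start()
        if near <= distance <= far:
            hits.append(match.start())
    return hits


# Toy usage: a UGUA 45 nt upstream of position 50 is reported;
# a second UGUA only 20 nt upstream falls outside the window.
seq = "C" * 5 + "UGUA" + "C" * 21 + "UGUA" + "C" * 40
hits = ugua_upstream(seq, cleavage_site=50)
```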

Conclusions

In this report, we present a detailed description of the experimental and computational methodologies encompassed by the RNAcompete system and show the value of RNAcompete for the analysis of ucRBPs. Given the expanding repertoire of both conventional RBPs and ucRBPs, large-scale characterization of RNA-binding specificity is critical. RNAcompete’s simplicity, scalability, and labour/cost-effectiveness make it an important tool for analyzing the RNA-binding preferences of proteins and

Acknowledgements

We would like to thank Hans-Hermann Wessels and Markus Landthaler for providing the CG3800 CLIP-seq dataset alignment files. This work was supported by the National Institutes of Health (grant number R01HG008613) to TRH, QDM, Jack Greenblatt, and Ben Blencowe, and by the Canadian Institutes of Health Research (MOP-125894) to QDM and TRH. KCH was partially supported by an Ontario Graduate Scholarship and a CIHR Frederick Banting and Charles Best Canada Graduate Scholarship.

References (48)

  • K.M. Brown et al. A mechanism for the regulation of pre-mRNA 3' processing by human cleavage factor Im. Mol. Cell (2003).
  • D.J. Kenan et al. RNA recognition: towards identifying determinants of specificity. Trends Biochem. Sci. (1991).
  • S. Gerstberger et al. A census of human RNA-binding proteins. Nat. Rev. Genet. (2014).
  • B.M. Beckmann et al. The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Nat. Commun. (2015).
  • D. Ray et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature (2013).
  • K.B. Cook et al. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. (2011).
  • B.M. Lunde et al. RNA-binding proteins: modular design for efficient function. Nat. Rev. Mol. Cell Biol. (2007).
  • K.B. Cook et al. High-throughput characterization of protein-RNA interactions. Briefings Funct. Genomics (2015).
  • S.D. Auweter et al. Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res. (2006).
  • J. Murn et al. Recognition of distinct RNA motifs by the clustered CCCH zinc fingers of neuronal protein Unkempt. Nat. Struct. Mol. Biol. (2016).
  • Q. Yang et al. Structural basis of UGUA recognition by the Nudix protein CFI(m)25 and implications for a regulatory role in mRNA 3′ processing. Proc. Natl. Acad. Sci. U.S.A. (2010).
  • Z.F. Wang et al. The protein that binds the 3′ end of histone mRNA: a novel RNA-binding protein required for histone pre-mRNA processing. Genes Dev. (1996).
  • D. Tan et al. Structure of histone mRNA stem-loop, human stem-loop binding protein, and 3'hExo ternary complex. Science (2013).
  • J.D. Laver et al. Brain tumor is a sequence-specific RNA-binding protein that directs maternal mRNA clearance during the Drosophila maternal-to-zygotic transition. Genome Biol. (2015).