RNAcompete methodology and application to determine sequence preferences of unconventional RNA-binding proteins
Introduction
Hundreds of thousands of predicted RNA-binding proteins (RBPs) are encoded by the 742 sequenced eukaryotic genomes. The vast majority of these RBPs, including those from well-studied organisms, have unknown RNA-binding preferences. In humans, for example, recent evidence indicates that approximately 1200–1500 proteins associate with RNA, representing approximately 6–8% of the annotated human proteome [1], [2]; however, fewer than half of the human RBPs that contain canonical RNA-binding domains (RBDs; see below) have an established RNA-binding motif [3], [4]. This incomplete characterization of RBP sequence preferences is a significant barrier to the analysis of post-transcriptional gene regulation.
Most known sequence-specific RBPs contain canonical RBDs that mediate their interactions with RNA [5]. The most common sequence-specific eukaryotic RBDs are the RNA recognition motif (RRM; ∼246 in human), the Cys-Cys-Cys-His-type zinc finger (CCCH-zf; ∼60 in human), and the hnRNP K homology (KH) domain (∼38 in human) [6]. Among these, the ∼90-amino-acid RRM is the most extensively studied [5]. In RRMs, RNA recognition typically occurs on the β-sheet surface and is often mediated by three exposed aromatic residues [7]. Eukaryotic KH domains span ∼70 amino acids and contain an RNA-binding cleft formed by a conserved GXXG motif flanked by two α-helices, a variable loop, and a β-strand [8]. Lastly, CCCH-zf domains are 12–30 amino acids long and bind RNA through stacking interactions and through hydrogen bonds between the protein backbone and the Watson-Crick edges of the bases [9]. Individual RBDs tend to bind RNA with micromolar affinity [5]; however, RNA-binding affinity and specificity can be significantly increased by combinations of RBDs or by additional RNA-contacting residues located outside the RBD(s) [5].
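To make the practical meaning of micromolar affinity concrete, the sketch below uses a simple 1:1 binding model, assuming protein is in large excess over RNA so that free and total protein concentrations are approximately equal; the fraction of RNA bound is then [P]/([P] + K_d). The protein concentration and both K_d values are hypothetical, chosen only for illustration, and are not measurements from this study.

```python
def fraction_bound(protein_conc_nm: float, kd_nm: float) -> float:
    """Fraction of RNA bound under a simple 1:1 binding model.

    Assumes protein is in large excess over RNA, so free [P] ~ total [P].
    """
    return protein_conc_nm / (protein_conc_nm + kd_nm)


# Hypothetical dissociation constants, for illustration only.
single_domain_kd_nm = 1_000.0  # ~1 uM, in the micromolar range of a single RBD
multi_domain_kd_nm = 10.0      # a hypothetical 100-fold tighter multi-domain protein

protein_nm = 100.0  # hypothetical protein concentration
for label, kd in [("single RBD", single_domain_kd_nm),
                  ("multi-domain", multi_domain_kd_nm)]:
    theta = fraction_bound(protein_nm, kd)
    print(f"{label}: Kd = {kd:g} nM -> fraction bound = {theta:.2f}")
# prints:
# single RBD: Kd = 1000 nM -> fraction bound = 0.09
# multi-domain: Kd = 10 nM -> fraction bound = 0.91
```

At the same protein concentration, a 100-fold lower K_d moves the RNA from mostly unbound to mostly bound, which is why combining domains can matter so much for occupancy.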
Not all sequence-specific RBPs contain a canonical RBD. For example, several well-characterized proteins use unconventional RBDs (indicated in parentheses) for sequence-specific RNA recognition: the pre-mRNA 3′ cleavage and polyadenylation specificity factor 5, NUDT21 (Nudix hydrolase domain and N-terminal region [10]); the histone stem-loop binding protein, SLBP (SLBP domain [11], [12]); and the Drosophila brain tumor protein, BRAT (NHL domain [13], [14]). In a genome-wide context, proteomic analyses of proteins UV cross-linked to mRNA in human HeLa, HEK293, and HuH7 cell lines have identified over 800 proteins that associate with mRNA but do not contain canonical RBDs [2], [15], [16]. We refer to these potentially sequence-specific RBPs as unconventional RBPs (ucRBPs). A similar analysis in Drosophila identified ∼300 ucRBPs, one of which (CG3800) was shown by CLIP-seq to bind RNA containing specific 5-mer sequences enriched for G and A residues [17]. However, it was not shown whether CG3800 binds these sequences autonomously.
RNAcompete is an in vitro method (see Fig. 1) that we developed [18] and have since applied to hundreds of RBPs [3]. For a diverse set of well-studied RBPs, it recapitulates RNA-binding motifs previously identified in vitro (e.g., by SELEX: Systematic Evolution of Ligands by Exponential Enrichment [19]) and in vivo (e.g., by CLIP: UV Cross-Linking and Immunoprecipitation [20]). In RNAcompete experiments, purified epitope-tagged RBPs select RNA sequences from a designed (non-randomized) RNA pool. Bound RNAs are identified by microarray hybridization and analyzed computationally to determine RBP-specific 7-mer RNA-binding profiles. Several in vitro methods have been described since the inception of RNAcompete, including RNA Bind-n-Seq (RBNS) [21], SEQRS [22], RNA-MaP [23], HiTS-RAP [24], and RNA-MITOMI [25]. Of these, only RNA Bind-n-Seq has been used in a large-scale study (unpublished ENCODE online data). The RNAcompete methodology is attractive for several reasons: (i) antibodies are not required; (ii) neither iterative selection nor library preparation is necessary; (iii) RBP-specific optimizations are not required; (iv) it is relatively inexpensive at scale; (v) it is amenable to large-scale studies [3]; and (vi) it has an established, validated, and uniform computational analysis pipeline.
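The step from microarray intensities to a 7-mer binding profile can be sketched in a few lines. This is a minimal, hypothetical illustration and not the published RNAcompete pipeline: it assumes a table of normalized probe intensities from the pulldown, scores each 7-mer by the trimmed mean of the intensities of all probes containing it, and converts those scores to Z-scores across all observed 7-mers. The function names, the 5% trimming fraction, and the toy probe data are assumptions made for illustration.

```python
from statistics import mean, pstdev

K = 7  # RNAcompete reports 7-mer binding profiles


def kmers_in(seq: str, k: int = K) -> set[str]:
    """Distinct k-mers in one probe sequence (each counted once per probe)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def trimmed_mean(values: list[float], trim: float = 0.05) -> float:
    """Mean after dropping the top and bottom `trim` fraction of values."""
    vals = sorted(values)
    cut = int(len(vals) * trim)
    kept = vals[cut:len(vals) - cut] if cut > 0 else vals
    return mean(kept)


def kmer_zscores(probes: dict[str, float], k: int = K) -> dict[str, float]:
    """Hypothetical scoring: Z-score of the trimmed-mean intensity of the
    probes that contain each k-mer."""
    by_kmer: dict[str, list[float]] = {}
    for seq, intensity in probes.items():
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA letters
        for kmer in kmers_in(rna, k):
            by_kmer.setdefault(kmer, []).append(intensity)
    scores = {kmer: trimmed_mean(vals) for kmer, vals in by_kmer.items()}
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid division by zero if all scores tie
    return {kmer: (s - mu) / sigma for kmer, s in scores.items()}


# Toy, made-up probe intensities (not real RNAcompete data).
toy_probes = {
    "AUGUAUUUGG": 10.0,
    "CCCCCCCCCC": 1.0,
    "GGGAAAGGGA": 2.0,
}
z = kmer_zscores(toy_probes)
for kmer, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(kmer, round(score, 2))
```

Counting each 7-mer once per probe keeps a probe with a repeated 7-mer from dominating that 7-mer's score; trimming limits the influence of a few outlier probes.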
In this report, we present a detailed protocol for the experimental and computational components of the RNAcompete system. We also highlight the utility of RNAcompete by analyzing the RNA-binding preferences of two human ucRBPs, NUDT21 and CNBP.
Materials
Refer to the tables at the end of the manuscript.
RNAcompete protocol
We have organized the procedures comprising the RNAcompete pipeline as follows: custom microarray design and synthesis (Section 3.1); DNA pool generation (Section 3.2); RNA pool generation (Section 3.3); cloning of RBPs into E. coli expression vectors (Section 3.4); purification of GST-tagged RBPs (Section 3.5); RNA pulldown assay (Section 3.6); microarray analysis (Section 3.7); and RNAcompete data analysis (Section 3.8). The following sections detail the experimental and computational methods
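Because the RNA pool is designed rather than randomized, one basic sanity check during pool design is whether it represents every possible 7-mer. The sketch below is a hypothetical helper, assuming the pool is available as a list of probe sequences and that full 7-mer representation is the design goal; it counts which of the 4^7 = 16,384 possible 7-mers occur at least once. The function names and the toy example are illustrative only and do not reproduce the actual RNAcompete design criteria.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGU") -> set[str]:
    """Every possible k-mer over the RNA alphabet (4**k of them)."""
    return {"".join(p) for p in product(alphabet, repeat=k)}


def kmer_coverage(pool: list[str], k: int = 7) -> tuple[int, int, set[str]]:
    """Return (k-mers observed, k-mers possible, missing k-mers) for a pool."""
    seen: set[str] = set()
    for seq in pool:
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
        seen.update(rna[i:i + k] for i in range(len(rna) - k + 1))
    universe = all_kmers(k)
    observed = seen & universe  # drop k-mers containing N or other symbols
    return len(observed), len(universe), universe - observed


# Hypothetical 2-mer example: this 17-nt sequence contains all 16 2-mers.
n_obs, n_total, missing = kmer_coverage(["AACAGAUCCGCUGGUUA"], k=2)
print(n_obs, n_total, sorted(missing))  # prints: 16 16 []
```

For a real pool one would call `kmer_coverage(pool)` with the default `k=7`; a non-empty `missing` set would flag 7-mers that the pool cannot report on.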
Results/discussion
We used RNAcompete to examine the sequence preferences of two human ucRBPs, NUDT21 and CNBP. Both proteins were recently identified in proteomic mRNA-binding screens [15], [16], lack canonical RBDs, and have existing RNA-binding data available for comparison [10], [17]. NUDT21 is a 227-amino-acid protein that binds UGUA sequences, typically located 40–50 nucleotides upstream of cleavage and polyadenylation signals in pre-mRNA, and is an essential component of the 3′ pre-mRNA cleavage and polyadenylation
Conclusions
In this report, we present a detailed description of the experimental and computational methodologies encompassed by the RNAcompete system and show the value of RNAcompete for the analysis of ucRBPs. Given the expanding repertoire of both conventional RBPs and ucRBPs, large-scale characterization of RNA-binding specificity is critical. RNAcompete's simplicity, scalability, and labour- and cost-effectiveness make it an important tool for analyzing the RNA-binding preferences of proteins and
Acknowledgements
We would like to thank Hans-Hermann Wessels and Markus Landthaler for providing the CG3800 CLIP-seq dataset alignment files. This work was supported by the National Institutes of Health (grant number R01HG008613) to TRH, QDM, Jack Greenblatt, and Ben Blencowe, and by the Canadian Institutes of Health Research (MOP-125894) to QDM and TRH. KCH was partially supported by an Ontario Graduate Scholarship and a CIHR Frederick Banting and Charles Best Canada Graduate Scholarship.
References (48)
- et al. RRM-RNA recognition: NMR or crystallography…and new findings. Curr. Opin. Struct. Biol. (2013).
- et al. The crystal structure of the NHL domain in complex with RNA reveals the molecular basis of Drosophila brain-tumor-mediated gene regulation. Cell Rep. (2015).
- et al. Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell (2012).
- et al. The mRNA-bound proteome and its global occupancy profile on protein-coding transcripts. Mol. Cell (2012).
- et al. Quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol. Cell (2014).
- et al. Cooperativity in RNA-protein interactions: global analysis of RNA binding specificity. Cell Rep. (2012).
- et al. iCLIP: protein-RNA interactions at nucleotide resolution. Methods (2014).
- Mouse genomics: making sense of the sequence. Curr. Biol. (2001).
- et al. Dissecting CNBP, a zinc-finger protein required for neural crest development, in its structural and functional domains. J. Mol. Biol. (2008).
- et al. Eosinophilic myositis as first manifestation in a patient with type 2 myotonic dystrophy CCTG expansion mutation and rheumatoid arthritis. Neuromuscul. Disord. (2015).
- A mechanism for the regulation of pre-mRNA 3′ processing by human cleavage factor Im. Mol. Cell.
- RNA recognition: towards identifying determinants of specificity. Trends Biochem. Sci.
- A census of human RNA-binding proteins. Nat. Rev. Genet.
- The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Nat. Commun.
- A compendium of RNA-binding motifs for decoding gene regulation. Nature.
- RBPDB: a database of RNA-binding specificities. Nucleic Acids Res.
- RNA-binding proteins: modular design for efficient function. Nat. Rev. Mol. Cell Biol.
- High-throughput characterization of protein-RNA interactions. Brief. Funct. Genomics.
- Sequence-specific binding of single-stranded RNA: is there a code for recognition? Nucleic Acids Res.
- Recognition of distinct RNA motifs by the clustered CCCH zinc fingers of neuronal protein Unkempt. Nat. Struct. Mol. Biol.
- Structural basis of UGUA recognition by the Nudix protein CFI(m)25 and implications for a regulatory role in mRNA 3′ processing. Proc. Natl. Acad. Sci. U.S.A.
- The protein that binds the 3′ end of histone mRNA: a novel RNA-binding protein required for histone pre-mRNA processing. Genes Dev.
- Structure of histone mRNA stem-loop, human stem-loop binding protein, and 3′hExo ternary complex. Science.
- Brain tumor is a sequence-specific RNA-binding protein that directs maternal mRNA clearance during the Drosophila maternal-to-zygotic transition. Genome Biol.