RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection
Introduction
RNA-binding proteins (RBPs) are important regulators of gene expression. Through binding to specific recognition sequences in RNA, RBPs control differential RNA processing (alternative splicing, alternative polyadenylation), RNA export and localization, RNA stability, and translation into proteins [1]. Thus, understanding RBP recognition of RNA targets is critical for characterizing their cellular function and physiological role [2], [3].
RBPs can recognize RNA primary sequences, secondary structures, or a combination of sequence and structure [4], [5], [6], [7], [8]. Current methods, however, have not been robustly or systematically used to identify the full primary sequence and secondary structure determinants of RNA binding. In vivo methods such as RIP, CLIP, PAR-CLIP, and iCLIP can be used to analyze RBP binding to RNA inside cells [9], [10], [11], [12], whereas in vitro selection experiments such as SELEX, SEQRS, and RNA Bind-N-Seq (RBNS) can partially characterize both the sequences and the structures that are bound in vitro [13], [14]. Challenges for in vivo methods include secondary effects such as cooperation or competition with other cellular factors, and the fact that cellular RNA sequences have biased base composition, leading to uneven coverage of sequence and structural features. Most in vitro methods depend on constant primer regions in the RNA sequence to simplify sequencing library preparation, and the impact of this primer sequence on the secondary structure ensemble represented by the pool of RNA may be considerable. Importantly, for both in vivo and in vitro methods, there exists no established computational approach to derive easily interpretable sequence/structure models from the large number of sequences resulting from these experiments. To address these shortcomings, we developed RNAcompete-S.
RNAcompete-S uses a single-step competitive binding assay to determine the in vitro sequence specificity of RBPs from random RNA sequences using a high-throughput sequencing readout. RNAcompete-S is a modified version of RNAcompete, which has been applied successfully to hundreds of RNA-binding proteins [15], [16], [17]. In contrast to the microarray-programmed pool of ∼244,000 RNAs of ∼30–40 nt used in RNAcompete, which has a limited representation of secondary structures, RNAcompete-S uses a random-sequence RNA pool that facilitates systematic queries of diverse RNA primary sequences and secondary structures. Unlike other in vitro approaches [13], [14], the RNAcompete-S RNAs have no common priming sequences. Another key feature is that the RNAcompete-S data analysis pipeline produces combined Sequence/Structure Models (SSMs) suitable for genome scanning. We show that the RNAcompete-S system detects known RNA sequence and structure specificities of a panel of RBPs, encompassing several proteins with diverse RNA primary and secondary structure preferences. In a striking example, RNAcompete-S detects the large sequence-structure motif recognized by SLBP de novo. RNAcompete-S thus represents a new uniform assay for characterizing diverse RNA binding specificities.
RNAcompete-S methodology
The overall methodology of RNAcompete-S is depicted in Fig. 1A. Briefly, a GST-tagged protein (20 nM) is incubated with a ∼70-fold excess of a random 40-mer RNA pool. After an RNA pulldown assay, the RBP-selected RNA is recovered, and an Illumina library is prepared and analyzed by high-throughput sequencing. No common priming sequences (aside from the T7 ϕ2.5 AGA leader sequence) are present in the query RNA pool.
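To give a sense of scale for the binding reaction described above, the following is a minimal back-of-the-envelope sketch rather than part of the published pipeline. It takes the stated 20 nM protein and ∼70-fold RNA excess, and it assumes that all 40 positions are fully random and that the reaction volume is 100 µL. The volume is a purely hypothetical value chosen for illustration, since it is not stated in this section, and the function name `pool_stats` is likewise invented here.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def pool_stats(protein_nM=20.0, rna_fold_excess=70.0,
               random_length=40, reaction_volume_uL=100.0):
    """Rough scale of a random-RNA binding reaction.

    protein_nM and rna_fold_excess follow the values quoted in the text.
    random_length assumes every one of the 40 positions is fully random.
    reaction_volume_uL is a HYPOTHETICAL value, not taken from the source.
    """
    # Total RNA concentration implied by the fold excess over protein.
    rna_nM = protein_nM * rna_fold_excess
    # Number of distinct sequences possible with 4 bases per position.
    sequence_space = 4 ** random_length
    # Moles = (mol/L) * L, then convert moles to molecules.
    rna_molecules = rna_nM * 1e-9 * reaction_volume_uL * 1e-6 * AVOGADRO
    return {
        "rna_nM": rna_nM,
        "sequence_space": sequence_space,
        "rna_molecules": rna_molecules,
        "fraction_of_space_sampled": rna_molecules / sequence_space,
    }


stats = pool_stats()
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```

Under these assumptions the RNA concentration is about 1.4 µM, and the number of RNA molecules in the reaction is many orders of magnitude smaller than the 4^40 possible 40-mers, so any given pool samples only a tiny fraction of sequence space.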
Results and discussion
We queried seven RBPs using RNAcompete-S — human ELAVL1 (HuR), PTBP1 (PTB), QKI, RBMY, SRSF1 (SF2/ASF), yeast Vts1p, and Drosophila SLBP — selected to encompass a variety of RNA-binding domain types and RNA secondary structure preferences. Because the number of sequences obtained per sample (∼100 million) is much smaller than the total number of RNAs in the pool either before or after selection, few sequences are observed more than once (Fig. S1E), after routine filtering steps (see Section 2.5.1
Conclusion
RNAcompete-S presents, to our knowledge, the first example of a single-step selection method that can accurately derive complex RNA sequence-structure preferences of an RBP. The laboratory component is similar to that used in RBNS and RNAcompete; like RNAcompete (and unlike RBNS), RNAcompete-S determines RNA-binding motifs, and not binding affinities, and thus only a single concentration of protein is required. There are two major differences between RNAcompete-S and RBNS: the absence of primer
Data availability
Sequence reads have been deposited at the Short Read Archive under accession SRX1333324. Processed data, motifs, and code are available at http://hugheslab.ccbr.utoronto.ca/supplementary-data/RNAcompete-S/index.html.
Acknowledgements
Thanks to Ally Yang for technical assistance, and Hilal Kazan and Xiao Li for providing code for processing RNA secondary structures and for helpful discussions. This work was funded by the US National Institutes of Health (1R01HG00570 to T.R.H. and Q.D.M.) and the Canadian Institutes of Health Research (CIHR) (MOP-49451 to T.R.H. and MOP-125894 to Q.D.M. and T.R.H.). T.R.H. is a scholar of the Canadian Institutes For Advanced Research. K.C.H. was supported by an Ontario Graduate Scholarship
References (43)
- et al. RNA-binding proteins and post-transcriptional gene regulation. FEBS Lett. (2008)
- et al. RNA-binding proteins in human genetic disease. Trends Genet. (2008)
- et al. RNA-binding proteins in neurodegeneration: Seq and you shall receive. Trends Neurosci. (2015)
- et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell (2010)
- et al. RNA Bind-n-Seq: quantitative assessment of the sequence and structural binding specificity of RNA binding proteins. Mol. Cell (2014)
- et al. Cooperativity in RNA-protein interactions: global analysis of RNA binding specificity. Cell Rep. (2012)
- et al. RNAcompete methodology and application to determine sequence preferences of unconventional RNA-binding proteins. Methods (2017)
- et al. LIN28 binds messenger RNAs at GGAGA motifs and regulates splicing factor abundance. Mol. Cell (2012)
- et al. Nucleolin is a sequence-specific RNA-binding protein: characterization of targets on pre-ribosomal RNA. J. Mol. Biol. (1996)
- et al. A combined sequence and structure based method for discovering enriched motifs in RNA from in vivo binding data. Methods (2017)
- RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. EMBO J.
- Minor-groove recognition of double-stranded RNA by the double-stranded RNA-binding domain from the RNA-activated protein kinase PKR. Biochemistry
- The RNA-binding SAM domain of Smaug defines a new family of post-transcriptional regulators. Nat. Struct. Biol.
- The stem-loop binding protein forms a highly stable and specific complex with the 3' stem-loop of histone mRNAs. RNA
- Selection of a subset of mRNAs from combinatorial 3' untranslated region libraries using neuronal RNA-binding protein Hel-N1. Proc. Natl. Acad. Sci. USA
- Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc. Natl. Acad. Sci. USA
- HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature
- iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat. Struct. Mol. Biol.
- Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins. Nat. Biotechnol.
- A compendium of RNA-binding motifs for decoding gene regulation. Nature
- Identification and purification of a 62,000-dalton protein that binds specifically to the polypyrimidine tract of introns. Genes Dev.
1 Current address: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.