Elsevier

Methods

Volume 126, 15 August 2017, Pages 18-28
RNAcompete-S: Combined RNA sequence/structure preferences for RNA binding proteins derived from a single-step in vitro selection

https://doi.org/10.1016/j.ymeth.2017.06.024

Highlights

  • Developed RNAcompete-S for analyzing RBP RNA sequence and structure preferences.

  • Developed SSM format for annotating RNA sequence and structure motifs.

  • Identified sequence/structure motifs for ELAVL1, PTBP1, QKI, Vts1p, RBMY, and SRSF1.

  • Defined SSM for SLBP that predicts binding to replication dependent histone mRNAs.

Abstract

RNA-binding proteins recognize RNA sequences and structures, but there is currently no systematic and accurate method to derive large (>12 base) motifs de novo that reflect a combination of intrinsic preferences for both sequence and structure. To address this absence, we introduce RNAcompete-S, which couples a single-step competitive binding reaction with an excess of random RNA 40-mers to a custom computational pipeline for interrogation of the bound RNA sequences and derivation of SSMs (Sequence and Structure Models). RNAcompete-S confirms that HuR, QKI, and SRSF1 prefer binding sites that are single stranded, and recapitulates known 8–10 bp sequence and structure preferences for Vts1p and RBMY. We also derive an 18-base-long SSM for Drosophila SLBP, which to our knowledge has not been previously determined by selections from pure random sequence, and which accurately discriminates human replication-dependent histone mRNAs. Thus, RNAcompete-S enables accurate identification of large, intrinsic sequence-structure specificities with a uniform assay.

Introduction

RNA-binding proteins (RBPs) are important regulators of gene expression. Through binding to specific recognition sequences in RNA, RBPs control differential RNA processing (alternative splicing, alternative polyadenylation), RNA export and localization, RNA stability, and translation into proteins [1]. Thus, understanding RBP recognition of RNA targets is critical for characterizing their cellular function and physiological role [2], [3].

RBPs can recognize RNA primary sequences, secondary structures, or a combination of sequence and structure [4], [5], [6], [7], [8]. Current methods, however, have not been robustly or systematically used to identify the full primary sequence and secondary structure determinants of RNA binding. In vivo methods such as RIP, CLIP, PAR-CLIP, and iCLIP can be used to analyze RBP binding to RNA inside cells [9], [10], [11], [12], whereas in vitro selection experiments such as SELEX, SEQRS, and RNA Bind-N-Seq (RBNS) can partially characterize both the sequences and the structures that are bound in vitro [13], [14]. Challenges for in vivo methods include secondary effects such as cooperation or competition with other cellular factors, and the fact that cellular RNA sequences have biased base composition, leading to uneven coverage of sequence and structural features. Most in vitro methods depend on constant primer regions on the RNA sequence to simplify sequencing library preparation, and the impact of this primer sequence on the secondary structure ensemble represented by the pool of RNA may be considerable. Importantly, for both in vivo and in vitro methods, there exists no established computational approach to derive easily interpretable sequence/structure models from the large number of sequences resulting from these experiments. To address these shortcomings, we developed RNAcompete-S.

RNAcompete-S uses a single-step competitive binding assay to determine the in vitro sequence specificity of RBPs from random RNA sequences using a high-throughput sequencing readout. RNAcompete-S is a modified version of RNAcompete, which has been applied successfully to hundreds of RNA-binding proteins [15], [16], [17]. In contrast to the microarray-programmed pool of ∼244,000 ∼30–40 nt RNAs used in RNAcompete, which has a limited representation of secondary structures, RNAcompete-S uses a random-sequence RNA pool that facilitates systematic queries of diverse RNA primary sequences and secondary structures. Unlike other in vitro approaches [13], [14], the RNAcompete-S RNAs have no common priming sequences. Another key feature is that the RNAcompete-S data analysis pipeline produces combined Sequence/Structure Models (SSMs) suitable for genome scanning. We show that the RNAcompete-S system detects known RNA sequence and structure specificities of a panel of RBPs, encompassing several proteins with diverse RNA primary and secondary structure preferences. In a striking example, RNAcompete-S detects the large sequence-structure motif recognized by SLBP de novo. RNAcompete-S thus represents a new uniform assay for characterizing diverse RNA binding specificities.

RNAcompete-S methodology

The overall methodology of RNAcompete-S is depicted in Fig. 1A. Briefly, a GST-tagged protein (20 nM) is incubated with a ∼70-fold excess of random 40-mer RNA pool. After an RNA pulldown assay, RBP-selected RNA is recovered and an Illumina library is prepared and analyzed using high-throughput sequencing. No common priming sequences (aside from the T7 ϕ2.5 AGA leader sequence) are present in the query RNA pool.
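To give a sense of scale for the binding reaction, the following minimal Python sketch works through the arithmetic implied by the quantities stated in the text (20 nM protein, a roughly 70-fold excess of RNA, a random 40-mer region, and the approximately 100 million reads per sample reported in the Results). The function names are hypothetical and introduced only for illustration. The duplicate estimate assumes uniform sampling from all possible 40-mers and ignores selection and PCR amplification, so it should be read as an idealized illustration and not as part of the published pipeline.

```python
# Back-of-envelope numbers for the RNAcompete-S binding reaction.
# Values marked "from the text" are quoted from the article; everything
# else (function names, the uniform-sampling model) is an assumption.

PROTEIN_NM = 20.0        # GST-tagged protein concentration in nM (from the text)
RNA_FOLD_EXCESS = 70     # approximate molar excess of RNA over protein (from the text)
RANDOM_LENGTH = 40       # length of the randomized region in nt (from the text)
READS_PER_SAMPLE = 1e8   # approximate reads per sample (from the text)


def rna_concentration_nm(protein_nm, fold_excess):
    """Total RNA concentration (nM) implied by a molar fold excess over protein."""
    return protein_nm * fold_excess


def sequence_space(length, alphabet_size=4):
    """Number of distinct sequences of the given length over A, C, G, U."""
    return alphabet_size ** length


def expected_duplicate_pairs(n_draws, n_distinct):
    """Expected number of pairs of draws that coincide when n_draws sequences
    are sampled uniformly, with replacement, from n_distinct possibilities."""
    return n_draws * (n_draws - 1) / (2 * n_distinct)


if __name__ == "__main__":
    rna_nm = rna_concentration_nm(PROTEIN_NM, RNA_FOLD_EXCESS)
    space = sequence_space(RANDOM_LENGTH)
    dup_pairs = expected_duplicate_pairs(READS_PER_SAMPLE, space)
    print(f"Approximate RNA concentration: {rna_nm / 1000:.1f} uM")
    print(f"Possible 40-mers: {space:.3e}")
    print(f"Fraction of 40-mer space sampled by reads: {READS_PER_SAMPLE / space:.1e}")
    print(f"Expected duplicate read pairs (idealized): {dup_pairs:.1e}")
```

Under these idealized assumptions the reads cover only a vanishing fraction of the possible 40-mers, which is consistent with the observation that few sequences are seen more than once.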

Results and discussion

We queried seven RBPs using RNAcompete-S — human ELAVL1 (HuR), PTBP1 (PTB), QKI, RBMY, SRSF1 (SF2/ASF), yeast Vts1p, and Drosophila SLBP — selected to encompass a variety of RNA-binding domain types and RNA secondary structure preferences. Because the number of sequences obtained per sample (∼100 million) is much smaller than the total number of RNAs in the pool either before or after selection, few sequences are observed more than once (Fig. S1E), after routine filtering steps (see Section 2.5.1

Conclusion

RNAcompete-S presents, to our knowledge, the first example of a single-step selection method that can accurately derive complex RNA sequence-structure preferences of an RBP. The laboratory component is similar to that used in RBNS and RNAcompete; like RNAcompete (and unlike RBNS), RNAcompete-S determines RNA-binding motifs, and not binding affinities, and thus only a single concentration of protein is required. There are two major differences between RNAcompete-S and RBNS: the absence of primer

Data availability

Sequence reads have been deposited at the Short Read Archive under accession SRX1333324. Processed data, motifs, and code are available at http://hugheslab.ccbr.utoronto.ca/supplementary-data/RNAcompete-S/index.html.

Acknowledgements

Thanks to Ally Yang for technical assistance, and Hilal Kazan and Xiao Li for providing code for processing RNA secondary structures and for helpful discussions. This work was funded by the US National Institutes of Health (1R01HG00570 to T.R.H. and Q.D.M.) and the Canadian Institutes of Health Research (CIHR) (MOP-49451 to T.R.H. and MOP-125894 to Q.D.M. and T.R.H.). T.R.H. is a scholar of the Canadian Institute for Advanced Research. K.C.H. was supported by an Ontario Graduate Scholarship

References (43)

  • C.G. Burd et al.

    RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing

    EMBO J.

    (1994)
  • P.C. Bevilacqua et al.

    Minor-groove recognition of double-stranded RNA by the double-stranded RNA-binding domain from the RNA-activated protein kinase PKR

    Biochemistry

    (1996)
  • T. Aviv et al.

    The RNA-binding SAM domain of Smaug defines a new family of post-transcriptional regulators

    Nat. Struct. Biol.

    (2003)
  • D.J. Battle et al.

    The stem-loop binding protein forms a highly stable and specific complex with the 3' stem-loop of histone mRNAs

    RNA

    (2001)
  • F.B. Gao et al.

    Selection of a subset of mRNAs from combinatorial 3' untranslated region libraries using neuronal RNA-binding protein Hel-N1

    Proc. Natl. Acad. Sci. USA

    (1994)
  • S.A. Tenenbaum et al.

    Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays

    Proc. Natl. Acad. Sci. USA

    (2000)
  • D.D. Licatalosi et al.

    HITS-CLIP yields genome-wide insights into brain alternative RNA processing

    Nature

    (2008)
  • J. Konig et al.

    iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution

    Nat. Struct. Mol. Biol.

    (2010)
  • D. Ray et al.

    Rapid and systematic analysis of the RNA recognition specificities of RNA-binding proteins

    Nat. Biotechnol.

    (2009)
  • D. Ray et al.

    A compendium of RNA-binding motifs for decoding gene regulation

    Nature

    (2013)
  • M.A. Garcia-Blanco et al.

    Identification and purification of a 62,000-dalton protein that binds specifically to the polypyrimidine tract of introns

    Genes Dev.

    (1989)
    1

    Current address: Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.