Development of a new oligonucleotide block location-based feature extraction (BLBFE) method for the classification of riboswitches

  • Original Article
  • Molecular Genetics and Genomics

Abstract

As knowledge of genetics and genome elements grows, so does the demand for bioinformatics tools to analyze these data. Riboswitches are genetic elements, usually located in the untranslated regions of mRNAs, that regulate gene expression. Their interaction with antibiotics has also been suggested recently, implying a role in antibiotic effects and resistance. Building on a previously published sequential block finding algorithm, we report a new block location-based feature extraction (BLBFE) strategy, which uses the locations of family-specific sequential blocks on riboswitch sequences as features. For comparison, we also evaluated other feature extraction strategies: mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General and SC-PseDNC-General. KNN, LDA, naïve Bayes, PNN and decision tree classifiers, each with V-fold cross-validation, were applied to every feature extraction method, and their performance was compared in terms of accuracy, sensitivity, specificity and F-score. The proposed strategy classified riboswitches with an average correct classification rate (CCR) of 90.8%, an average accuracy of 96.1%, an average sensitivity of 90.8%, an average specificity of 97.52% and an average F-score of 90.69%. These results indicate that BLBFE can classify and discriminate riboswitch families with high CCR, accuracy, sensitivity, specificity and F-score values.
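
To make the idea of location-based features concrete, the following is a minimal Python sketch, not the published BLBFE algorithm. It assumes that a set of family-specific blocks is already available as short oligonucleotide strings and that each feature is simply the 0-based position of a block's first occurrence in the sequence (or -1 when the block is absent). The function names, the example sequence and the example blocks are all hypothetical. A plain k-mer frequency extractor is included for contrast with one of the baseline strategies named in the abstract.

```python
from itertools import product


def _to_rna(sequence):
    """Upper-case a sequence and map DNA 'T' to RNA 'U'."""
    return sequence.upper().replace("T", "U")


def block_location_features(sequence, blocks):
    """Hypothetical location-based features (an assumption, not the paper's code).

    Returns one value per block: the 0-based index of the block's first
    occurrence in `sequence`, or -1 if the block does not occur.
    """
    seq = _to_rna(sequence)
    return [seq.find(_to_rna(block)) for block in blocks]


def kmer_frequencies(sequence, k=2):
    """Relative frequency of every k-mer over the alphabet ACGU.

    The vector has 4**k entries in lexicographic order (AA, AC, ..., UU for k=2).
    Windows containing other symbols (e.g. N) are ignored in the counts.
    """
    seq = _to_rna(sequence)
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = max(len(seq) - k + 1, 0)
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


# Hypothetical toy input: the sequence and the blocks are made up for illustration.
example_sequence = "GGACUUCGGAUCC"
example_blocks = ["ACUU", "GGAU", "CCCC"]

print(block_location_features(example_sequence, example_blocks))  # [2, 7, -1]
print(len(kmer_frequencies(example_sequence, k=2)))               # 16
```

In this sketch the location vector has one entry per block, so its length depends on the number of blocks rather than on 4^k, and it records where a conserved block sits rather than how often short words occur. Either vector could then be passed to a standard classifier such as KNN.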

Author information

Corresponding authors

Correspondence to Mousa Shamsi or Mohammad Saeid Hejazi.

Ethics declarations

Conflict of interest

All five authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 105 kb)

About this article

Cite this article

Golabi, F., Shamsi, M., Sedaaghi, M.H. et al. Development of a new oligonucleotide block location-based feature extraction (BLBFE) method for the classification of riboswitches. Mol Genet Genomics 295, 525–534 (2020). https://doi.org/10.1007/s00438-019-01642-z

  • DOI: https://doi.org/10.1007/s00438-019-01642-z
