Abstract
Clinically relevant bacterial amino acid changes can be identified by searching for genes that show positively selected amino acid sites (PSS), and several methods are available for this purpose. Such analyses are, however, time consuming, and the frequency of genes showing evidence of PSS can be low. A pipeline that quickly and efficiently identifies the set of genes showing PSS is therefore of interest. Here, we present Auto-PSS-Genome, a Compi-based pipeline distributed as a Docker image that automates the identification of genes showing PSS using three different methods, namely codeML, FUBAR, and omegaMap. Auto-PSS-Genome accepts as input a set of FASTA files, one per genome, each containing all coding sequences, thus minimizing the work needed to conduct PSS analyses. The pipeline identifies orthologous gene sets and corrects for multiple possible problems in the input FASTA files that could otherwise prevent the automated identification of genes showing PSS. A FASTA file containing all coding sequences can also be supplied as an external global reference, which eases the comparison of results across species when gene names differ. In this work, we use Auto-PSS-Genome to analyse Mycobacterium leprae, which causes leprosy, and the closely related species M. haemophilum, which mainly causes ulcerating skin infections and arthritis in severely immunocompromised persons and cervical and perihilar lymphadenitis in children. The genes identified as showing PSS in these two species may be partially responsible for virulence and drug resistance.
Acknowledgements
This work was funded by National Funds through FCT—Fundação para a Ciência e a Tecnologia, I.P., under the project UIDB/04293/2020. The SING group thanks the CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. This work was partially supported by the Consellería de Educación, Universidades e Formación Profesional (Xunta de Galicia) under the scope of the strategic funding ED431C2018/55-GRC Competitive Reference Group.
Ethics declarations
Conflict of interests
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
López-Fernández, H., Vieira, C.P., Ferreira, P. et al. On the Identification of Clinically Relevant Bacterial Amino Acid Changes at the Whole Genome Level Using Auto-PSS-Genome. Interdiscip Sci Comput Life Sci 13, 334–343 (2021). https://doi.org/10.1007/s12539-021-00439-2
DOI: https://doi.org/10.1007/s12539-021-00439-2