Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences

  • Original Research Article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Useful insight into the evolution of genes and gene families can be gained by analysing all available genome datasets rather than just a few, which are usually those of model species. Handling such datasets and transforming them into the format required for downstream analyses is, however, often a difficult and time-consuming task for researchers without a background in informatics. We therefore present two simple and fast protocols for data preparation that use an easy-to-install, open-source, cross-platform software application with a rich, user-friendly graphical user interface (SEDA; http://www.sing-group.org/seda/index.html). The first protocol is a substantial improvement over one recently published (López-Fernández et al. Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88–96 (2019) [1]), which was used to study the evolution of GULO, a gene that encodes the enzyme responsible for the last step of vitamin C synthesis. In this paper, we show how the sequence data file used for the phylogenetic analyses can now be obtained much faster by changing the way coding sequence isoforms are removed, using the newly implemented SEDA operation “Remove isoforms”. This protocol can be used to easily show that putative functional GULO genes are present in several Protostomian groups such as Molluscs, Priapulida and Arachnida. Such findings could easily have been missed if only a few Protostomian model species had been used. The second protocol allowed us to identify positively selected amino acid sites in a set of 19 primate HLA immunity genes. Interestingly, the proteins encoded by MHC class II genes can show just as many positively selected amino acid sites as those encoded by classical MHC class I genes. Although a significant percentage of codons, which can be as high as 14.8%, are evolving under positive selection, the main mode of evolution of HLA immunity genes is purifying selection. When a large number of primate species is used, the probability of failing to identify positively selected amino acid sites is lower. Both projects were completed in less than one week, and most of that time was spent running the analyses rather than preparing the files. Such protocols can be easily adapted to answer many other questions using a phylogenetic approach.
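The abstract does not describe how the SEDA “Remove isoforms” operation works internally. Purely as an illustration of the general idea of collapsing coding sequence isoforms before a phylogenetic analysis, the following Python sketch keeps the longest sequence for each (species, gene) pair in a FASTA file. The header layout `species|gene|isoform`, the function names, and the longest-sequence criterion are all assumptions made for this example and should not be read as SEDA's actual behaviour.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gene_key(header):
    """Group key for a record.

    Assumes a hypothetical header convention "species|gene|isoform";
    real sequence headers will usually need a different rule.
    """
    fields = header.split("|")
    return (fields[0], fields[1])


def remove_isoforms(records, key=gene_key):
    """Keep one record per group: the longest sequence (first one wins ties)."""
    best = {}
    for header, seq in records:
        k = key(header)
        if k not in best or len(seq) > len(best[k][1]):
            best[k] = (header, seq)
    # Dicts preserve insertion order, so groups appear in first-seen order.
    return list(best.values())


example = """>sp1|geneA|iso1
ATGAAA
>sp1|geneA|iso2
ATGAAACCCGGG
>sp2|geneA|iso1
ATGTTT
"""

kept = remove_isoforms(parse_fasta(example))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Passing a different `key` function is the simplest way to adapt the sketch to another header layout; choosing an isoform by closeness to a reference length instead of by maximum length would only require changing the comparison inside `remove_isoforms`.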

References

  1. López-Fernández H, Duque P, Henriques S, Vázquez N, Fdez-Riverola F, Vieira CP, Reboiro-Jato M, Vieira J (2019) A bioinformatics protocol for quickly creating large-scale phylogenetic trees. In: Fdez-Riverola F, Mohamad MS, Rocha M, De Paz JF, González P (eds) Practical applications of computational biology and bioinformatics, 12th International conference. Springer, Cham, pp 88–96

  2. Wintergerst ES, Maggini S, Hornig DH (2006) Immune-enhancing role of vitamin C and zinc and effect on clinical conditions. Ann Nutr Metab 50:85–94

  3. Englard S, Seifter S (1986) The biochemical functions of ascorbic acid. Annu Rev Nutr 6:365–406

  4. Hansen S, Tveden-Nyborg P, Lykkesfeldt J (2014) Does vitamin C deficiency affect cognitive development and function? Nutrients 6:3818–3846

  5. Drouin G, Godin J-R, Page B (2011) The genetics of vitamin C loss in vertebrates. Curr Genom 12:371–378

  6. Klein J, Huigin C, Deutsch J (1994) MHC polymorphism and parasites. Philos Trans R Soc Lond B Biol Sci 346:351–358

  7. Hedrick PW (2002) Pathogen resistance and genetic variation at MHC loci. Evolution 56:1902–1908

  8. Pyo C-W, Williams LM, Moore Y, Hyodo H, Li SS, Zhao LP, Sageshima N, Ishitani A, Geraghty DE (2006) HLA-E, HLA-F, and HLA-G polymorphism: genomic sequence defines haplotype structure and variation spanning the nonclassical class I genes. Immunogenetics 58:241–251

  9. Pierini F, Lenz TL (2018) Divergent allele advantage at human MHC genes: signatures of past and ongoing selection. Mol Biol Evol 35:2145–2158

  10. Vandiedonck C, Knight JC (2009) The human major histocompatibility complex as a paradigm in genomics research. Brief Funct Genom Proteom 8:379–394

  11. Hewitt EW (2003) The MHC class I antigen presentation pathway: strategies for viral immune evasion. Immunology 110:163–169

  12. Roche PA, Furuta K (2015) The ins and outs of MHC class II-mediated antigen processing and presentation. Nat Rev Immunol 15:203–216

  13. Leferink NGH, Jose MDF, van den Berg WAM, van Berkel WJH (2009) Functional assignment of Glu386 and Arg388 in the active site of l-galactono-γ-lactone dehydrogenase. FEBS Lett 583:3199–3203

  14. Reboiro-Jato D, Reboiro-Jato M, Fdez-Riverola F, Vieira CP, Fonseca NA, Vieira J (2012) ADOPS–automatic detection of positively selected sites. J Integr Bioinform 9:200

  15. Kumar S, Stecher G, Tamura K (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874

  16. Vázquez N, Vieira CP, Amorim BSR, Torres A, López-Fernández H, Fdez-Riverola F, Sousa JLR, Reboiro-Jato M, Vieira J (2018) Large scale analyses and visualization of adaptive amino acid changes projects. Interdiscip Sci Comput Life Sci 10:24–32

  17. Geraghty DE (1990) Human leukocyte antigen F (HLA-F). An expressed HLA gene composed of a class I coding sequence linked to a novel transcribed repetitive element. J Exp Med 171:1–18


Acknowledgements

This article is a result of the project Norte-01-0145-FEDER-000008-Porto Neurosciences and Neurologic Disease Research Initiative at I3S, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER). SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from University of Vigo for hosting its IT infrastructure. This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union). H. López-Fernández is supported by a post-doctoral fellowship from Xunta de Galicia (ED481B 2016/068-0).

Author information

Corresponding author

Correspondence to Hugo López-Fernández.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 729 KB)

Cite this article

López-Fernández, H., Duque, P., Henriques, S. et al. Bioinformatics Protocols for Quickly Obtaining Large-Scale Data Sets for Phylogenetic Inferences. Interdiscip Sci Comput Life Sci 11, 1–9 (2019). https://doi.org/10.1007/s12539-018-0312-5

  • DOI: https://doi.org/10.1007/s12539-018-0312-5
