jModelTest 2: more models, new heuristics and parallel computing

Darriba, Diego; Taboada, Guillermo L; Doallo, Ramón; Posada, David

doi:10.1038/nmeth.2109

Correspondence
Published: 30 July 2012

Diego Darriba¹,², Guillermo L Taboada², Ramón Doallo² & David Posada¹

Nature Methods volume 9, page 772 (2012)

To the Editor:

The statistical selection of best-fit models of nucleotide substitution is routine in the phylogenetic analysis of DNA sequence alignments¹. With the advent of next-generation sequencing technologies, most researchers are moving from phylogenetics to phylogenomics, in which large sequence alignments typically include hundreds or thousands of loci. Phylogenetic resources therefore need to be adapted to a high-performance computing paradigm so as to allow demanding analyses at the genomic level. Here we introduce jModelTest 2, a program for nucleotide-substitution model selection that incorporates more models, new heuristics, efficient technical optimizations and parallel computing.

jModelTest 2 includes important features not present in the previous versions²,³ (Supplementary Table 1). We expanded the set of candidate models from 88 to 1,624, and we implemented two heuristics for model selection: a greedy, hill-climbing hierarchical clustering approach (Supplementary Note 1) and a filtering algorithm based on similarity among parameter estimates (Supplementary Note 2). jModelTest 2 is written in Java, and it can run on Windows, Macintosh and Linux platforms. Source code and binaries are freely available from https://code.google.com/p/jmodeltest2/. The package includes detailed documentation and examples, and a discussion group is available at https://groups.google.com/forum/#!forum/jmodeltest/.
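
The growth from 88 to 1,624 candidate models matches a simple counting argument that this letter does not spell out. The sketch below assumes that each model combines a substitution scheme (a way of tying together the six relative rates of a time-reversible model) with 2 base-frequency settings and 4 rate-variation settings, that the previous versions used 11 schemes, and that jModelTest 2 uses every possible partition of the six rates. These decompositions are assumptions made for illustration, and the function and constant names are hypothetical.

```python
def substitution_schemes(n_rates=6):
    """Yield every partition of n_rates rate classes as a restricted growth string.

    Position i holds the label of the block that rate i belongs to, so
    (0, 0, 0, 0, 0, 0) ties all six rates together and (0, 1, 2, 3, 4, 5)
    leaves all six free.
    """
    def extend(prefix, max_label):
        if len(prefix) == n_rates:
            yield tuple(prefix)
            return
        # A new rate may join any existing block or open exactly one new block.
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))

    yield from extend([], -1)


FREQUENCY_SETTINGS = 2  # equal or unequal base frequencies (assumed)
RATE_VARIATION = 4      # none, +I, +G, +I+G (assumed)

schemes = list(substitution_schemes())
total_models = len(schemes) * FREQUENCY_SETTINGS * RATE_VARIATION
previous_total = 11 * FREQUENCY_SETTINGS * RATE_VARIATION

print(len(schemes), total_models, previous_total)
```

Under these assumptions the enumeration gives 203 schemes, so 203 × 8 = 1,624 models, against 11 × 8 = 88 for the earlier set.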

We evaluated the accuracy of jModelTest 2 using 10,000 data sets simulated under a large variety of conditions (Supplementary Note 3). Using the Bayesian information criterion⁴ for model selection, jModelTest 2 identified the generating model 89% of the time (Supplementary Table 2); in the remaining cases, jModelTest 2 selected a model similar to the generating one. Accordingly, model-averaged estimates of model parameters were highly precise (Supplementary Table 3). In these simulations, the two selection heuristics that we developed were accurate and efficient. Using the hierarchical clustering heuristic, we found the same best-fit model as the full search 95% of the time. With the similarity filtering approach, we reduced the number of models evaluated by 60% on average and found the global best-fit model 99% of the time (Fig. 1 and Supplementary Note 2).
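
As a rough illustration of selection under the Bayesian information criterion, the sketch below scores a few models with BIC = -2 ln L + k ln n and converts the scores into normalized weights of the kind used for model averaging. It is not jModelTest 2 code: the log-likelihoods are invented, k counts only substitution-model parameters, and the sample size n is taken to be the alignment length. All three choices are simplifying assumptions, and the function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def bic_weights(scores):
    """Turn BIC scores into weights that sum to 1 (lower BIC, higher weight)."""
    best = min(scores.values())
    raw = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


# Hypothetical fits: (maximized log-likelihood, free substitution parameters).
fits = {"JC": (-5210.4, 0), "HKY": (-5102.7, 4), "GTR": (-5100.9, 8)}
n_sites = 1000  # assumed alignment length

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
weights = bic_weights(scores)
best_model = min(scores, key=scores.get)

for name in sorted(scores, key=scores.get):
    print(f"{name}: BIC={scores[name]:.2f} weight={weights[name]:.4f}")
```

With these invented numbers, the small likelihood gain of GTR over HKY does not offset the penalty for its four extra parameters, so HKY has the lowest BIC. A model-averaged estimate of a shared parameter would then be the weight-scaled sum of the per-model estimates.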

**Figure 1: Benchmarking of the filtering heuristic in jModelTest 2.**

jModelTest 2 can be executed in high-performance computing environments as (i) a desktop version with a user-friendly interface for multicore processors, (ii) a cluster version that distributes the computational load among nodes, and (iii) a hybrid version that can take advantage of a cluster of multicore nodes. An experimental study with real and simulated data sets showed remarkable computational speedups compared to previous versions (Supplementary Note 4). For example, the hybrid approach executed on the Amazon EC2 cloud with 256 processes was 182–211 times faster. For relatively large alignments (138 sequences and 10,693 sites), this could be equivalent to a reduction of the running time from nearly 8 days to around 1 hour.
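
The arithmetic behind these figures can be checked with a short sketch. It assumes a baseline of exactly 8 days (the text says "nearly 8 days") and defines parallel efficiency as speedup divided by the number of processes. The function names are hypothetical.

```python
def parallel_runtime_hours(baseline_hours, speedup):
    """Running time after dividing the baseline time by the reported speedup."""
    return baseline_hours / speedup


def parallel_efficiency(speedup, n_processes):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_processes


baseline_hours = 8 * 24  # "nearly 8 days", rounded up to 8 full days (assumption)
n_processes = 256

for speedup in (182, 211):
    hours = parallel_runtime_hours(baseline_hours, speedup)
    efficiency = parallel_efficiency(speedup, n_processes)
    print(f"speedup {speedup}x: {hours:.2f} h, efficiency {efficiency:.0%}")
```

Both ends of the reported range land close to one hour, and the implied efficiency on 256 processes is roughly 71% to 82%.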

References

1. Sullivan, J. & Joyce, P. Annu. Rev. Ecol. Evol. Syst. 36, 445–466 (2005).
2. Posada, D. & Crandall, K.A. Bioinformatics 14, 817–818 (1998).
3. Posada, D. Mol. Biol. Evol. 25, 1253–1256 (2008).
4. Schwarz, G. Ann. Stat. 6, 461–464 (1978).
5. Akaike, H. IEEE Trans. Automat. Contr. 19, 716–723 (1974).

Acknowledgements

This work was financially supported by the European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.), Spanish Ministry of Science and Education (BFU2009-08611 to D.P.) and Xunta de Galicia (Galician Thematic Networks RGB 2010/90 to D.P. and GHPC2 2010/53 to R.D.).

Author information

Authors and Affiliations

Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
Diego Darriba & David Posada
Computer Architecture Group, University of A Coruña, A Coruña, Spain
Diego Darriba, Guillermo L Taboada & Ramón Doallo

Corresponding author

Correspondence to David Posada.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–3 and Supplementary Notes 1–4 (PDF 667 kb)

About this article

Cite this article

Darriba, D., Taboada, G., Doallo, R. et al. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9, 772 (2012). https://doi.org/10.1038/nmeth.2109

Issue Date: August 2012
