The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity

  1. Mark Achtman
  1. Warwick Medical School, University of Warwick, Coventry CV4 7AL, United Kingdom;
  3. Scottish Salmonella Reference Laboratory, Glasgow G31 2ER, UK;
  4. Public Health England (PHE), Colindale, London NW9 5EQ, UK;
  5. National Wildlife Management Centre, APHA, Sand Hutton, York YO41 1LZ, UK;
  6. Austrian Agency for Health and Food Safety (AGES), Institute for Medical Microbiology and Hygiene, 8010 Graz, Austria;
  7. German Federal Institute for Risk Assessment, D-10589 Berlin, Germany (Study Centre for Genome Sequencing and Analysis);
  8. Animal and Plant Health Agency (APHA), Addlestone KT15 3NB, UK;
  9. Environment and Sustainability Institute, University of Exeter, Penryn TR10 9FE, UK;
  10. Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK;
  11. Institut Pasteur, 75724 Paris cedex, France;
  12. Department of Epidemiology and Population Health, Institute of Infection and Global Health, University of Liverpool, Neston CH64 7TE, UK
  Corresponding author: m.achtman@warwick.ac.uk

  Abstract

    EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Here, we provide an overview of how EnteroBase works, what it can do, and its future prospects. At present, EnteroBase has assembled more than 300,000 genomes of Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella from Illumina short reads and has genotyped those assemblies by core genome multilocus sequence typing (cgMLST). Hierarchical clustering of cgMLST sequence types allows a new bacterial strain to be mapped to predefined population structures at multiple levels of resolution within a few hours after its short reads are uploaded. Case Study 1 illustrates this process for local transmissions of Salmonella enterica serovar Agama between neighboring social groups of badgers and humans. EnteroBase also supports single nucleotide polymorphism (SNP) calls both from genomic assemblies and from sequences extracted from metagenomes, as illustrated by Case Study 2, which summarizes the microevolution of Yersinia pestis over the last 5000 years of pandemic plague. EnteroBase can also provide a global overview of the genomic diversity within an entire genus, as illustrated by Case Study 3, which presents a novel description of the population structure of all of the species, subspecies, and clades within Escherichia.
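    The idea of mapping a strain onto cgMLST-based population structures at several levels of resolution can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not EnteroBase's implementation. It assumes that cgMLST profiles are tuples of integer allele IDs in which 0 marks an uncalled locus (a hypothetical convention). It measures the distance between two profiles as the number of loci whose called alleles differ, and it uses single-linkage clustering at each allelic-distance threshold to define one level of resolution. The function names (`allelic_distance`, `single_linkage`, `hierarchical_clusters`, `assign_new`) and the toy profiles are hypothetical.

```python
from itertools import combinations

MISSING = 0  # hypothetical convention: allele ID 0 marks an uncalled locus


def allelic_distance(a, b):
    """Count loci where both profiles have a called allele and the alleles differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def single_linkage(profiles, threshold):
    """Label profiles so that any pair within `threshold` differences shares a cluster."""
    parent = list(range(len(profiles)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(profiles)), 2):
        if allelic_distance(profiles[i], profiles[j]) <= threshold:
            parent[find(i)] = find(j)

    # Name each cluster after the smallest profile index it contains.
    roots = [find(i) for i in range(len(profiles))]
    first = {}
    for i, root in enumerate(roots):
        first.setdefault(root, i)
    return [first[root] for root in roots]


def hierarchical_clusters(profiles, thresholds):
    """One set of cluster labels per allelic-distance threshold (resolution level)."""
    return {t: single_linkage(profiles, t) for t in thresholds}


def assign_new(profile, profiles, levels):
    """Give a new profile its nearest neighbor's cluster at each level, or None."""
    dists = [allelic_distance(profile, p) for p in profiles]
    nearest = min(range(len(profiles)), key=dists.__getitem__)
    return {t: (labels[nearest] if dists[nearest] <= t else None)
            for t, labels in levels.items()}


# Toy example with four 4-locus profiles (hypothetical data).
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 2, 2), (3, 3, 3, 3)]
levels = hierarchical_clusters(profiles, [0, 2])
print(levels)                                     # {0: [0, 1, 2, 3], 2: [0, 0, 0, 3]}
print(assign_new((1, 1, 0, 1), profiles, levels))  # {0: 0, 2: 0}
```

    Assigning a newcomer to its nearest neighbor's cluster is a simplification. Under strict single linkage, a new genome that lies close to members of two clusters would merge them. A production scheme also has to keep existing cluster labels stable as genomes are added, and this sketch does not attempt either.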

    Footnotes

    • 1 Coequal first author

    • 2 A complete list of the Agama Study Group coauthors appears at the end of this paper.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.251678.119.

    • Freely available online through the Genome Research Open Access option.

    • Received April 20, 2019.
    • Accepted December 3, 2019.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
