The microbial pan-genome

https://doi.org/10.1016/j.gde.2005.09.006

A decade after the beginning of the genomic era, the question of how genomics can describe a bacterial species has not been fully addressed. Experimental data have shown that in some species new genes are discovered even after sequencing the genomes of several strains. Mathematical modeling predicts that new genes will be discovered even after sequencing hundreds of genomes per species. Therefore, a bacterial species can be described by its pan-genome, which is composed of a ‘core genome’ containing genes present in all strains, and a ‘dispensable genome’ containing genes present in two or more strains and genes unique to single strains. Given that the number of unique genes is vast, the pan-genome of a bacterial species might be orders of magnitude larger than any single genome.
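The definitions above can be illustrated with a minimal Python sketch. It is not the method used in the cited work: it assumes each strain has already been reduced to a set of gene-family identifiers, and the strain names, gene names, and the helper functions `partition_pan_genome` and `new_genes_per_genome` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Set


def partition_pan_genome(strains: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into pan, core, dispensable and strain-unique sets.

    Assumes gene families were already clustered, so that the same
    identifier in two strains means "the same gene".
    """
    if not strains:
        raise ValueError("at least one strain is required")
    gene_sets = list(strains.values())
    pan = set().union(*gene_sets)            # every gene seen in any strain
    core = set.intersection(*gene_sets)      # genes present in all strains
    dispensable = pan - core                 # genes missing from at least one strain
    # Unique genes are the dispensable genes found in exactly one strain.
    unique = {g for g in dispensable if sum(g in s for s in gene_sets) == 1}
    return {"pan": pan, "core": core, "dispensable": dispensable, "unique": unique}


def new_genes_per_genome(strains: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Count genes not seen before as each genome is added in the given order."""
    seen: Set[str] = set()
    counts: List[int] = []
    for name in order:
        counts.append(len(strains[name] - seen))
        seen |= strains[name]
    return counts


# Toy data (hypothetical): three strains, six gene families.
toy_strains = {
    "strain_A": {"gene1", "gene2", "gene3", "gene4"},
    "strain_B": {"gene1", "gene2", "gene3", "gene5"},
    "strain_C": {"gene1", "gene2", "gene5", "gene6"},
}

result = partition_pan_genome(toy_strains)
print(sorted(result["core"]))         # ['gene1', 'gene2']
print(sorted(result["dispensable"]))  # ['gene3', 'gene4', 'gene5', 'gene6']
print(sorted(result["unique"]))       # ['gene4', 'gene6']
print(new_genes_per_genome(toy_strains, ["strain_A", "strain_B", "strain_C"]))  # [4, 1, 1]
```

In this toy case every added genome still contributes a new gene, which is the behaviour the abstract describes for species whose gene repertoire keeps growing as more strains are sequenced. The counts depend on the order in which genomes are added, so a real analysis would typically average over many orderings.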

Section snippets

Introduction to the pan-genome

Ten years after the first genome sequence of a free-living organism was published, public databases contain 239 complete bacterial genomes. However, as shown in Table 1, only one genome has been sequenced in 83% of the cases, and only two in a further 8%. In a recent study [1••], eight genomes representative of the serogroup (see Glossary) diversity among group B Streptococcus (GBS) strains were analyzed to answer the question of how many genomes are needed to fully describe a

A large microbial gene pool driving evolution

Much indirect evidence had already hinted at the concept of the pan-genome, even before it was properly defined by mathematical quantification [1••]. Several studies of subtractive hybridization and comparative genome hybridization (CGH) using multiple isolates of the same species had shown that bacterial species such as Helicobacter pylori, Staphylococcus aureus and Escherichia coli display an extensive genetic diversity, with an average of 20–35% of genes being specific for a single strain [2

Core and dispensable genes

In general, the core genome includes all genes responsible for the basic aspects of the biology of a species and its major phenotypic traits. By contrast, dispensable genes contribute to the species diversity and might encode supplementary biochemical pathways and functions that are not essential for bacterial growth but which confer selective advantages, such as adaptation to different niches, antibiotic resistance, or colonization of a new host. Such genes are generally clustered on large

Serotypes and sequence types do not correlate with genomic diversity

Classical methods to catalogue bacterial species are based on convenient phenotypic traits. The most popular is the agglutination of bacterial cells by specific antisera against the capsular polysaccharide surrounding many pathogens. For a variety of encapsulated bacteria, this method has been widely used for epidemiology studies and vaccine design, assuming that all strains belonging to the same serogroup are similar. More recently, techniques such as multilocus enzyme

Challenging the concept of species

Species can have an open or a closed pan-genome. An open pan-genome is typical of those species that colonize multiple environments and have multiple ways of exchanging genetic material. Streptococci, Meningococci, H. pylori, Salmonellae and E. coli have these properties and are likely to have an open pan-genome. By contrast, other species such as B. anthracis, Mycobacterium tuberculosis and Chlamydia trachomatis, which are known to be more conserved, live in isolated niches with limited access

Conclusions and practical implications

The need to sequence multiple genomes from each species to better understand the diversity of bacterial species is not just a theoretical exercise. Recently, it has been shown that the design of a universal vaccine against GBS was only possible using dispensable genes [26]. In addition, sequencing of multiple genomes was instrumental in discovering the presence of the pilus in GBS, group A Streptococcus, and Pneumococcus, an essential virulence factor that had been missed by all conventional

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

The authors would like to thank Michael Cieslewicz, Antonello Covacci and John Telford for their contribution to the pan-genome concept. They are also grateful to the IRIS Bioinformatic group, to the TIGR information technology and database server groups led by Vadim Sapiro and Michael Heaney, respectively, and to Giorgio Corsi for artwork.

Glossary

Core genome
The pool of genes shared by all the strains of the same bacterial species.
Dispensable genome
The pool of genes present in some — but not all — strains of the same bacterial species.
Lateral gene transfer
Mechanism by which an individual of one species transfers genetic material (i.e. DNA) to an individual of a different species.
Pan-genome
The global gene repertoire of a bacterial species: core genome + dispensable genome.
Serogroup
Group of related bacterial strains characterized by the

References (26)

  • J.G. Lawrence et al. Lateral gene transfer: when will adolescence end? Mol Microbiol (2003)
  • Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones A, Durkin AS et...
  • J.R. Fitzgerald et al. Evolutionary genomics of Staphylococcus aureus: insights into the origin of methicillin-resistant strains and the toxic shock syndrome epidemic. Proc Natl Acad Sci USA (2001)
  • N. Dorrell et al. Whole genome comparison of Campylobacter jejuni human isolates using a low-cost microarray reveals extensive genetic diversity. Genome Res (2001)
  • S. Fukiya et al. Extensive genomic diversity in pathogenic Escherichia coli and Shigella strains revealed by comparative genomic hybridization microarray. J Bacteriol (2004)
  • J.C. Venter et al. Environmental genome shotgun sequencing of the Sargasso sea. Science (2004)
  • P.B. Eckburg et al. Diversity of the human intestinal microbial flora. Science (2005)
  • J. Gans et al. Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science (2005)
  • R. Sandaa et al. Analysis of bacterial communities in heavy metal-contaminated soils at different levels of resolution. FEMS Microbiol Ecol (1999)
  • T.P. Curtis et al. Exploring microbial diversity — a vast below. Science (2005)
  • J.R. Thompson et al. Genotypic diversity within a natural coastal bacterioplankton population. Science (2005)
  • C.G. Kurland et al. Horizontal gene transfer: a critical view. Proc Natl Acad Sci USA (2003)
  • E.V. Koonin. Horizontal gene transfer: the path to maturity. Mol Microbiol (2003)