  • Review Article

Comparative genomics, minimal gene-sets and the last universal common ancestor

Key Points

  • A minimal set of genes that is necessary and sufficient for sustaining a functional cell can be delineated either by computational comparisons of microbial genomes or experimentally by knocking out genes in simple microbes.

  • The minimal gene-set needs to be defined together with the environmental conditions under which these genes are sufficient to support a cell. For the most favourable conditions, with all nutrients provided and no environmental stress, computational and experimental approaches agree on 250–300 genes as the size of the minimal set.

  • For most essential cellular functions, two or more unrelated or distantly related proteins have evolved; only 60 proteins — primarily those involved in translation and the basic aspects of transcription — are conserved in all cellular life-forms. Therefore, even for the same conditions, there can be many versions of the minimal gene-set.

  • The reconstruction of ancestral life-forms is based on the principle of evolutionary parsimony: of the scenarios that reconcile the observed distribution of genes among species with the species tree, the simplest is chosen. The size and composition of the reconstructed ancestral gene repertoires depend on the relative rates of gene loss and horizontal gene transfer, two processes that have been central to microbial evolution.

  • The parsimony approach suggests that the last universal common ancestor (LUCA) of all extant life forms might have had as few as 500–600 genes. The gene set of LUCA that is derived in this fashion might resemble the minimal gene-set for a free-living prokaryote. However, arguments have also been made for a more complex LUCA.

  • The experimental investigation of various versions of the minimal gene-set for cellular life and reconstructed ancestral life-forms might be an important research direction in the second and third decades of the twenty-first century.
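The comparative route in the first key point can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure used in the studies cited here: each genome is assumed to be pre-reduced to a mapping from orthologous-group identifiers (hypothetical labels such as `OG1`, standing in for curated clusters such as COGs) to a functional label. Groups present in every genome are kept; then, for any function that every genome encodes but with non-orthologous genes (non-orthologous gene displacement), one representative is added. The function name `minimal_gene_set` and all data are hypothetical.

```python
from collections import defaultdict

def minimal_gene_set(genomes):
    """Toy comparative derivation of a minimal gene-set (hypothetical helper).

    genomes: dict genome name -> dict {orthologous group: function}.
    Returns (shared, displaced):
      shared    - groups represented in every genome;
      displaced - function -> one representative group, for functions that every
                  genome encodes, but with groups not shared by all genomes.
    """
    names = list(genomes)
    # Step 1: orthologous groups represented in every genome.
    shared = set.intersection(*(set(genomes[n]) for n in names))
    # Functions already covered by the universally shared groups.
    covered = {genomes[n][g] for n in names for g in shared}

    # function -> genome -> groups carrying that function outside the shared set
    by_function = defaultdict(lambda: defaultdict(set))
    for n in names:
        for group, function in genomes[n].items():
            if group not in shared:
                by_function[function][n].add(group)

    # Step 2: non-orthologous gene displacement. Every genome fills the niche,
    # but with different (non-shared) groups, so keep one representative.
    displaced = {}
    for function, per_genome in by_function.items():
        if function in covered:
            continue
        if all(per_genome.get(n) for n in names):
            displaced[function] = min(set().union(*per_genome.values()))
    return shared, displaced

# Hypothetical toy genomes.
genomes = {
    "genome_1": {"OG1": "translation", "OG2": "transcription",
                 "OG3": "glycolysis_step", "OG7": "motility"},
    "genome_2": {"OG1": "translation", "OG2": "transcription",
                 "OG4": "glycolysis_step"},
}
shared, displaced = minimal_gene_set(genomes)
print(sorted(shared))  # ['OG1', 'OG2']
print(displaced)       # {'glycolysis_step': 'OG3'}
```

Picking the alphabetically first group as the representative is arbitrary. Either displaced version could fill the functional niche, which is why several versions of the minimal gene-set can exist for the same conditions.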

Abstract

Comparative genomics, using computational and experimental methods, enables the identification of a minimal set of genes that is necessary and sufficient for sustaining a functional cell. For most essential cellular functions, two or more unrelated or distantly related proteins have evolved; only about 60 proteins, primarily those involved in translation, are common to all cellular life. The reconstruction of ancestral life-forms is based on the principle of evolutionary parsimony, but the size and composition of the reconstructed ancestral gene-repertoires depend on relative rates of gene loss and horizontal gene-transfer. The present estimate suggests a simple last universal common ancestor with only 500–600 genes.
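The dependence of the reconstructed ancestor on the relative costs of gene loss and horizontal gene transfer can be illustrated with two-state weighted (Sankoff) parsimony. This is a generic textbook algorithm swapped in for illustration, not necessarily the algorithm of reference 42. The four-leaf species tree, the gene patterns, the fixed loss cost of 1 and the choice to leave presence at the root unpenalized are all assumptions; the names `sankoff_costs` and `luca_size` are hypothetical.

```python
import math

# Hypothetical species tree: nested tuples, leaves are lineage names.
TREE = (("B1", "B2"), ("A1", "A2"))

def sankoff_costs(tree, pattern, gain_cost, loss_cost=1.0):
    """Two-state Sankoff parsimony on a rooted tree.

    pattern: dict leaf -> bool (gene present or absent in that lineage).
    Returns (minimal cost if absent at this node, minimal cost if present).
    """
    if isinstance(tree, str):  # leaf: its state is observed
        return (math.inf, 0.0) if pattern[tree] else (0.0, math.inf)
    absent, present = 0.0, 0.0
    for child in tree:
        c_abs, c_pres = sankoff_costs(child, pattern, gain_cost, loss_cost)
        # Parent lacks the gene: child stays absent for free, or gains it (HGT).
        absent += min(c_abs, c_pres + gain_cost)
        # Parent has the gene: child keeps it for free, or loses it.
        present += min(c_pres, c_abs + loss_cost)
    return absent, present

def luca_size(patterns, tree, gain_cost):
    """Count genes for which presence at the root is strictly cheaper."""
    count = 0
    for pattern in patterns.values():
        absent, present = sankoff_costs(tree, pattern, gain_cost)
        if present < absent:
            count += 1
    return count

# Toy phyletic patterns (hypothetical genes).
patterns = {
    "universal": {"B1": True, "B2": True, "A1": True, "A2": True},
    "bacteria_only": {"B1": True, "B2": True, "A1": False, "A2": False},
    "single_lineage": {"B1": True, "B2": False, "A1": False, "A2": False},
}

for g in (0.5, 1.5, 3.0):
    print(g, luca_size(patterns, TREE, g))
# 0.5 1
# 1.5 2
# 3.0 3
```

Raising the penalty on gains makes losses relatively cheaper, so genes with patchy phyletic patterns are increasingly assigned to the ancestor and the reconstructed gene set grows.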

Figure 1: How to derive minimal gene-sets by genome comparison: a scheme.
Figure 2: Protein functions encoded in the minimal gene-set.
Figure 3: Gene conservation (phyletic spread) in the set of essential genes of Bacillus subtilis, the computational minimal gene-set, and the complete set of conserved genes.
Figure 4: Parsimonious evolutionary scenarios for two genes.
Figure 5: Dependence of the number of genes in a reconstructed last universal common ancestor on the relative rates of gene loss and horizontal gene transfer.
Figure 6: Essential functions for different versions of the last universal common ancestor: glycolysis and gluconeogenesis.
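Figure 3 compares gene sets by their phyletic spread, that is, the number of lineages in which each gene has an orthologue. A minimal sketch of that tabulation, assuming phyletic patterns are already available as presence/absence dictionaries; the helper names, gene and lineage names, and both gene sets are hypothetical, not data from the figure.

```python
from collections import Counter

def phyletic_spread(genes, patterns):
    """Map 'number of lineages with an orthologue' -> 'number of genes'.

    patterns: dict gene -> dict lineage -> bool (the gene's phyletic pattern).
    """
    return Counter(sum(patterns[g].values()) for g in genes)

def fraction_universal(genes, patterns):
    """Fraction of genes whose orthologues are present in every lineage."""
    return sum(all(patterns[g].values()) for g in genes) / len(genes)

# Hypothetical phyletic patterns across three lineages.
patterns = {
    "geneA": {"L1": True, "L2": True, "L3": True},
    "geneB": {"L1": True, "L2": True, "L3": False},
    "geneC": {"L1": True, "L2": False, "L3": False},
    "geneD": {"L1": True, "L2": True, "L3": True},
}
essential = ["geneA", "geneB", "geneC"]  # stands in for an experimental essential set
computational = ["geneA", "geneD"]       # stands in for a comparative minimal set

print(sorted(phyletic_spread(essential, patterns).items()))      # [(1, 1), (2, 1), (3, 1)]
print(sorted(phyletic_spread(computational, patterns).items()))  # [(3, 2)]
print(round(fraction_universal(essential, patterns), 2))         # 0.33
print(fraction_universal(computational, patterns))               # 1.0
```

In this toy, the computational set is concentrated at full spread while the essential set is not; the real distributions are the ones shown in Figure 3.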

References

  1. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995). The first bacterial genome sequenced.

  2. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995). The second bacterial genome sequenced, and still the smallest.

  3. Fraser, C. M., Eisen, J. A. & Salzberg, S. L. Microbial genome sequencing. Nature 406, 799–803 (2000).

  4. Koonin, E. V., Aravind, L. & Kondrashov, A. S. The impact of comparative genomics on our understanding of evolution. Cell 101, 573–576 (2000).

  5. Alberts, B. et al. Molecular Biology of the Cell (Garland Science, New York, 2002).

  6. Gerstein, M. & Hegyi, H. Comparing genomes in terms of protein structure: surveys of a finite parts list. FEMS Microbiol. Rev. 22, 277–304 (1998).

  7. Mushegian, A. R. & Koonin, E. V. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl Acad. Sci. USA 93, 10268–10273 (1996). The first attempt to derive a minimal gene-set using a comparative-genomic computational approach (comparing the gene sets of H. influenzae and M. genitalium, the only two bacterial genomes sequenced at the time).

  8. Maniloff, J. The minimal cell genome: 'on being the right size'. Proc. Natl Acad. Sci. USA 93, 10004–10006 (1996).

  9. Mushegian, A. The minimal genome concept. Curr. Opin. Genet. Dev. 9, 709–714 (1999).

  10. Koonin, E. V. How many genes can make a cell: the minimal-gene-set concept. Annu. Rev. Genomics Hum. Genet. 1, 99–116 (2000).

  11. Zimmer, C. Tinker, tailor: can Venter stitch together a genome from scratch? Science 299, 1006–1007 (2003). The closest so far to a scientific publication on the brave new project of minimal-genome construction.

  12. Katinka, M. D. et al. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414, 450–453 (2001).

  13. Huber, H. et al. A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature 417, 63–67 (2002).

  14. Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).

  15. Gerdes, S. Y. et al. Experimental determination and system-level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185, 5673–5684 (2003). A thorough experimental and theoretical analysis of the essential genes of E. coli.

  16. Rottem, S. Interaction of mycoplasmas with host cells. Physiol. Rev. 83, 417–432 (2003).

  17. Pauling, L. & Zuckerkandl, E. Chemical paleogenetics. Molecular 'restoration studies' of extinct forms of life. Acta Chemica Scandinavica 17, S9–S16 (1963).

  18. Fitch, W. M. Distinguishing homologous from analogous proteins. Systematic Zoology 19, 99–106 (1970).

  19. Fitch, W. M. Homology: a personal view on some of the problems. Trends Genet. 16, 227–231 (2000).

  20. Sonnhammer, E. L. & Koonin, E. V. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. 18, 619–620 (2002).

  21. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997).

  22. Huynen, M. A. & Bork, P. Measuring genome evolution. Proc. Natl Acad. Sci. USA 95, 5849–5856 (1998).

  23. Koonin, E. V., Mushegian, A. R. & Bork, P. Non-orthologous gene displacement. Trends Genet. 12, 334–336 (1996).

  24. Koonin, E. V. & Galperin, M. Y. Sequence — Evolution — Function. Computational Approaches in Comparative Genomics (Kluwer Academic, New York, 2002).

  25. Gil, R. et al. The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proc. Natl Acad. Sci. USA 100, 9388–9393 (2003).

  26. Itaya, M. An estimation of minimal genome size required for life. FEBS Lett. 362, 257–260 (1995). A prescient attempt to estimate the minimal genome size in the pre-genomic era. The estimate comes uncannily close to those based on computational and experimental analysis of complete genomes.

  27. Venter, J. C., Levy, S., Stockwell, T., Remington, K. & Halpern, A. Massive parallelism, randomness and genomic advances. Nature Genet. 33 (Suppl.), 219–227 (2003).

  28. Judson, N. & Mekalanos, J. J. Transposon-based approaches to identify essential bacterial genes. Trends Microbiol. 8, 521–526 (2000).

  29. Vagner, V., Dervyn, E. & Ehrlich, S. D. A vector for systematic gene inactivation in Bacillus subtilis. Microbiology 144, 3097–3104 (1998).

  30. Ji, Y., Woodnutt, G., Rosenberg, M. & Burnham, M. K. Identification of essential genes in Staphylococcus aureus using inducible antisense RNA. Methods Enzymol. 358, 123–128 (2002).

  31. Hutchison, C. A. et al. Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286, 2165–2169 (1999). The first attempt to identify essential genes at the whole-genome level.

  32. Akerley, B. J. et al. A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae. Proc. Natl Acad. Sci. USA 99, 966–971 (2002).

  33. Kobayashi, K. et al. Essential Bacillus subtilis genes. Proc. Natl Acad. Sci. USA 100, 4678–4683 (2003).

  34. Tatusov, R. L. et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28 (2001).

  35. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).

  36. Yu, B. J. et al. Minimization of the Escherichia coli genome using a Tn5-targeted Cre/loxP excision system. Nature Biotechnol. 20, 1018–1023 (2002).

  37. Mills, D. R., Peterson, R. L. & Spiegelman, S. An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc. Natl Acad. Sci. USA 58, 217–224 (1967).

  38. Jordan, I. K., Rogozin, I. B., Wolf, Y. I. & Koonin, E. V. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12, 962–968 (2002).

  39. Lazcano, A. & Forterre, P. The molecular search for the last common ancestor. J. Mol. Evol. 49, 411–412 (1999). Introduction to a special issue on the last universal common ancestor, which provides an excellent overview of the state of this field at the end of the twentieth century.

  40. Woese, C. The universal ancestor. Proc. Natl Acad. Sci. USA 95, 6854–6859 (1998). A profound discussion of the nature of the last universal common ancestor. The two principal ideas are that the last universal common ancestor did not comprise a unique species, but rather a community of organisms that engaged in rampant gene exchange, and that the different cellular systems 'crystallized' asynchronously during the early evolution of life.

  41. Snel, B., Bork, P. & Huynen, M. A. Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 12, 17–25 (2002). The first earnest attempt to construct evolutionary scenarios on the basis of genome comparisons, taking into account gene loss and HGT.

  42. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3 [online], (cited 22 Sept. 2003), <http://www.biomedcentral.com/1471-2148/3/2> (2003). A detailed analysis of parsimony algorithms for reconstruction of ancestral life forms and an attempt to use the feedback from examination of essential functional niches to adjust the parameters of the algorithms — the relative rates of gene loss and HGT.

  43. Kunin, V. & Ouzounis, C. A. The balance of driving forces during genome evolution in prokaryotes. Genome Res. 13, 1589–1594 (2003).

  44. Doolittle, W. F. Phylogenetic classification and the universal tree. Science 284, 2124–2129 (1999).

  45. Doolittle, W. F. Lateral genomics. Trends Cell Biol. 9, M5–M8 (1999).

  46. Gogarten, J. P., Doolittle, W. F. & Lawrence, J. G. Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19, 2226–2238 (2002). A veritable manifesto for HGT. Makes the case for numerous instances of hidden HGT.

  47. Doolittle, W. F. Uprooting the tree of life. Sci. Am. 282, 90–95 (2000).

  48. Pennisi, E. Genome data shake tree of life. Science 280, 672–674 (1998).

  49. Pennisi, E. Is it time to uproot the tree of life? Science 284, 1305–1307 (1999).

  50. Kurland, C. G., Canback, B. & Berg, O. G. Horizontal gene transfer: a critical view. Proc. Natl Acad. Sci. USA 100, 9658–9662 (2003). A useful counterpoint to reference 46. Makes the argument that numerous apparent cases of HGT are artefacts.

  51. Clarke, G. D., Beiko, R. G., Ragan, M. A. & Charlebois, R. L. Inferring genome trees by using a filter to eliminate phylogenetically discordant sequences and a distance matrix based on mean normalized BLASTP scores. J. Bacteriol. 184, 2072–2080 (2002).

  52. Wolf, Y. I., Rogozin, I. B., Grishin, N. V., Tatusov, R. L. & Koonin, E. V. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1 [online], (cited 22 Sept. 2003), <http://www.biomedcentral.com/1471-2148/1/8> (2001).

  53. Wolf, Y. I., Rogozin, I. B., Grishin, N. V. & Koonin, E. V. Genome trees and the tree of life. Trends Genet. 18, 472–479 (2002).

  54. Korbel, J. O., Snel, B., Huynen, M. A. & Bork, P. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18, 158–162 (2002).

  55. Daubin, V., Gouy, M. & Perriere, G. A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history. Genome Res. 12, 1080–1090 (2002).

  56. Nei, M. & Kumar, S. Molecular Evolution and Phylogenetics (Oxford Univ. Press, Oxford, 2001).

  57. Moran, N. A. Microbial minimalism: genome reduction in bacterial pathogens. Cell 108, 583–586 (2002).

  58. Glansdorff, N. About the last common ancestor, the universal life-tree and lateral gene transfer: a reappraisal. Mol. Microbiol. 38, 177–185 (2000).

  59. Brochier, C., Philippe, H. & Moreira, D. The evolutionary history of ribosomal protein RpS14: horizontal gene transfer at the heart of the ribosome. Trends Genet. 16, 529–533 (2000).

  60. Matte-Tailliez, O., Brochier, C., Forterre, P. & Philippe, H. Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631–639 (2002).

  61. Makarova, K. S., Ponomarev, V. A. & Koonin, E. V. Two C or not two C: recurrent disruption of Zn-ribbons, gene duplication, lineage-specific gene loss, and horizontal gene transfer in evolution of bacterial ribosomal proteins. Genome Biology 2 [online], (cited 11 Sept. 2003), <http://genomebiology.com/2001/2/9/research/0033> (2001).

  62. Woese, C. R. On the evolution of cells. Proc. Natl Acad. Sci. USA 99, 8742–8747 (2002).

  63. Harris, J. K., Kelley, S. T., Spiegelman, G. B. & Pace, N. R. The genetic core of the universal ancestor. Genome Res. 13, 407–412 (2003).

  64. Leipe, D. D., Aravind, L. & Koonin, E. V. Did DNA replication evolve twice independently? Nucleic Acids Res. 27, 3389–3401 (1999).

  65. Forterre, P. The origin of DNA genomes and DNA replication proteins. Curr. Opin. Microbiol. 5, 525–532 (2002).

  66. Forterre, P. Displacement of cellular proteins by functional analogues from plasmids or viruses could explain puzzling phylogenies of many DNA informational proteins. Mol. Microbiol. 33, 457–465 (1999).

  67. Delaye, L., Vazquez, H. & Lazcano, A. in First Steps in the Origin of Life in the Universe (ed. Chela-Flores, J.) 223–230 (Kluwer Academic, Amsterdam, 2001).

  68. Dworkin, J. P., Lazcano, A. & Miller, S. L. The roads to and from the RNA world. J. Theor. Biol. 222, 127–134 (2003).

  69. Olsen, G. J., Woese, C. R. & Overbeek, R. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176, 1–6 (1994).

  70. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 418, 387–391 (2002).

  71. Kamath, R. S. et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature 421, 231–237 (2003).

Acknowledgements

I gratefully acknowledge my intellectual debt to A. Mushegian (minimal gene-set analysis) and B. Mirkin (reconstruction of evolutionary scenarios) and constructive discussions with F. Doolittle and P. Forterre on the nature of the last universal common ancestor. I thank A. Osterman for useful discussions on minimal gene-sets and for providing me with his data prior to publication.

Related links

FURTHER INFORMATION

Kyoto Encyclopedia of Genes and Genomes

NCBI COGs database

NCBI Entrez Genome database

The Tree of Life Web Project

TIGR Genome Projects Database

Glossary

ESSENTIAL GENE

A gene whose knockout is lethal under given conditions.

ORTHOLOGUES

Homologous genes in different species that originate from the same ancestral gene in the last common ancestor of the species compared.

NON-ORTHOLOGOUS GENE DISPLACEMENT

Displacement of a gene responsible for a particular biological function in a certain set of species by a non-orthologous (unrelated or paralogous) gene in a different set of species.

PHYLETIC PATTERN

The pattern of presence or absence (representation by orthologues) of a gene in different lineages across the species tree.

SYNTHETIC LETHALS

Genes whose simultaneous knockout is lethal, although knockout of any one of them alone is viable.

SPECIES TREE

A phylogenetic tree that represents evolutionary relationships between species as a whole, as opposed to phylogenetic trees for individual genes.

EVOLUTIONARY PARSIMONY

A methodological approach in evolutionary biology that aims to explain an observed distribution of character states (for example, the phyletic pattern of a gene in a species tree) by postulating the minimal number of events in the course of evolution that could have led to that distribution.
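The difference between essential genes and synthetic lethals matters for minimal gene-sets: two functionally redundant genes can each look non-essential in a single-knockout screen even though the cell still needs one of them. A minimal sketch of how synthetic-lethal pairs could be read off knockout data, assuming viability results are stored as plain Python dictionaries; the helper name `synthetic_lethal_pairs`, the gene names and all outcomes are hypothetical toy data.

```python
from itertools import combinations

def synthetic_lethal_pairs(single_viable, double_viable):
    """Find gene pairs whose double knockout is lethal although both single knockouts are viable.

    single_viable: dict gene -> bool (True if the single knockout is viable).
    double_viable: dict frozenset({gene1, gene2}) -> bool for tested double knockouts.
    Untested pairs are skipped rather than assumed viable or lethal.
    """
    # Only genes that are individually non-essential can form synthetic-lethal pairs.
    nonessential = sorted(g for g, viable in single_viable.items() if viable)
    pairs = []
    for g1, g2 in combinations(nonessential, 2):
        if double_viable.get(frozenset((g1, g2))) is False:
            pairs.append((g1, g2))
    return pairs

single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}
double_viable = {
    frozenset(("geneA", "geneB")): False,  # redundant pair: lethal only together
    frozenset(("geneA", "geneC")): True,
}
print(synthetic_lethal_pairs(single_viable, double_viable))  # [('geneA', 'geneB')]
```

Here `geneD` is essential on its own and so is excluded, while the `geneA`/`geneB` pair is a single function that a single-knockout screen would have missed.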

About this article

Cite this article

Koonin, E. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol 1, 127–136 (2003). https://doi.org/10.1038/nrmicro751

  • DOI: https://doi.org/10.1038/nrmicro751
