Key Points
- The structure of the Gene Ontology (GO) enables powerful comparisons and inferences about gene function, but it is often misunderstood or ignored in practice.
- Evidence codes, annotations for unknown functions and annotation qualifiers are vital aspects of GO annotations, but they are often overlooked.
- Functional profiling using GO annotations is often performed incorrectly or inappropriately. Common problems include testing only for enrichment (not depletion), using an incorrect reference set, applying no correction or an inappropriate correction for multiple comparisons, propagating annotations indiscriminately through the hierarchy, and ignoring the correlations between GO terms.
- Any analysis using GO annotations should cite its data sources, including the version of the ontology, the date of the annotation files, the numbers and types of annotations used, and the versions and parameters of the software, so that results are fully reproducible.
- Pie charts are not appropriate for displaying GO functional categorizations because of the structure of GO and of annotation practices. Functional characterization studies should report the number of genes that are not mapped to any slim term, that are mapped directly to the root node, or that are unannotated.
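The statistical points above (testing for depletion as well as enrichment, choosing the reference set deliberately, and correcting for multiple comparisons) can be illustrated with a minimal Python sketch. It is not the method of any particular tool or of the authors: it assumes the gene-to-term annotations have already been propagated and filtered as intended, uses a one-sided hypergeometric test in each direction, and applies the Benjamini-Hochberg false discovery rate correction across all tests. The function names are hypothetical.

```python
from __future__ import annotations

from math import comb


def upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def lower_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X <= k), used to test for depletion rather than enrichment."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(0, k + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues: dict) -> dict:
    """Benjamini-Hochberg adjusted p-values, keyed like the input."""
    ordered = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted = {}
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        key, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[key] = running_min
    return adjusted


def go_profile(study: set, reference: set, annotations: dict) -> dict:
    """Test every term annotated in `reference` for over- and under-representation.

    `annotations` maps gene ID -> set of GO term IDs (assumed already propagated).
    Returns (term, "over" | "under") -> (study count, raw p-value, BH-adjusted p-value).
    """
    # The study genes must be drawn from the reference set they are compared with.
    if not study <= reference:
        raise ValueError("study genes must be a subset of the reference set")
    N, n = len(reference), len(study)
    ref_counts: dict = {}
    study_counts: dict = {}
    for gene in reference:
        for term in annotations.get(gene, ()):
            ref_counts[term] = ref_counts.get(term, 0) + 1
            if gene in study:
                study_counts[term] = study_counts.get(term, 0) + 1
    raw = {}
    for term, K in ref_counts.items():
        k = study_counts.get(term, 0)
        raw[(term, "over")] = upper_tail(k, n, K, N)
        raw[(term, "under")] = lower_tail(k, n, K, N)
    adjusted = benjamini_hochberg(raw)
    return {key: (study_counts.get(key[0], 0), p, adjusted[key]) for key, p in raw.items()}
```

Here N counts every gene in the reference set, including genes with no annotation; many tools instead restrict N to annotated genes. Either choice changes the p-values, which is one more reason to report the reference set and parameters used. The sketch also treats terms as independent, so it does not address the correlations between GO terms noted above.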
Abstract
The Gene Ontology (GO) project is a collaboration among model organism databases to describe gene products from all organisms using a consistent and computable language. GO produces sets of explicitly defined, structured vocabularies that describe biological processes, molecular functions and cellular components of gene products in both a computer- and human-readable manner. Here we describe key aspects of GO, which, when overlooked, can cause erroneous results, and address how these pitfalls can be avoided.
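GO terms form a directed acyclic graph in which a term may have several parents, and a positive annotation to a term implies annotation to all of its ancestors. The Python sketch below, on a hypothetical toy ontology, shows propagation that is deliberate rather than indiscriminate: rows carrying the NOT qualifier or an excluded evidence code (for example IEA) are not propagated, and the slim summary separately reports unannotated genes, genes annotated only to the root (which GO uses, with the ND evidence code, to record that no function is known), and genes that map to no slim term. The row layout (gene, term, evidence code, qualifier) and all function names are assumptions for illustration, not the format of GO annotation files.

```python
from __future__ import annotations

from collections import defaultdict


def ancestors(term: str, parents: dict) -> set:
    """All terms reachable upward from `term` in the DAG, including `term` itself."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(rows: list, parents: dict, excluded_evidence=frozenset()) -> dict:
    """Map gene -> set of terms implied by its positive direct annotations.

    `rows` holds (gene, term, evidence_code, qualifier) tuples. NOT-qualified rows
    and rows with an excluded evidence code are skipped, not propagated upward.
    """
    implied = defaultdict(set)
    for gene, term, evidence, qualifier in rows:
        if qualifier == "NOT" or evidence in excluded_evidence:
            continue
        implied[gene] |= ancestors(term, parents)
    return dict(implied)


def slim_summary(genes: list, rows: list, parents: dict, slim_terms: set, root: str):
    """Count genes per slim term and list the genes those counts would hide.

    Genes with no positive annotation (including NOT-only genes) are reported
    as unannotated; genes whose only implied term is the root are reported apart.
    """
    implied = propagate(rows, parents)
    counts = defaultdict(int)
    unannotated, root_only, unmapped = [], [], []
    for gene in genes:
        terms = implied.get(gene)
        if not terms:
            unannotated.append(gene)
            continue
        hits = (terms & slim_terms) - {root}
        if hits:
            for term in hits:
                counts[term] += 1
        elif terms == {root}:
            root_only.append(gene)
        else:
            unmapped.append(gene)
    return dict(counts), unannotated, root_only, unmapped
```

Because a gene can count toward several slim terms, the per-term counts need not sum to the number of genes, which is part of why pie charts misrepresent GO categorizations.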
Acknowledgements
We are grateful to the GO Consortium for their efforts in developing, maintaining and making accessible the GO ontology and annotations. We thank S. Carbon and C. Mungall for their help with SQL queries to the GO database and the following individuals for feedback on this manuscript: M. Ashburner, E. Camon, P. D'Eustachio, E. Dimmer, P. Gaudet, R. Huntley, R. Lovering, C. Mungall, S. Twigger, and K. Van Auken.
Cite this article
Yon Rhee, S., Wood, V., Dolinski, K. et al. Use and misuse of the gene ontology annotations. Nat Rev Genet 9, 509–515 (2008). https://doi.org/10.1038/nrg2363