Keywords
gene ontology, high school students, genomics
This article is included in the Teaching and communicating science in a digital age collection.
In this fourth version of the manuscript, we have added an optional gene enrichment analysis exercise with a step-by-step instructional video and a corresponding figure. The introduction has also been revised for clarification in response to reviewer comments.
See the authors' detailed response to the review by Ruth Lovering
See the authors' detailed response to the review by Pascale Gaudet
Genomics is the branch of biology concerned with the study of genes and their functions (see the National Institutes of Health Frequently Asked Questions about Genetic and Genomic Science). Genomics arose from the acceleration of genetic research which was fueled by the development of rapid and affordable DNA sequencing technologies (Shendure et al., 2017). This opened the door to the sequencing of entire genomes. Presently, the DNA codes for thousands of genomes from diverse species have been sequenced and studied (see the National Center for Biotechnology Information Genome database).
The goals in genomics research are to address all genes and their inter-relationships in order to understand the combined influence on the function of an organism. With this newfound knowledge of the staggering number of genes that make up an organism, the Gene Ontology (GO) classification system was created by the Gene Ontology Consortium to organize genes by their similarities and differences (see Gene Ontology Consortium ‘About’ page). “Ontology” is not a commonly encountered term and there are several definitions that are related to philosophical concepts.
In the context of information science, as used here, an “ontology” is a system of representation, formal naming, and classification whose purpose is to describe the categories, properties, and relationships of the data. This is similar to Wikipedia, which is also based on a controlled vocabulary, categories that group material by subject matter, and parent-child terms.
The Gene Ontology information is curated, collected, validated, and annotated by the Gene Ontology Consortium in collaboration with its partners, which consist of research groups, research communities, and other databases (see http://geneontology.org/docs/go-consortium/).
This GO classification system provides the scientific community with a structured vocabulary for defining genes (Ashburner et al., 2000; du Plessis et al., 2011; Hastings, 2017; Thomas, 2017). GO terms are commonly used in most, if not all, databases and analysis tools relevant to bioinformatics, systems biology (Wanjek, 2011), and genomics studies (du Plessis et al., 2011). GO terms are species specific and are updated monthly as biological knowledge is obtained (Gaudet et al., 2017). GO terms describe how a gene functions at the molecular level, its location within the cell, and what biological programs it is involved in. Each GO annotation is associated with an evidence code. The evidence codes fall into six categories: experimental evidence, phylogenetic evidence, computational evidence, author statements, curatorial statements, and automatically generated annotations (http://geneontology.org/docs/guide-go-evidence-codes/).
The importance of the GO term system becomes apparent when analyzing the organization of genomes and coding regions, the distribution of genes involved in specific processes and the conservation of genes across species (Gaudet et al., 2017; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5821137/). This classification system is also quite powerful when analyzing data from large scale gene expression studies (du Plessis et al., 2011) that consider co-expression data from specific tissues obtained under defined circumstances such as treatment with pharmaceutical agents, or with neurodevelopmental disorders, cancer, or diabetes as examples. GO terms are instrumental for understanding the functions of these genes.
Introducing GO terms and the gene classification system to high school students will bring them up to speed on a commonly used research tool in current genomics methods and expose them to the vast amounts of data that have been derived from genomics and systems biology studies.
In the subsequent sections we show an example of how to extract information about a gene from its associated GO terms and then provide instructions for a practical exercise that enables students to profile a list of genes using GO terms in the bioinformatics resource DAVID (the Database for Annotation, Visualization and Integrated Discovery). This is a protocol that we teach to our high school student interns when they are evaluating gene expression data for their summer projects (Crusio et al., 2017; see BioScience Project student posters). The student research internship projects are in the context of behavioral neuroscience. Students typically work with gene expression data associated with a specific brain region or brain disorder. For example, for projects related to learning and memory, gene expression data for the hippocampus would be used; for a neurodevelopmental disorder such as schizophrenia, gene expression data for the prefrontal cortex would be considered. Many online databases offer freely available gene expression data, and using them could be a way to expand the scope of this tutorial. We use the Allen Brain Atlas as our primary source of gene expression data in the student internship projects.
The overall structure of GO is hierarchical and is based on parent-child terms where the parent term is broader and child term is more specialized.
GO terms group genes according to three categories, each of which is considered a distinct ontology: Molecular Function (MF, molecular-level activities performed by gene products), Biological Process (BP, the larger processes or biological programs accomplished by multiple molecular activities), and Cellular Component (CC, the locations relative to cellular structures in which a gene product performs a function).
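The parent-child structure described above can be sketched in a few lines of Python. This is only a toy illustration: the term names are borrowed from the RAB5A example, but the edges are simplified assumptions and do not reproduce the actual GO graph. The names `PARENTS` and `ancestors` are hypothetical.

```python
# Toy parent-child hierarchy: each child term maps to its broader parent terms.
# The edges are illustrative assumptions, not the real GO graph.
PARENTS = {
    "receptor internalization": ["endocytosis"],
    "endocytosis": ["vesicle-mediated transport"],
    "vesicle-mediated transport": ["transport"],
    "transport": ["biological process"],
    "biological process": [],  # root of this toy ontology
}


def ancestors(term):
    """Return every broader term reachable by following parent links."""
    found = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return found


print(sorted(ancestors("receptor internalization")))
```

Walking upward from a specialized term collects progressively broader terms, which is why a gene annotated with a child term can also be grouped under its parents.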
As an example, consider the GO term classification for the RAB5A gene (Figure 1). RAB5A belongs to a family of genes called RAB GTPases that are key regulators of intracellular membrane trafficking. Rabs are involved in the formation of transport vesicles and their fusion with membranes. They are enzymes and mediate their function by cycling between a GDP-bound inactive state and a GTP-bound active state. Because of its fundamental and ubiquitous role, this family of genes is associated with many biological processes and diseases.
The GO term classification for the RAB5A gene gives:
GOTERM_BP: endocytosis, phagocytosis, small GTPase mediated signal transduction, blood coagulation, protein transport, regulation of endocytosis, synaptic vesicle recycling, viral RNA genome replication, early endosome to late endosome transport, positive regulation of exocytosis, regulation of endosome size, regulation of filopodium assembly, receptor internalization involved in canonical Wnt signaling pathway, regulation of synaptic vesicle exocytosis, regulation of autophagosome assembly
GOTERM_CC: ruffle, intracellular, cytoplasm, endosome, early endosome, cytosol, plasma membrane, synaptic vesicle, endosome membrane, actin cytoskeleton, endocytic vesicle, axon, dendrite, phagocytic vesicle membrane, somatodendritic compartment, melanosome, neuronal cell body, terminal bouton, axon terminus, membrane raft, phagocytic vesicle, extracellular exosome, cytoplasmic side of early endosome membrane.
GOTERM_MF: GTPase activity, protein binding, GTP binding, GDP binding
From the RAB5A related GO terms, we get the overall impression that this gene encodes an enzyme that is involved in signaling, transport and vesicle dynamics and is associated with cell membranes. How do we arrive at this description?
In this example, the information obtained from the MF category is that the protein product of the RAB5A gene binds to the guanine nucleotides GTP and GDP (guanosine triphosphate and guanosine diphosphate, respectively) and that it is an enzyme. This is evident from the “GTPase activity” term. Whenever the suffix “ase” is used in the context of a gene or protein, it refers to an enzyme, something that catalyzes a chemical reaction. For the BP category, there are several terms associated with intracellular transport, signaling, and endocytosis. Finally, the terms associated with CC include endosome and endosome-like organelles (melanosomes, synaptic vesicles, phagocytic vesicles), as well as membrane structures (ruffles, rafts).
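The reasoning above can be mimicked with a short Python sketch. It stores a subset of the RAB5A terms listed earlier by category and applies two simple heuristics; the function names `is_enzyme` and `terms_mentioning` are hypothetical, and the "ase activity" check is a rough rule of thumb taken from the text rather than a formal test.

```python
# A subset of the RAB5A GO terms listed above, keyed by ontology.
RAB5A_GO = {
    "BP": ["endocytosis", "protein transport",
           "small GTPase mediated signal transduction"],
    "CC": ["early endosome", "plasma membrane", "synaptic vesicle",
           "membrane raft"],
    "MF": ["GTPase activity", "protein binding", "GTP binding", "GDP binding"],
}


def is_enzyme(mf_terms):
    """Rule of thumb: an MF term ending in 'ase activity' suggests an enzyme."""
    return any(term.lower().endswith("ase activity") for term in mf_terms)


def terms_mentioning(terms, keyword):
    """Return the terms that contain the keyword (case-insensitive)."""
    return [term for term in terms if keyword.lower() in term.lower()]


print(is_enzyme(RAB5A_GO["MF"]))
print(terms_mentioning(RAB5A_GO["CC"], "membrane"))
```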
DAVID is a database with a suite of analysis tools that groups genes based on different criteria related to GO terms. DAVID also links to other databases that contain primary source information like The Gene Ontology as well as complementary information related to pathways and human disease. In this exercise, students will use the sample gene lists (DEMOLIST1 or DEMOLIST2) that are accessible from the DAVID database to see how the Gene Ontology classification partitions a set of genes based on GO Terms. Screenshots and videos are provided for step by step instruction. We also provide a video to instruct students on analyzing a gene list in DAVID obtained from a random gene list generator.
Screenshot 1 (Figure 2). DAVID landing page. The start analysis link is accessed here and is circled in red in this image. (Video 1, Delprato et al., 2019c)
Screenshot 2 (Figure 3). Submitting a gene list. Select either DEMOLIST 1 or DEMOLIST 2 (left panel). The identifier will come up automatically because this is a demonstration list. If you are submitting your own gene list, the identifier will have to be specified from the dropdown menu (Video 2; Delprato et al., 2019c). Typically the identifier is the “Official Gene Symbol”. Click “Gene List”, then “Submit List” (Video 1; Delprato et al., 2019c).
Screenshot 3 (Figure 4). Species selection. You will see the notice “Multiple Species, have been Detected”. Highlight “Homo Sapiens” in the window, then select “Homo Sapiens” below the window (example: DEMOLIST 1, 149 genes, highlighted in grey, left panel). Next, you will see the message “Submission Successful” (Video 1; Delprato et al., 2019c).
Screenshot 4 (Figure 5). Obtaining the results. Select “Functional Annotation Tool”, beneath the blue arrow. Next, select “Functional Annotation Table” at the bottom of the page (Video 1; Delprato et al., 2019c).
Screenshot 5 (Figure 6). Reading the output. The gene ID and the full gene name are shown in the blue bars above each entry. The GO Term BP (Biological Process), GO Term CC (Cellular Component), and GO Term MF (Molecular Function) terms are clickable descriptors and link to the Gene Ontology website. See above for a complete description of the GO categories (Video 1; Delprato et al., 2019c).
Screenshot 6 (Figure 7). Keyword search. When selecting terms for a keyword search, a more complete outcome is achieved if just a few letters are specified. For example, “neur” will capture terms starting with both “neuro” and “neural” (Video 1; Delprato et al., 2019c). DAVID output can be searched for genes related to other processes and diseases as well. Have students evaluate the gene list based on their interests. They can identify genes related to a particular process. Students may work individually or in groups.
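The effect of searching with a short fragment can be illustrated with a minimal Python sketch. It is not how DAVID implements its search; it simply shows why “neur” matches more terms than “neuro”. The RAB5A terms come from the example above, while `GENE_X`, `GENE_Y`, and the function name `keyword_search` are made-up placeholders.

```python
def keyword_search(annotations, fragment):
    """Return {gene: [matching terms]} for case-insensitive substring matches."""
    fragment = fragment.lower()
    hits = {}
    for gene, terms in annotations.items():
        matched = [term for term in terms if fragment in term.lower()]
        if matched:
            hits[gene] = matched
    return hits


# RAB5A terms are taken from the example above; GENE_X and GENE_Y are
# placeholder genes invented for this illustration.
annotations = {
    "RAB5A": ["neuronal cell body", "endocytosis"],
    "GENE_X": ["neural tube development"],
    "GENE_Y": ["blood coagulation"],
}

print(keyword_search(annotations, "neur"))   # matches RAB5A and GENE_X
print(keyword_search(annotations, "neuro"))  # matches RAB5A only
```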
Students may wish to try this with their own gene lists. This online gene list generator will enable students to generate a random list of genes for evaluation. (See also Video 2 for instruction; Delprato et al., 2019c)
Step 1. Specify species: Human is the default
Step 2. Specify list length: 200–500 is a good representative number. Note that DAVID will not evaluate lists with more than 2000 genes; an error message stating this will be displayed.
Step 3. Select “Generate”
Step 4. Copy the gene list using the “Select All” option and paste the list directly into DAVID for evaluation as described above. Make sure to select “Official Gene Symbol” as the identifier when submitting the gene list.
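Before pasting a list into DAVID, it can help to tidy it first. The following Python sketch is one possible way to do so, assuming the list is plain text with symbols separated by commas or whitespace. The 2000-gene limit comes from Step 2 above; the function name `prepare_gene_list` and the example symbols are arbitrary choices for illustration.

```python
MAX_DAVID_GENES = 2000  # limit stated in Step 2 above


def prepare_gene_list(raw_text):
    """Split pasted text into unique gene symbols, one per line, keeping order."""
    seen = set()
    symbols = []
    for symbol in raw_text.replace(",", "\n").split():
        if symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    if len(symbols) > MAX_DAVID_GENES:
        raise ValueError(
            f"{len(symbols)} genes exceeds the {MAX_DAVID_GENES}-gene limit"
        )
    return "\n".join(symbols)


# Duplicates are dropped and each symbol ends up on its own line.
print(prepare_gene_list("RAB5A, RAB5A\nBDNF  GRIN1"))
```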
Students may perform a gene enrichment analysis exercise based on GO annotation of their gene list directly at the Gene Ontology site (Video 3). The results can be viewed as a pie chart, which provides a clear visual representation of the GO categories (Figure 8). Clicking on the individual sections of the pie chart leads to more inclusive and specific annotation. Clicking on the legends redirects to a table that contains the gene names associated with a specific GO category (for a more in-depth explanation of the results page, see http://geneontology.org/docs/go-enrichment-analysis/).
Step 1. Paste a gene list in the window of the Gene Ontology landing page (http://geneontology.org/).
Step 2. Specify the ontology (MF, CC, BP) that you would like to analyze from the drop down menu. This can be specified later as well on the results page. (Biological Process is the default).
Step 3. Select species (Homo sapiens is the default).
Step 4. Select “Launch”.
Step 5. Select the clickable number that represents your list of genes relative to the reference list to which it is compared.
Step 6. Click on the pie chart above the table and specify the desired ontology from the drop down menu. Pie chart personalization and options for extended information are described in the text above the pie chart. Pie chart and legends may be saved as screenshots.
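For students curious about what “enrichment” means numerically, the general idea can be sketched with a hypergeometric test: given how many genes in a reference set carry a GO term, how surprising is the number seen in the submitted list? The Python sketch below is an assumption about the general approach and is not a description of the exact statistics used by the Gene Ontology site. The function name `enrichment_p_value` and all counts in the example are invented.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, ref_hits, ref_size):
    """Probability of seeing at least `list_hits` annotated genes when
    `list_size` genes are drawn at random from a reference of `ref_size`
    genes, `ref_hits` of which carry the GO term."""
    upper = min(list_size, ref_hits)
    total = comb(ref_size, list_size)
    tail = sum(
        comb(ref_hits, k) * comb(ref_size - ref_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / total


# Invented counts: 20 of 200 submitted genes carry a term that annotates
# 500 of 20000 reference genes (about 5 would be expected by chance).
print(enrichment_p_value(20, 200, 500, 20000))
```

A small probability suggests that the term appears in the list more often than chance alone would explain.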
We polled 12 student interns from our summer program for feedback on their experience working with the GO system and DAVID. The survey consisted of five direct yes-or-no questions and two open-ended questions. The responses are shown in Table 1, Table 2, and Table 3.
For the direct response questions, 100% of the students had not worked with or heard of the DAVID database (Question 1, Table 1), and 83% had not heard of the GO system. Two students had heard of the GO system in an advanced placement biology class but had not explored it further (Question 2, Table 1). All of the students responded that they benefited intellectually from working with the GO system and DAVID tools (Question 3, Table 1) and that they had enjoyed the experience (Question 4, Table 1). Of the students, 92% thought that the experience would better prepare them for future database use; one student indicated that this was not applicable (Question 5, Table 1).
For the open-ended survey questions 3a and 4a (Table 2 and Table 3), students were asked to explain how they benefited from working with the databases and tools and what they had enjoyed about the experience. A two-step process was used to analyze their answers. First, the responses to each question were grouped and an online text analyzer was used to assess the words occurring with the highest frequency (Workbooks 1 and 2; Delprato et al., 2019a; Delprato et al., 2019b). Words of 2 or fewer characters were not considered in the analysis. In a subsequent step, a spreadsheet for coding open-ended survey questions was used to organize the results from the text analysis (Workbooks 1 and 2).
Response category words chosen represent replies to the question asked. Words containing the same root, such as learn, learned, learnt, and learning, were grouped. The response categories selected for question 3a that may provide insight into why students believe they have benefited intellectually are: expose (5), analyze (4), understand (3), help (3), experience (2), and gained (2) (Workbook 1).
For question 4a, the response categories chosen which may provide insight into why students enjoyed the experience are: interesting (6), gained (2), new (2) and explore (2). Other adjectives used in single responses were: interactive, rewarding, and refreshing (Workbook 2).
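The word-frequency step can be approximated with a short Python sketch. The study itself used an online text analyzer and a coding spreadsheet, so this is an alternative illustration and not the authors' tool. The `ROOTS` mapping is a hypothetical, hand-written stand-in for the root grouping, and the sample responses are invented, not the survey data.

```python
import re
from collections import Counter

# Hypothetical root groups: words sharing a root are counted together.
ROOTS = {
    "learned": "learn", "learnt": "learn", "learning": "learn",
    "exposed": "expose", "exposure": "expose",
}


def word_frequencies(responses):
    """Count words across responses, skipping words of 2 or fewer characters."""
    counts = Counter()
    for response in responses:
        for word in re.findall(r"[a-z]+", response.lower()):
            if len(word) > 2:
                counts[ROOTS.get(word, word)] += 1
    return counts


# Invented responses for illustration only.
responses = [
    "I learned how to use a database",
    "Learning about genes exposed me to new tools",
]
print(word_frequencies(responses).most_common(3))
```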
A basic entry and exit ticket method is suggested to determine what students know about genomics and genes before the lesson as well as what they have learned: main points, questions they may have and what they found most interesting. Sample questions are provided in what follows.
A sequence of DNA or RNA which codes for a molecule that has a function. A gene is the basic physical and functional unit of heredity
Study of the full set of the genes and DNA in an organism
~20,000
Yes, but people have different alleles. Alleles are the variation of a gene resulting from mutations. As an example consider eye color. We all have the gene for eye color but some of us have brown, blue or green eyes and there are different shades and hues within those categories.
1. What were the main points of the lesson?
2. Do you have any questions?
3. What aspect of this lesson did you find most interesting?
4. Make up a gene and describe it using Gene Ontology classifiers for the 3 categories: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC).
5. Why do some genes have many classifiers while others do not?
6. For the Biological Process – BP category, what are the classifiers based on, i.e., how are they derived?
7. How do you think you could use this database in a high school research project?
We describe a procedure for students to become acquainted with the Gene Ontology classification system which is widely used in genomics and systems biology research to characterize gene function. Grouping genes with GO Terms and the DAVID database is based on a protocol that we use with our summer interns to profile gene expression data related to behavioral neuroscience studies (Crusio et al., 2017). Grouping genes in this way identifies genes that function in like processes and also provides information about the overall distribution of a set of genes associated with a particular tissue or process. This tutorial will familiarize early stage students with a biological database and teach them how to mine it and extract useful information from a sample list of genes.
Survey response data from twelve students indicate that they believe they have benefited intellectually from this work and that they enjoy this type of learning experience. Based on coding of the open-ended survey responses, the underlying reasons are that they learned something beneficial and found it interesting. The majority of the students (92%) also state that, as a result of this experience, they are better prepared for future database use. Entry and exit ticket questions, designed to assess students' prior and post-lesson knowledge and to stimulate ideas on how these databases and tools may be used in a research project, are included as a formative assessment strategy.
Figshare: Workbook 1. Coding of student responses to question 3a. https://doi.org/10.6084/m9.figshare.8166611.v1 (Delprato et al., 2019a)
This project contains the following underlying data:
Coding of student responses to question 3a.xlsx (Coding of responses to open ended student survey question 3a)
Figshare: Workbook 2. Coding of student responses to question 4a. https://doi.org/10.6084/m9.figshare.8166650.v1 (Delprato et al., 2019b)
This project contains the following underlying data:
Coding of student responses to question 4a.xlsx (Coding of responses to open ended student survey question 4a)
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Extended data is available from figshare
Figshare: Extended data 1. Video 1: GeneSetProfiling Analysis Instructional video for using DAVID to obtain Gene Ontology classifiers for a sample geneset which is provided by the DAVID site https://doi.org/10.6084/m9.figshare.8166650.v1 (Delprato et al., 2019c)
Figshare: Extended data 2. Video 2. UploadGeneSet Instructional video for generating a random geneset and submitting this geneset to DAVID for Gene Ontology classification, https://doi.org/10.6084/m9.figshare.7649231.v1 (Delprato et al., 2019d)
Figshare: Extended data 3. Video 3. GeneEnrichment Instructional video for performing a gene enrichment exercise at the Gene Ontology site (http://geneontology.org/) https://doi.org/10.6084/m9.figshare.9172121 (Delprato et al., 2019e)
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biocuration, gene ontology.
Is the rationale for developing the new method (or application) clearly explained?
Partly
Is the description of the method technically sound?
No
Are sufficient details provided to allow replication of the method development and its use by others?
No
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Partly
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biocuration, gene ontology.
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Yes
Are sufficient details provided to allow replication of the method development and its use by others?
Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Biocuration, Gene Ontology, molecular genetics, functional gene analysis.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular microbiology; STEM outreach.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Neuroscience and pedagogy
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
No
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
No
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Molecular microbiology; STEM outreach
Is the rationale for developing the new method (or application) clearly explained?
Yes
Is the description of the method technically sound?
Partly
Are sufficient details provided to allow replication of the method development and its use by others?
Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Neuroscience and pedagogy
Version history:
Version 4 (revision): 06 Aug 19
Version 3 (revision): 24 May 19
Version 2 (revision): 15 Apr 19
Version 1: 01 Mar 19
Response to Reviewers
We would like to thank the reviewers for reviewing our article and for providing insightful and constructive feedback. We have taken your recommendations into account and have revised our article accordingly. We believe that the revised version is much stronger and of more value to the academic community than the originally submitted article.
Reviewer 1
1) The use of Entry Ticket and Exit Ticket questions is an excellent pedagogical idea. The questions, however, seem a bit simplistic.
Response: We agree with the reviewer’s comment that the Entry and Exit Tickets were simplistic. To address this we have added additional questions that we think are more challenging.
2) A bigger problem with the article is that there are no data showing how using the DAVID tool in teaching has produced changes in measurable ways. Do students do better on tests of content knowledge? Are students more sophisticated in their thinking as a function of using the DAVID tool? Are they better able to navigate other genomic tools? Do they simply enjoy the lessons more than with traditional instruction? Many of these questions can be answered with a simple pre- and posttest quiz or even comparing current scores on tests/evaluations to historical scores. This addition would make this article of much more value to educators who could then see if incorporating this tool into their pedagogy is worthwhile.
Response: To address this point, we have conducted a student survey designed to determine the viability of the method and have provided the results. The survey questions and responses are provided in the newly added Tables 1, 2, and 3. To summarize the results, students report that they benefited intellectually from this tutorial, found it enjoyable, and feel better prepared for future database work.
Reviewer 2
1) This article presents an interesting way to introduce gene ontologies and related vocabulary to high school students to enhance biological education. While the proposal describes the method in detail, there is no data presented to show that the method actually works. While this is an interesting protocol, it is not a fully fleshed out methods article. The authors will need to present data to show the viability of the method, or resubmit potentially as a protocol.
Response: This point has been addressed in our response to Reviewer 1.
A few more specific comments about the article are listed below:
1) The abstract is worded as more of a proposal rather than a paper.
Response: The abstract has been revised to make it sound less like a proposal and we have incorporated the information obtained from the survey responses.
2) The protocol seems very much like following a set of directions (not very thought-provoking or inquiry-based), without relating it to the neuroscience. I think the paper would be much stronger if the authors were to relate it back to the behavioral neuroscience projects that are mentioned in the abstract.
Response: The method presented in this article is intended to introduce students to an unfamiliar and sophisticated topic. This instruction may serve as a stepping stone for inquiry-based projects.
We expanded the set of Exit Ticket questions to stimulate ideas on how these databases and tools may be used in a research project.
A paragraph relating the tutorial back to behavioral neuroscience has been added.
We also provide a suggestion for expanding the scope of the tutorial into an inquiry-based project.