An Introduction to the Gene Ontology

  • What does the Gene Ontology Consortium do?
  • Terms in the Gene Ontology
  • Obsolete terms
  • The Ontologies
  • Cellular component
  • Biological process
  • Molecular function
  • Ontology structure
  • Topology
  • Term-Term Relationships
  • Relationship Transitivity
  • What GO is NOT
  • Annotation and tools
  • Downloads
  • Beyond GO
  • Cross-products
  • Mappings to other classification systems
  • Contributing to GO

What does the Gene Ontology Consortium do?

Biologists currently spend a great deal of time and effort searching for all of the available information about each small area of research. The search is hampered further by wide variations in the terminology in common usage at any given time, which inhibit effective searching by both computers and people. For example, if you were searching for new targets for antibiotics, you might want to find all the gene products that are involved in bacterial protein synthesis, and that have significantly different sequences or structures from those in humans. If one database describes these molecules as being involved in 'translation', whereas another uses the phrase 'protein synthesis', it will be difficult for you - and even harder for a computer - to find functionally equivalent terms.

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes. See the GO Consortium page for a full list of member organizations.

The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, the development and maintenance of the ontologies themselves; second, the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and third, development of tools that facilitate the creation, maintenance and use of ontologies.

The use of GO terms by collaborating databases facilitates uniform queries across them. The controlled vocabularies are structured so that they can be queried at different levels: for example, you can use GO to find all the gene products in the mouse genome that are involved in signal transduction, or you can zoom in on all the receptor tyrosine kinases. This structure also allows annotators to assign properties to genes or gene products at different levels, depending on the depth of knowledge about that entity.

Terms in the Gene Ontology

The building blocks of the Gene Ontology are the terms, so what makes up a GO term?

Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn, and a term name, e.g. cell, fibroblast growth factor receptor binding or signal transduction. Each term is also assigned to one of the three ontologies, molecular function, cellular component or biological process.
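
As a rough illustration of the identifier format, the sketch below checks a string against the GO:nnnnnnn pattern. It assumes exactly seven digits after the prefix, as in the identifiers quoted later in this document (e.g. GO:0005694); the helper name is_go_id is hypothetical and not part of any GO tool.

```python
import re

# Assumption: "GO:" followed by exactly seven ASCII digits, as in GO:0005694.
GO_ID_PATTERN = re.compile(r"GO:[0-9]{7}")


def is_go_id(text: str) -> bool:
    """Return True if text is a well-formed GO identifier (hypothetical helper)."""
    return GO_ID_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("GO:0005694", "GO:5694", "go:0005694"):
        print(candidate, is_go_id(candidate))
```

A check like this only validates the shape of an identifier; it says nothing about whether the term exists in the ontologies.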

The majority of terms have a textual definition, with references stating the source of the definition. If any clarification of the definition or remarks about term usage are required, these are held in a separate comments field.

Many GO terms have synonyms; GO uses 'synonym' in a loose sense, as the names within the synonyms field may not mean exactly the same as the term they are attached to. Instead, a GO synonym may be broader or narrower than the term string; it may be a related phrase; it may be alternative wording, spelling or use a different system of nomenclature; or it may be a true synonym. This flexibility allows GO synonyms to serve as valuable search aids, as well as being useful for applications such as text mining and semantic matching. The relationship of the synonym to the term is recorded within the GO file.
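
One way to picture how loosely scoped synonyms can act as search aids is the sketch below. The class names, the scope labels and the sample record are all assumptions made for illustration: the identifier is a placeholder, and the scope given to 'protein synthesis' is not taken from the GO file.

```python
from dataclasses import dataclass, field
from enum import Enum


class SynonymScope(Enum):
    """Hypothetical labels for the synonym kinds described above."""
    EXACT = "exact"
    BROAD = "broad"
    NARROW = "narrow"
    RELATED = "related"


@dataclass
class Term:
    go_id: str
    name: str
    # Maps synonym text to its scope.
    synonyms: dict = field(default_factory=dict)


def find_terms(query, terms, scopes=None):
    """Return terms whose name or synonym matches the query, ignoring case.

    If scopes is given, only synonyms with one of those scopes are considered.
    """
    query = query.lower()
    hits = []
    for term in terms:
        if term.name.lower() == query:
            hits.append(term)
            continue
        for text, scope in term.synonyms.items():
            if text.lower() == query and (scopes is None or scope in scopes):
                hits.append(term)
                break
    return hits


# Illustrative data only: placeholder ID and an assumed synonym scope.
TERMS = [
    Term("GO:9999901", "translation", {"protein synthesis": SynonymScope.RELATED}),
]

if __name__ == "__main__":
    print([t.name for t in find_terms("Protein Synthesis", TERMS)])
```

Restricting the search to particular scopes, for example only exact synonyms, lets an application trade recall for precision.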

The scope of the Gene Ontology overlaps with a number of other databases, and in cases where a GO term is identical in meaning to an object in another database, a database cross reference is added to the term. These cross references can also be downloaded from the mappings to GO page.

Obsolete terms

Occasionally, a term is found that is outside the scope of GO, is misleadingly named or defined, or describes a concept that would be better represented in another way. Rather than delete the term, it is deprecated or made obsolete. The term and ID still exist in the GO database, but the term is marked as obsolete, and a comment is often added, giving a reason for the obsoletion. A replacement term is usually also suggested.
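
A consumer of the ontology might handle obsolete terms along the following lines. This is a minimal sketch: the record layout, field names and identifiers are placeholders invented for the example, not the layout of the GO database.

```python
# Hypothetical records; IDs, names, comment and replacement are placeholders.
TERMS = {
    "GO:9999902": {
        "name": "old term",
        "is_obsolete": True,
        "comment": "Outside the scope of GO.",
        "replaced_by": "GO:9999903",
    },
    "GO:9999903": {"name": "new term", "is_obsolete": False},
}


def resolve(go_id, terms=TERMS):
    """Follow suggested replacements until a non-obsolete term is reached.

    Returns None when an obsolete term has no suggested replacement.
    """
    seen = set()
    while terms[go_id].get("is_obsolete"):
        if go_id in seen:
            raise ValueError("replacement chain loops back on itself")
        seen.add(go_id)
        replacement = terms[go_id].get("replaced_by")
        if replacement is None:
            return None
        go_id = replacement
    return go_id


if __name__ == "__main__":
    print(resolve("GO:9999902"))
```

Because the obsolete term and its ID are kept rather than deleted, old annotations can still be looked up and redirected in this way.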

The Ontologies

The three organizing principles of GO are cellular component, biological process and molecular function. A gene product might be associated with or located in one or more cellular components; it is active in one or more biological processes, during which it performs one or more molecular functions. For example, the gene product cytochrome c can be described by the molecular function term oxidoreductase activity, the biological process terms oxidative phosphorylation and induction of cell death, and the cellular component terms mitochondrial matrix and mitochondrial inner membrane.

Cellular component

A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer). See the documentation on the cellular component ontology for more details.

Biological process

A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolic process or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.

A biological process is not equivalent to a pathway; at present, GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

Further information can be found in the process ontology documentation.

Molecular function

Molecular function describes activities, such as catalytic or binding activities, that occur at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.

It is easy to confuse a gene product name with its molecular function, and for that reason many GO molecular functions are appended with the word "activity". The documentation on gene products explains this confusion in more depth. The documentation on the function ontology explains more about GO functions and the rules governing them.

Ontology structure

Topology

The ontologies are structured as directed acyclic graphs, which are similar to hierarchies but differ in that a more specialized term (child) can be related to more than one less specialized term (parent). For example, the biological process term hexose biosynthetic process has two parents, hexose metabolic process and monosaccharide biosynthetic process. This is because biosynthetic process is a type of metabolic process and a hexose is a type of monosaccharide. When any gene involved in hexose biosynthetic process is annotated to this term, it is automatically annotated to both hexose metabolic process and monosaccharide biosynthetic process.
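
The multiple-parent structure can be sketched as a small graph traversal. The two parents of hexose biosynthetic process come from the example above; the shared grandparent, monosaccharide metabolic process, is an assumption added here to show two paths converging, and the function name is hypothetical.

```python
from collections import deque

# Child term -> list of parent terms.
PARENTS = {
    "hexose biosynthetic process": [
        "hexose metabolic process",
        "monosaccharide biosynthetic process",
    ],
    # Assumed for illustration: both parents share this grandparent.
    "hexose metabolic process": ["monosaccharide metabolic process"],
    "monosaccharide biosynthetic process": ["monosaccharide metabolic process"],
    "monosaccharide metabolic process": [],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached through another parent
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("hexose biosynthetic process")):
        print(name)
```

The seen set is what distinguishes a directed acyclic graph from a strict hierarchy here: a term reached along two different paths is still reported only once.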

Term-Term Relationships

GO terms can be linked by five types of relationships: is_a, part_of, regulates, positively_regulates and negatively_regulates.

is_a

The is_a relationship is a simple class-subclass relationship, where A is_a B means that A is a subclass of B; for example, nuclear chromosome is_a chromosome.

GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome

part_of

The part_of relationship is slightly more complex; C part_of D means that whenever C is present, it is always a part of D, but C does not always have to be present. An example would be periplasmic flagellum part_of periplasmic space:

GO:0044464 : cell part
[i] GO:0042995 : cell projection
---[i] GO:0019861 : flagellum
------[i] GO:0009288 : flagellin-based flagellum
---------[i] GO:0055040 : periplasmic flagellum
[i] GO:0042597 : periplasmic space
---[p] GO:0055040 : periplasmic flagellum

When a periplasmic flagellum is present, it is always part_of a periplasmic space. However, not every periplasmic space necessarily has a periplasmic flagellum.

regulates, positively_regulates and negatively_regulates

The regulates, positively_regulates and negatively_regulates relationships describe interactions between biological processes and other biological processes, molecular functions or biological qualities. When a biological process E regulates a function or a process F, it modulates the occurrence of F. If F is a biological quality, then E modulates the value of F. An example of the regulation of a biological process would be the term regulation of transcription. When regulation of transcription occurs, it always alters the rate, extent or frequency at which a gene is transcribed.

Relationship Transitivity

is_a and part_of

The is_a and part_of relationships are transitive, which means that the relationships are propagated from child terms to parent terms. An example of is_a transitivity is shown in the nuclear chromosome example previously used:

GO:0043232 : intracellular non-membrane-bound organelle
[i] GO:0005694 : chromosome
---[i] GO:0000228 : nuclear chromosome

All nuclear chromosomes must be intracellular non-membrane-bound organelles.

An example of part_of transitivity is shown below:

GO:0048869 : cellular developmental process
[i] GO:0030154 : cell differentiation
---[p] GO:0048468 : cell development
------[p] GO:0000904 : cellular morphogenesis during differentiation

Every occurrence of cellular morphogenesis during differentiation must be a part of an occurrence of cell differentiation.
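
The two transitivity examples can be reproduced with a short sketch. It uses the identifiers and relationships shown in the trees above, simplifies each term to a single parent, and only chains relationships of the same type; the function name is hypothetical.

```python
# Child ID -> (relationship, parent ID), taken from the two trees above.
# Simplification: each term is given only one parent.
EDGES = {
    "GO:0000228": ("is_a", "GO:0005694"),     # nuclear chromosome
    "GO:0005694": ("is_a", "GO:0043232"),     # chromosome
    "GO:0000904": ("part_of", "GO:0048468"),  # cellular morphogenesis during differentiation
    "GO:0048468": ("part_of", "GO:0030154"),  # cell development
    "GO:0030154": ("is_a", "GO:0048869"),     # cell differentiation
}


def transitive_targets(term, relation, edges=EDGES):
    """Follow an unbroken chain of one relationship type upward from term."""
    targets = []
    current = term
    while current in edges and edges[current][0] == relation:
        current = edges[current][1]
        targets.append(current)
    return targets


if __name__ == "__main__":
    print(transitive_targets("GO:0000228", "is_a"))
    print(transitive_targets("GO:0000904", "part_of"))
```

The chain stops as soon as the relationship type changes, so this sketch makes no claim about how is_a and part_of combine with each other.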

regulates, positively_regulates and negatively_regulates

The regulates relationships are transitive over both the part_of and is_a relationships.

GO:0010467 : gene expression
[r] GO:0010468 : regulation of gene expression
---[i] GO:0045449 : regulation of transcription
[p] GO:0006350 : transcription
---[r] GO:0045449 : regulation of transcription

part_of transitivity: If process Y exists in the GO biological process ontology and it is a part_of child of process X then any process that regulates process Y also regulates process X.

In the example above, regulation of transcription regulates transcription which is part_of gene expression. Therefore, regulation of transcription also regulates gene expression.

is_a transitivity: If process B exists in the GO biological process ontology and it is an is_a child of process A then any process that regulates process B also regulates process A.

In the example above, regulation of transcription is_a form of regulation of gene expression, which regulates gene expression. Therefore, regulation of transcription also regulates gene expression.
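
The propagation of regulates over part_of and is_a links can be sketched as follows, using only the transcription example above. The data layout and function name are assumptions made for this illustration.

```python
# Regulating process -> the process it directly regulates.
REGULATES = {
    "GO:0045449": "GO:0006350",  # regulation of transcription regulates transcription
}

# Child process -> list of (relationship, parent process).
PARENTS = {
    "GO:0006350": [("part_of", "GO:0010467")],  # transcription part_of gene expression
}


def regulated_processes(term, regulates=REGULATES, parents=PARENTS):
    """Return the direct target of term plus every is_a or part_of ancestor of it."""
    target = regulates.get(term)
    if target is None:
        return set()
    found = set()
    stack = [target]
    while stack:
        current = stack.pop()
        if current in found:
            continue
        found.add(current)
        for relation, parent in parents.get(current, []):
            if relation in ("is_a", "part_of"):
                stack.append(parent)
    return found


if __name__ == "__main__":
    print(sorted(regulated_processes("GO:0045449")))
```

The result contains both transcription and gene expression, matching the conclusion reached in the text.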

Transitivity of regulates

The regulates relationship is transitive over both the is_a and part_of relationships.

is_a transitivity: If process B exists in the GO biological process ontology and it is an is_a child of process A then any process that regulates process B also regulates process A. For example:

GO:0016049 : cell growth
[i] GO:0042815 : bipolar cell growth
---[r] GO:0051516 : regulation of bipolar cell growth

Due to is_a transitivity, we can say that any process that regulates bipolar cell growth also regulates cell growth.

part_of transitivity: If process Y exists in the GO biological process ontology and it is a part_of child of process X then any process that regulates process Y also regulates process X.

GO:0001754 : eye photoreceptor cell differentiation
[p] GO:0042462 : eye photoreceptor cell development
---[r] GO:0042478 : regulation of eye photoreceptor cell development

Due to part_of transitivity, we can say that any process that regulates eye photoreceptor cell development also regulates eye photoreceptor cell differentiation.

Every GO term must obey the true path rule: if the child term describes the gene product, then all its parent terms must also apply to that gene product.
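
In practice the true path rule means that an annotation to a child term implies annotations to all of its parents. The sketch below propagates annotations up the nuclear chromosome example; the gene product name geneX and the function names are hypothetical.

```python
# Child ID -> list of parent IDs, from the nuclear chromosome example.
PARENTS = {
    "GO:0000228": ["GO:0005694"],  # nuclear chromosome is_a chromosome
    "GO:0005694": ["GO:0043232"],  # chromosome is_a intracellular non-membrane-bound organelle
}


def with_ancestors(term, parents=PARENTS):
    """Return term together with every term above it in the graph."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def propagate(direct_annotations, parents=PARENTS):
    """Expand each gene product's direct annotations to include all parent terms."""
    return {
        gene: set().union(*(with_ancestors(term, parents) for term in terms))
        for gene, terms in direct_annotations.items()
    }


if __name__ == "__main__":
    # geneX is a made-up gene product annotated directly to nuclear chromosome.
    print(propagate({"geneX": {"GO:0000228"}}))
```

If any of the implied parent terms would be wrong for the gene product, the rule is violated and the placement of the child term in the ontology needs to be revisited.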

What GO is NOT

It is important to clearly state the scope of GO, and what it does and does not cover. The ontologies section explains the domains covered by GO; the following areas are outside the scope of GO, and terms in these domains would not appear in the ontologies.

  • Gene products: e.g. cytochrome c is not in the ontologies, but attributes of cytochrome c, such as oxidoreductase activity, are.
  • Processes, functions or components that are unique to mutants or diseases: e.g. oncogenesis is not a valid GO term because causing cancer is not the normal function of any gene.
  • Attributes of sequence such as intron/exon parameters: these are not attributes of gene products and will be described in a separate sequence ontology (see the OBO website for more information).
  • Protein domains or structural features.
  • Protein-protein interactions.
  • Environment, evolution and expression.
  • Anatomical or histological features above the level of cellular components, including cell types.

GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.

GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.

GO is not a way to unify biological databases (i.e. GO is not a 'federated solution'). Sharing vocabulary is a step towards unification, but is not, in itself, sufficient. Reasons for this include the following:

  • Knowledge changes and updates lag behind.
  • Individual curators evaluate data differently. While we can agree to use the word 'kinase', we must also agree to support this by stating how and why we use 'kinase', and consistently apply it. Only in this way can we hope to compare gene products and determine whether they are related.
  • GO does not attempt to describe every aspect of biology; its scope is limited to the domains described above.

Annotation and tools

How do the terms in GO become associated with their appropriate gene products? Collaborating databases annotate their genes or gene products with GO terms, providing references and indicating what kind of evidence is available to support the annotations. More information can be found in the GO Annotation Guide.

If you browse any of the contributing databases, you'll find that each gene or gene product has a list of associated GO terms. Each database also publishes downloadable files containing these associations; these can be downloaded from the GO annotations page. You can browse the ontologies using a range of web-based browsers. A full list of these, and other tools for analyzing gene function using GO, is available in the GO Tools section.

In addition, the GO consortium has prepared GO slims, 'slimmed down' versions of the ontologies that allow you to annotate genomes or sets of gene products to gain a high-level view of gene functions. Using GO slims you can, for example, work out what proportion of a genome is involved in signal transduction, biosynthesis or reproduction. See the GO Slim Guide for more information.
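
The kind of high-level summary a GO slim gives can be sketched as below. This is not the GO consortium's slim tooling: the slim set, the gene names and the annotations are made up for the example, and only the term relationships are taken from the trees shown earlier in this document.

```python
# Child ID -> list of parent IDs (is_a and part_of links from earlier examples).
PARENTS = {
    "GO:0000228": ["GO:0005694"],                # nuclear chromosome -> chromosome
    "GO:0005694": ["GO:0043232"],                # chromosome -> organelle term
    "GO:0055040": ["GO:0009288", "GO:0042597"],  # periplasmic flagellum -> two parents
}

# Hypothetical slim: chromosome and periplasmic space.
SLIM = {"GO:0005694", "GO:0042597"}

# Hypothetical gene products and their direct annotations.
ANNOTATIONS = {
    "gene_a": {"GO:0000228"},
    "gene_b": {"GO:0055040"},
    "gene_c": {"GO:0005694"},
    "gene_d": {"GO:0000228"},
}


def with_ancestors(term, parents):
    """Return term together with every term above it in the graph."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def slim_proportions(annotations, slim, parents):
    """Fraction of gene products that map to each slim term."""
    counts = {term: 0 for term in slim}
    for terms in annotations.values():
        reached = set()
        for term in terms:
            reached |= with_ancestors(term, parents) & slim
        for slim_term in reached:
            counts[slim_term] += 1
    total = len(annotations)
    return {term: counts[term] / total for term in slim} if total else {}


if __name__ == "__main__":
    print(slim_proportions(ANNOTATIONS, SLIM, PARENTS))
```

Each gene product is counted at most once per slim term, however many of its detailed annotations lead there.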

Downloads

All data from the GO project is freely available. You can download the ontology data in a number of different formats, including XML and MySQL, from the GO Downloads page. For more information on the syntax of these formats, see the GO File Format Guide.

If you need lists of the genes or gene products that have been associated with a particular GO term, the Current Annotations table tracks the number of annotations and provides links to the gene association files for each of the collaborating databases.

Beyond GO

GO allows us to annotate genes and their products with a limited set of attributes. For example, GO does not allow us to describe genes in terms of which cells or tissues they're expressed in, which developmental stages they're expressed at, or their involvement in disease. It is not necessary for GO to do these things because other ontologies are being developed for these purposes. The GO consortium supports the development of other ontologies and makes its tools for editing and curating ontologies freely available. A list of freely available ontologies that are relevant to genomics and proteomics and are structured similarly to GO can be found at the Open Biomedical Ontologies website. A larger list, which includes the ontologies listed at OBO and also other controlled vocabularies that do not fulfill the OBO criteria, is available at the Ontology Working Group section of the Microarray Gene Expression Data (MGED) Network site.

Cross-products

The existence of several ontologies will also allow us to create 'cross-products' that maximize the utility of each ontology while avoiding redundancy. For example, by combining the developmental terms in the GO process ontology with a second ontology that describes Drosophila anatomical structures, we could create an ontology of fly development. We could repeat this process for other organisms without having to clutter up GO with large numbers of species-specific terms. Similarly, we could create an ontology of biosynthetic pathways by combining the biosynthesis terms in the GO process ontology with a chemical ontology.

Mappings to other classification systems

GO is not the only attempt to build structured controlled vocabularies for genome annotation, nor is it the only such series of catalogs in current use. The GO project provides mappings between GO and these other systems, although we caution that these mappings are neither complete nor exact and should only be used as a guide.

Contributing to GO

The GO project is constantly evolving, and we welcome feedback from all users. If you need a new term or definition, or would like to suggest that we reorganize a section of one of the ontologies, please do so through the GO curator requests tracker. Any errors or omissions in annotations should be reported to the GO annotation mailing list.

Any other questions or suggestions should be addressed to the GO helpdesk.

Last modified Tuesday, 25-Nov-2008 07:10:15 PST
Copyright © 1999-2008 the Gene Ontology