the Gene Ontology

The GO Flat File Format

This format is now deprecated and the use of the OBO format is recommended.

  • Introduction
  • File front matter
  • GO format ontology files
  • Relationships between terms
  • Line syntax
  • GO format definition files

Introduction

The structure of the old GO flat files was designed to make editing in a plain-text editor easy. The indentation scheme let curators see the structure of the directed acyclic graph (DAG) at a glance, and a fair amount of redundant information let a curator see the term they were working on in context without constantly reviewing the entire file. Each ontology is held in a separate file, and the definitions are kept in a further file:

Biological Process (process.ontology)
Molecular Function (function.ontology)
Cellular Component (component.ontology)
Definitions (GO.defs)

File front matter

The first lines of each file carry information about the version, the date of the last update, (optionally) the source of the file, the name of the database, the domain of the file and the editors of the file. These are comment lines, which start with !, and they are present in both the ontology files and the definitions file. The !type lines declare the symbols used for relationships in the file.

Here's an example of the front matter of a GO flat file:

!autogenerated-by:  DAG-Edit version 1.315
!saved-by:          midori
!date:              Fri Jan 03 17:14:37 GMT 2003
!version:           $ Revision: 1.17 $
!type: % ISA Is a
!type: < PARTOF Part of
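
To make the header layout concrete, here is a minimal Python sketch that reads these front-matter lines. It assumes every keyed header line has the form !key: value and that each !type: line holds a symbol, a relationship name and a description, as in the example above. The function name parse_front_matter is hypothetical and not part of any GO tool.

```python
import re


def parse_front_matter(lines):
    """Collect '!key: value' header lines and '!type:' relationship declarations.

    Assumption: '!type:' values look like 'symbol NAME Description',
    as in the example front matter.
    """
    header = {}
    rel_types = {}  # symbol -> (relationship name, description)
    for line in lines:
        if not line.startswith("!"):
            break  # the front matter ends at the first non-comment line
        match = re.match(r"!\s*([\w-]+):\s*(.*)", line)
        if not match:
            continue  # free-text comment without a 'key:' prefix
        key, value = match.group(1), match.group(2).strip()
        if key == "type":
            symbol, name, description = value.split(None, 2)
            rel_types[symbol] = (name, description)
        else:
            header[key] = value
    return header, rel_types
```

Run on the example above, it returns the four header values plus a table mapping % to ISA and < to PARTOF, which is what a reader needs to interpret the indented term lines that follow.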

GO format ontology files

Following the comments in the ontology files is a line beginning with a $, reflecting the domain and aspect of the ontology:

$Gene_Ontology ; GO:0003673
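
A minimal Python sketch of reading this root line, assuming the name and ID are separated by a single ; as in the example (parse_root_line is a hypothetical helper):

```python
def parse_root_line(line):
    """Split a '$name ; GO:id' root line into (name, id)."""
    if not line.startswith("$"):
        raise ValueError("root line must start with '$'")
    # partition splits at the first ';' only
    name, _, go_id = line[1:].partition(";")
    return name.strip(), go_id.strip()
```

For the line above it returns ('Gene_Ontology', 'GO:0003673').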

Relationships between terms

In the GO flat files, the symbol % is used to represent an is_a relationship and the symbol < a part_of relationship. For more information on the relationships used in GO, please see the documentation on relationships.

Parent-child relationships between terms are represented by indentation:

parent_term
 child_term
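
Recovering parent-child pairs from indentation amounts to keeping a stack of the terms on the current path from the root. The Python sketch below assumes depth is the number of leading spaces and that a child is always indented more deeply than its parent; indentation_pairs is a hypothetical name. Because a term can appear under several parents in a DAG, it returns a list of (child, parent) pairs rather than one parent per term.

```python
def indentation_pairs(lines):
    """Return (child, parent) pairs implied by indentation alone."""
    pairs = []
    stack = []  # (depth, term) along the current path from the top level
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip(" "))
        term = line.strip()
        # Pop siblings and deeper terms until the top of the stack is shallower.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            pairs.append((term, stack[-1][1]))
        stack.append((depth, term))
    return pairs
```

For the two-line example above it returns [('child_term', 'parent_term')].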

is-a relationships

%term0
 %term1

means that term1 is_a (is a subclass of) term0.

%term0
 %term1 % term2

means that term1 is_a term0 and is_a term2.

part-of relationships

%term0
 <term1

means that term1 is part_of term0.

%term0
 <term1 < term2 < term3

means that term1 is part_of term0, term2 and term3.
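
The rules above can be sketched in Python as a function that turns one line into (child, relationship, parent) triples. It assumes the caller already knows which term the line is indented under, uses bare term names as in these examples (real lines also carry ; ID after each name, which this sketch does not strip), and assumes names never contain % or <. The name edges_for_line is hypothetical.

```python
import re

SYMBOLS = {"%": "is_a", "<": "part_of"}


def edges_for_line(line, indent_parent):
    """Return (child, relationship, parent) triples for one line.

    The leading symbol relates the term to indent_parent; each further
    '% name' or '< name' on the same line adds another parent.
    """
    body = line.strip()
    lead, rest = body[0], body[1:]
    # The capturing group keeps the symbols: [child, sym, parent, sym, parent, ...]
    parts = re.split(r"\s*([%<])\s*", rest)
    child = parts[0].strip()
    edges = [(child, SYMBOLS[lead], indent_parent)]
    for symbol, parent in zip(parts[1::2], parts[2::2]):
        edges.append((child, SYMBOLS[symbol], parent.strip()))
    return edges
```

Calling edges_for_line(" <term1 < term2 < term3", "term0") gives term1 part_of term0, term2 and term3; a mixed line such as " %term1 < term2" gives one is_a edge and one part_of edge.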

Line syntax

Each line of an ontology file contains, at minimum, a relationship symbol, a GO term name and ID, and a level of indentation.

Secondary IDs are shown after the primary ID:

%term name ; termID, secondaryID, secondaryID

If a term has synonyms, they are written after the term information:

%term name ; termID ; synonym:[synonym1] ; synonym:[synonym2]

The syntax for database cross-references is

%term name ; termID ; database_abbreviation:identifier ; database_abbreviation:identifier

The syntax for relationships to other terms is

%term name ; termID [R] parentTerm1 ; parentTermID1 [R] parentTerm2 ; parentTermID2

where [R] represents the relationship symbol, % or <.
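
Going the other way, the pieces described so far can be assembled into a line. This Python sketch is a hypothetical helper (format_term_line is not from any GO tool); it leaves out indentation and assumes the field order of the full line syntax given in this section, with cross-references before synonyms and extra parents last.

```python
def format_term_line(symbol, name, primary_id, secondary_ids=(),
                     xrefs=(), synonyms=(), parents=()):
    """Build one ontology-file line, without its leading indentation.

    symbol is '%' (is_a) or '<' (part_of) for the indentation parent;
    parents holds extra (symbol, parent_name, parent_id) triples.
    """
    ids = ", ".join([primary_id, *secondary_ids])
    fields = [f"{symbol}{name}", ids]
    fields.extend(xrefs)                              # e.g. 'EC:1.11.1.7'
    fields.extend(f"synonym:{s}" for s in synonyms)   # 'synonym:' prefix per synonym
    line = " ; ".join(fields)
    for rel_symbol, parent_name, parent_id in parents:
        line += f" {rel_symbol} {parent_name} ; {parent_id}"
    return line
```

Passing the parts of the peroxidase activity example in this section (two IDs, two cross-references, two synonyms and one extra is_a parent) reproduces that line exactly.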

The order in which items appear on a line (where [item] indicates optional items, (X|Y) are alternatives, and * means one or more may be present) is:

(<|%)term ; primaryID[, secondaryID]* [; db cross ref]* [; synonym:text]* [ (<|%) term ; termID]*

An example from the molecular function ontology (in the file this appears as a single line):

%peroxidase activity ; GO:0004601, GO:0016685 ; EC:1.11.1.7 ; MetaCyc:PEROXID-RXN ; synonym:myeloperoxidase activity ; synonym:peroxidase reaction % antioxidant activity ; GO:0016209

  • peroxidase activity ; GO:0004601 is the term name and ID
  • GO:0016685 is a secondary ID for GO:0004601
  • EC:1.11.1.7 and MetaCyc:PEROXID-RXN are cross-references to equivalent objects in other databases
  • myeloperoxidase activity and peroxidase reaction are synonyms for peroxidase activity
  • % antioxidant activity ; GO:0016209 indicates the term is an is_a child of antioxidant activity
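
Splitting such a line back into its parts can be sketched in Python as follows. It assumes fields are separated by ;, extra parents are introduced by % or < with spaces on both sides, and names and synonyms contain none of ;, % or <. The function parse_term_line and its dictionary keys are hypothetical.

```python
import re

REL_NAMES = {"%": "is_a", "<": "part_of"}


def parse_term_line(line):
    """Parse one ontology-file line into a dict of its parts."""
    body = line.strip()
    relation = REL_NAMES[body[0]]  # relationship to the indentation parent
    # Separate the term's own fields from any trailing extra parents.
    pieces = re.split(r"\s+([%<])\s+", body[1:])
    fields = [f.strip() for f in pieces[0].split(";")]
    ids = [i.strip() for i in fields[1].split(",")]
    extra = fields[2:]
    term = {
        "relation_to_indent_parent": relation,
        "name": fields[0],
        "id": ids[0],
        "secondary_ids": ids[1:],
        "xrefs": [f for f in extra if not f.startswith("synonym:")],
        "synonyms": [f[len("synonym:"):] for f in extra if f.startswith("synonym:")],
        "other_parents": [],
    }
    for symbol, parent in zip(pieces[1::2], pieces[2::2]):
        parent_name, _, parent_id = parent.partition(";")
        term["other_parents"].append(
            (REL_NAMES[symbol], parent_name.strip(), parent_id.strip()))
    return term
```

On the peroxidase activity line it yields the ID GO:0004601, the secondary ID GO:0016685, the EC and MetaCyc cross-references, both synonyms, and ('is_a', 'antioxidant activity', 'GO:0016209') as an extra parent. The leading % says the term is_a its indentation parent, but which term that is can only be known from the surrounding lines.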

GO format definition files

The definitions for terms in all three ontology files are stored in the GO.defs file. Each definition must contain the following:

  • term: the name of the term to which the definition refers
  • goid: the term's unique identifier
  • definition: the definition of the term
  • definition_reference: one or more references for the definition, each on its own definition_reference line

A definition may also have a comment:

  • comment: free text (see the comment syntax documentation)

An example definition:

term: unfolded protein response
goid: GO:0030968
definition: The series of molecular signals generated as a consequence of the presence of unfolded proteins in the endoplasmic reticulum (ER) or other ER-related stress; results in changes in the regulation of transcription and translation.
definition_reference: GOC:mah
definition_reference: PMID:12042763
comment: Note that this term should not be confused with 'response to unfolded protein ; GO:0006986', which refers to any cellular response to the presence of unfolded proteins anywhere in the cell. Also see 'ER-associated protein catabolism ; GO:0030433'.
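
A minimal Python sketch of reading GO.defs and checking the required fields. It assumes each record begins with a term: line, each remaining line is tag: value split at the first colon, and only definition_reference repeats; parse_definitions and missing_fields are hypothetical names.

```python
REQUIRED = ("term", "goid", "definition")


def parse_definitions(text):
    """Parse GO.defs-style text into a list of definition records."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        tag, _, value = line.partition(":")  # split at the first colon only
        value = value.strip()
        if tag == "term":
            records.append({"definition_reference": []})  # a new record starts
        if not records:
            continue  # ignore tags that appear before the first term
        if tag == "definition_reference":
            records[-1]["definition_reference"].append(value)
        else:
            records[-1][tag] = value
    return records


def missing_fields(record):
    """List the required tags that a record lacks."""
    missing = [tag for tag in REQUIRED if tag not in record]
    if not record.get("definition_reference"):
        missing.append("definition_reference")
    return missing
```

For the unfolded protein response example, missing_fields returns an empty list, and both references are kept in order.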

Last modified Thursday, 10-May-2007 00:40:40 PDT
Copyright © 1999-2009 the Gene Ontology