Description

The GENCODE Genes track (version 7, May 2011) shows high-quality manual annotations merged with evidence-based automated annotations across the entire human genome, generated by the GENCODE project. The GENCODE gene set is a full merge of HAVANA manual annotation and Ensembl automatic annotation. Priority is given to the manually curated HAVANA annotation. Predicted Ensembl annotations are used only where no corresponding manual annotation exists. The annotation was carried out on genome assembly GRCh37 (hg19).
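The precedence rule can be pictured as a per-locus fallback. The following Python sketch is illustrative only. It assumes the models have already been grouped under a shared locus key. The function and argument names (`merge_annotations`, `manual`, `automatic`) are hypothetical. The real merge is more involved than matching on a key, so this shows only the "manual first, automatic otherwise" precedence.

```python
from typing import Dict, List


def merge_annotations(
    manual: Dict[str, List[str]],
    automatic: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Illustrative only: per locus, prefer manual models and fall back to automatic ones.

    Assumes both inputs map a shared (hypothetical) locus key to transcript model IDs.
    """
    merged: Dict[str, List[str]] = {}
    for locus in sorted(set(manual) | set(automatic)):
        if manual.get(locus):
            # Any manual (HAVANA-style) model at this locus takes priority.
            merged[locus] = list(manual[locus])
        else:
            # No manual model: use the automatic (Ensembl-style) predictions, if any.
            merged[locus] = list(automatic.get(locus, []))
    return merged


# Example: geneA keeps its manual model; geneB has only an automatic one.
print(merge_annotations({"geneA": ["A-manual-1"]},
                        {"geneA": ["A-auto-1"], "geneB": ["B-auto-1"]}))
# → {'geneA': ['A-manual-1'], 'geneB': ['B-auto-1']}
```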

NOTE: The UCSC Genome Browser uses the NC_001807 mitochondrial genome sequence (chrM), whereas GENCODE annotates the NC_012920 mitochondrial sequence. For this reason, the GENCODE mitochondrial annotations are not displayed in the UCSC Genome Browser. They are available for download in the GENCODE GTF files.
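The Python function below is a minimal sketch of pulling the mitochondrial records out of a downloaded GTF file. It keeps the nine-column records whose sequence name matches a given value. It assumes the mitochondrial records are labeled `chrM` in the first column. If the file uses another name (for example `MT`), pass that name instead. The function name `mito_records` is hypothetical.

```python
from typing import Iterable, List


def mito_records(gtf_lines: Iterable[str], seqname: str = "chrM") -> List[List[str]]:
    """Return the tab-split GTF records whose first column equals `seqname`.

    Assumes mitochondrial records are labeled "chrM"; override `seqname` otherwise.
    """
    records: List[List[str]] = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has exactly nine tab-separated columns.
        if len(fields) == 9 and fields[0] == seqname:
            records.append(fields)
    return records
```

For a gzip-compressed download, the lines can be streamed with `gzip.open(path, "rt")` from the Python standard library and passed straight to `mito_records`.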

NOTE: We try to synchronize the release cycles of GENCODE, HAVANA and Ensembl. GENCODE version 7 corresponds to Ensembl 62 (13 April 2011) and Vega 23-03-2011. Also see: GENCODE project.

Display Conventions and Configuration

The annotations are divided into separate tracks based on the type of annotation. The basic set of coding and non-coding transcripts is a subset of the comprehensive set. It was selected to provide a simplified view of the transcripts that suits the needs of most users. The selection algorithm is described in the next section. The available tracks are:

GENCODE basic set selection

The GENCODE basic set is intended to provide a simplified subset of the GENCODE transcript annotations that will be useful to most users. Transcripts are selected for the basic set on a per-locus basis. Within each locus, coding and non-coding transcripts are then selected separately. The goal is to use the higher-quality transcript annotations while still having some annotation present for each locus.
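The per-locus structure of the selection can be sketched as follows. This Python sketch does not reproduce the actual GENCODE criteria. The quality test is a caller-supplied, hypothetical `is_preferred` predicate. When no transcript in a category qualifies, the sketch keeps the first one. This fallback is an assumption, chosen to honor the goal of leaving some annotation at every locus. `Transcript` and `select_basic` are likewise hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Transcript(NamedTuple):
    gene_id: str        # locus identifier
    transcript_id: str
    coding: bool        # True for coding, False for non-coding


def select_basic(
    transcripts: List[Transcript],
    is_preferred: Callable[[Transcript], bool],  # hypothetical quality test
) -> List[Transcript]:
    """Sketch of per-locus selection; not the actual GENCODE basic-set criteria."""
    by_locus: Dict[str, List[Transcript]] = defaultdict(list)
    for t in transcripts:
        by_locus[t.gene_id].append(t)

    selected: List[Transcript] = []
    for gene_id in sorted(by_locus):
        # Within each locus, handle coding and non-coding transcripts separately.
        for coding in (True, False):
            group = [t for t in by_locus[gene_id] if t.coding == coding]
            if not group:
                continue
            kept = [t for t in group if is_preferred(t)]
            # Assumed fallback: keep one transcript so the locus still has annotation.
            selected.extend(kept if kept else group[:1])
    return selected
```

Passing the predicate in as a parameter keeps the per-locus bookkeeping separate from the quality rules, which are not encoded here.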

The selection criteria for a given locus are:

Non-coding transcript categories

Non-coding transcripts are categorized using their BioType and the following criteria:

Filtering

Items in the GENCODE Basic, Comprehensive and Pseudogene tracks can be filtered using the following criteria:

Coloring

The gene annotations are colored based on the annotation type:

Manual and automatic: coding, non-coding, pseudogene, problem
2-way pseudogene: all
PolyA annotations: all

Methods

We aim to annotate all evidence-based gene features at high accuracy on the human reference sequence. This includes identifying all protein-coding loci with associated alternative variants, non-coding loci which have transcript evidence, and pseudogenes. We integrate computational approaches (including comparative methods), manual annotation and targeted experimental verification.

For a detailed description of the methods and references used, see Harrow et al. (2006).

Verification

See Harrow et al. (2006) for information on verification techniques.

Selected transcript models are verified experimentally by RT-PCR amplification followed by sequencing. Those experiments can be found at GEO:

Credits

This GENCODE release is the result of a collaborative effort among the following laboratories (contact: GENCODE at the Sanger Institute):

Lab/Institution: Contributors
GENCODE Principal Investigator: Tim Hubbard
HAVANA manual annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK: Adam Frankish, Jose Manuel Gonzalez, Mike Kay, Alexandra Bignell, Gloria Despacio-Reyes, Gaurab Mukherjee, Gary Sanders, Veronika Boychenko, Jennifer Harrow
Genome Bioinformatics Lab (CRG), Barcelona, Spain: Thomas Derrien, Tyler Alioto, Andrea Tanzer, Roderic Guigó
Genome Bioinformatics, University of California Santa Cruz (UCSC), USA: Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler
Comp. Genomics Lab, Washington University St. Louis (WUSTL), USA: Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent
Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA: Mike Lin, Manolis Kellis
Computational Biology and Bioinformatics, Yale University (Yale), USA: Philip Cayting, Suganthi Balasubramanian, Baikang Pei, Cristina Sisu, Mark Gerstein
Center for Integrative Genomics, University of Lausanne, Switzerland: Cedric Howald, Alexandre Reymond
ENSEMBL genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK: Steve Searle, Bronwen Aken, Amonida Zadissa, Daniel Barrell
Structural Computational Biology Group, Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain: José Manuel Rodríguez, Michael Tress, Alfonso Valencia

References

Flicek P, et al. Ensembl 2011. Nucleic Acids Res. 2011;39(Database issue):D800-D806.

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7(Suppl 1):S4.1-9.

Data Release Policy

GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.