The aim of the GENCODE Genes project (Harrow et al., 2006) is to produce a set of highly accurate annotations of evidence-based gene features on the human reference genome. This includes the identification of all protein-coding loci with associated alternative splice variants, non-coding loci with transcript evidence in the public databases (NCBI/EMBL/DDBJ), and pseudogenes. A high-quality set of gene structures is necessary for many research studies, such as comparative or evolutionary analyses, and for experimental design and interpretation of results.
The GENCODE Genes tracks display high-quality manual annotations merged with evidence-based automated annotations across the entire human genome. The GENCODE gene set is a full merge of HAVANA manual annotation and Ensembl automatic annotation. Priority is given to the manually curated HAVANA annotation; predicted Ensembl annotations are used where there is no corresponding manual annotation. With each release, the number of annotations that have undergone manual curation increases. This annotation was carried out on the GRCh37 (hg19) genome assembly.
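The priority rule described above can be illustrated with a deliberately simplified sketch. It assumes each annotation carries a locus identifier and keeps automatic annotations only for loci with no manual annotation. The `Annotation` class, the function name, and the example loci are hypothetical and exist only for illustration; the actual HAVANA/Ensembl merge is more involved and is not specified in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    locus: str       # hypothetical locus identifier
    source: str      # "HAVANA" (manual) or "ENSEMBL" (automatic)
    transcript: str  # hypothetical transcript label


def merge_with_manual_priority(manual, automatic):
    """Keep every manual annotation; add automatic annotations only
    for loci that have no manual annotation (simplifying assumption)."""
    curated_loci = {a.locus for a in manual}
    merged = list(manual)
    merged.extend(a for a in automatic if a.locus not in curated_loci)
    return merged


# Made-up example data, not real GENCODE records.
manual = [Annotation("locusA", "HAVANA", "tA1")]
automatic = [
    Annotation("locusA", "ENSEMBL", "tA1_auto"),
    Annotation("locusB", "ENSEMBL", "tB1"),
]
merged = merge_with_manual_priority(manual, automatic)
```

In this toy example the automatic annotation for `locusA` is dropped in favor of the manual one, while `locusB`, which has no manual annotation, is filled in from the automatic set.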
Experimental verification details are given in the description for each track. Transcript Support Levels were determined for version 10 onwards based on evidence provided by GenBank mRNA and EST sequences. Versions 7 and 10 are being used in data analysis by the ENCODE consortium.
NOTE: Because the UCSC Genome Browser uses the NC_001807 mitochondrial genome sequence (chrM) and GENCODE annotates the NC_012920 mitochondrial sequence, the GENCODE mitochondrial annotations are not available in the UCSC Genome Browser. These annotations are available for download in the GENCODE GTF files.
These are multi-view composite tracks that contain differing data sets (views). Instructions for configuring multi-view tracks are here. Only some subtracks are shown by default. The user can select which subtracks are displayed via the display controls on the track details pages. Further details on display conventions and data interpretation are available in the track descriptions.
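For readers who want the mitochondrial annotations that the note above says are only available in the GENCODE GTF files, the following is a minimal sketch of filtering GTF records by sequence name. It relies on the general GTF layout (nine tab-separated columns, attributes written as `key "value";`). The sequence name `chrM`, the function names, and the sample records are assumptions for illustration and are not taken from an actual GENCODE file.

```python
import re

# GTF attribute pairs look like: key "value";
# Unquoted values and repeated keys are not handled by this simple pattern.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_gtf_line(line):
    """Split one GTF record into its nine tab-separated columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": dict(_ATTR_RE.findall(attrs)),
    }


def mitochondrial_genes(lines, seqname="chrM"):
    """Yield gene records on the given sequence, skipping comments.
    The default sequence name is an assumption; check the file you use."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = parse_gtf_line(line)
        if record["seqname"] == seqname and record["feature"] == "gene":
            yield record


# Made-up sample records, not real GENCODE annotations.
sample = [
    "##description: made-up example\n",
    'chrM\tEXAMPLE\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_M1"; gene_name "example_mt";\n',
    'chr1\tEXAMPLE\tgene\t500\t900\t.\t-\t.\tgene_id "GENE_1"; gene_name "example_nuc";\n',
]
hits = list(mitochondrial_genes(sample))
```

In practice `lines` would be an open file handle on a downloaded GTF file rather than an in-memory list.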
GENCODE version 14 corresponds to Ensembl 69 from October 2012 and Vega 49 from September 2012 and is the most current release.
GENCODE version 12 corresponds to Ensembl 67 from May 2012 and Vega 47 from April 2012 and is used in ENCODE analysis.
GENCODE version 10 corresponds to Ensembl 65 from December 2011 and Vega 45 from October 2011 and is used in ENCODE analysis.
GENCODE version 7 corresponds to Ensembl 62 from April 2011 and Vega 42 from March 2011 and is used in ENCODE analysis.
See also: The GENCODE Project Release History.
These GENCODE releases are the result of a collaborative effort among the following laboratories:
Lab/Institution | Contributors |
GENCODE Principal Investigator | Tim Hubbard |
HAVANA manual annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Adam Frankish, Jose Manuel Gonzalez, Mike Kay, Alexandra Bignell, Gloria Despacio-Reyes, Garaub Mukherjee, Gary Sanders, Veronika Boychenko, Jennifer Harrow |
Genome Bioinformatics Lab (CRG), Barcelona, Spain | Thomas Derrien, Tyler Alioto, Andrea Tanzer, Roderic Guigó |
Genome Bioinformatics, University of California Santa Cruz (UCSC), USA | Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler |
Computational Genomics Lab, Washington University, St. Louis (WUSTL), USA | Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent |
Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA | Mike Lin, Manolis Kellis |
Computational Biology and Bioinformatics, Yale University (Yale), USA | Philip Cayting, Suganthi Balasubramanian, Baikang Pei, Cristina Sisu, Mark Gerstein |
Center for Integrative Genomics, University of Lausanne, Switzerland | Cedric Howald, Alexandre Reymond |
Ensembl genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Steve Searle, Bronwen Aken, Amonida Zadissa, Daniel Barrell |
Structural Computational Biology Group, Centro Nacional de Investigaciones Oncologicas (CNIO), Madrid, Spain | José Manuel Rodríguez, Michael Tress, Alfonso Valencia |
Contact: GENCODE at the Sanger Institute
Coffey AJ, Kokocinski F, Calafato MS, Scott CE, Palta P, Drury E, Joyce CJ, Leproust EM, Harrow J, Hunt S et al. The GENCODE exome: sequencing the complete human exome. Eur J Hum Genet. 2011 Jul;19(7):827-31.
Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S et al. Ensembl 2012. Nucleic Acids Res. 2012 Jan;40(Database issue):D84-90.
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.
Djebali S, Lagarde J, Kapranov P, Lacroix V, Borel C, Mudge JM, Howald C, Foissac S, Ucla C, Chrast J et al. Evidence for transcript networks composed of chimeric RNAs in human cells. PLoS One. 2012;7(1):e28213.
Ezkurdia I, Del Pozo A, Frankish A, Rodriguez JM, Harrow J, Ashman K, Valencia A, Tress ML. Comparative Proteomics Reveals a Significant Bias Toward Alternative Protein Isoforms with Conserved Structure and Function. Mol Biol Evol. 2012 Apr 17.
Publications link for the GENCODE project
GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.