The GENCODE Genes track (version 4, May 2010) shows high-quality manual annotations merged with evidence-based automated annotations across the entire human genome generated by the GENCODE project. The GENCODE gene set presents a full merge between HAVANA and ENSEMBL. Priority is given to the manually curated HAVANA annotation, using predicted ENSEMBL annotations when there are no corresponding manual annotations. The annotation was carried out on genome assembly GRCh37 (hg19).
NOTE: We try to synchronize the release cycles for GENCODE, HAVANA and Ensembl. This GENCODE version 4 corresponds to Ensembl 58 and Vega 38. Also see: GENCODE project.
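The merge priority described above (manual HAVANA annotation first, automated ENSEMBL annotation only where no manual annotation exists) can be illustrated with a minimal Python sketch. This is not the GENCODE pipeline: the function name `merge_gene_sets`, the locus keys, and the dictionary representation are all hypothetical, and the real merge works on transcript structures rather than simple keys.

```python
def merge_gene_sets(havana, ensembl):
    """Merge two annotation sets, preferring manual annotation.

    Both arguments are dicts mapping a hypothetical locus key to an
    annotation record. This is an illustrative simplification, not the
    actual GENCODE merge procedure.
    """
    merged = {}
    for locus in sorted(set(havana) | set(ensembl)):
        if locus in havana:
            # A manual annotation exists, so it takes priority.
            merged[locus] = ("HAVANA", havana[locus])
        else:
            # No manual annotation: fall back to the automated one.
            merged[locus] = ("ENSEMBL", ensembl[locus])
    return merged


# Hypothetical example data.
manual = {"locusA": "manual model A"}
automated = {"locusA": "automated model A", "locusB": "automated model B"}
merged = merge_gene_sets(manual, automated)
```

Here `locusA` would keep its manual model, while `locusB`, which has no manual counterpart, would be filled in from the automated set.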
The gene annotations are colored based on the annotation type and the confidence level. See the table below for the color key, as well as more detail about the transcript and feature types.
Class | Color | Description | Transcript Types (see Vega Transcript Types) |
---|---|---|---|
Validated_coding | Dark Orange | Level 1 Validated: coding regions | protein_coding |
Validated_processed | Light Orange | Level 1 Validated: processed | processed_transcript |
Validated_processed_pseudogene | Dark Pink | Level 1 Validated: processed pseudogenes | processed_pseudogene, processed_transcript, transcribed_processed_pseudogene |
Validated_unprocessed_pseudogene | Medium Pink | Level 1 Validated: unprocessed pseudogenes | transcribed_unprocessed_pseudogene, unprocessed_pseudogene |
Validated_pseudogene | Light Pink | Level 1 Validated: pseudogenes | IG_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Havana_coding | Dark Orange | Level 2 Manual annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding |
Havana_nonsense | Medium Orange | Level 2 Manual annotation: nonsense | nonsense_mediated_decay |
Havana_non_coding | Light Orange | Level 2 Manual annotation: non-coding | ambiguous_orf, antisense, non_coding, processed_transcript, retained_intron |
Havana_polyA | Black | Level 2 Manual annotation: polyA | polyA_signal, polyA_site, pseudo_polyA |
Havana_processed_pseudogene | Dark Pink | Level 2 Manual annotation: processed pseudogene | processed_pseudogene, transcribed_processed_pseudogene |
Havana_unprocessed_pseudogene | Medium Pink | Level 2 Manual annotation: unprocessed pseudogene | transcribed_unprocessed_pseudogene, unprocessed_pseudogene |
Havana_pseudogene | Light Pink | Level 2 Manual annotation: pseudogene | IG_pseudogene, TR_pseudogene, polymorphic_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Havana_TEC | Grey | Level 2 Manual annotation: TEC | TEC, artifact |
Ensembl_coding | Dark Red | Level 3 Automated annotation: coding | IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene, protein_coding |
Ensembl_non_coding | Light Orange | Level 3 Automated annotation: non-coding | antisense, non_coding, processed_transcript, retained_intron |
Ensembl_pseudogene | Dark Pink | Level 3 Automated annotation: pseudogene | IG_pseudogene, miRNA_pseudogene, misc_RNA_pseudogene, pseudogene, retrotransposed, unitary_pseudogene |
Ensembl_processed_pseudogene | Medium Pink | Level 3 Automated annotation: processed pseudogene | processed_pseudogene |
Ensembl_unprocessed_pseudogene | Light Pink | Level 3 Automated annotation: unprocessed pseudogene | unprocessed_pseudogene |
Ensembl_RNA | Light Red | Level 3 Automated annotation: RNA transcripts | Mt_rRNA, Mt_tRNA, Mt_tRNA_pseudogene, miRNA, misc_RNA, rRNA, rRNA_pseudogene, scRNA_pseudogene, snRNA, snRNA_pseudogene, snoRNA, snoRNA_pseudogene, tRNA_pseudogene, tRNAscan |
2way_pseudogene | Dark Purple | Level 3 Automated annotation: pseudogenes | pseudogenes |
This track uses filtering by category to select subsets of transcripts and has additional advanced features. Help with these features can be found here.
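In the class table, the prefix of each class name tracks the confidence level: `Validated_` classes are Level 1, `Havana_` classes are Level 2, and `Ensembl_` and `2way_` classes are Level 3. The following Python sketch shows how a category filter could be applied to exported records on that basis. The function names and the `(transcript_id, class_name)` record layout are assumptions made for illustration and are not part of the browser's interface.

```python
# Prefix-to-level mapping read off the class table.
LEVEL_BY_PREFIX = {
    "Validated": 1,  # Level 1: validated
    "Havana": 2,     # Level 2: manual annotation
    "Ensembl": 3,    # Level 3: automated annotation
    "2way": 3,       # Level 3: automated pseudogene annotation
}


def annotation_level(class_name):
    """Return the annotation level implied by a class name's prefix."""
    prefix = class_name.split("_", 1)[0]
    if prefix not in LEVEL_BY_PREFIX:
        raise ValueError(f"unrecognized class: {class_name!r}")
    return LEVEL_BY_PREFIX[prefix]


def filter_by_max_level(records, max_level):
    """Keep (transcript_id, class_name) records at or below max_level."""
    return [
        (transcript_id, class_name)
        for transcript_id, class_name in records
        if annotation_level(class_name) <= max_level
    ]


# Hypothetical records; the transcript ids are placeholders.
records = [
    ("t1", "Havana_coding"),
    ("t2", "Ensembl_RNA"),
    ("t3", "Validated_coding"),
]
manual_or_better = filter_by_max_level(records, 2)
```

With `max_level=2`, only the validated and manually annotated records are kept, and the automated `Ensembl_RNA` record is dropped.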
We aim to annotate all evidence-based gene features at high accuracy on the human reference sequence. This includes identifying all protein-coding loci with associated alternative variants, non-coding loci which have transcript evidence, and pseudogenes. We integrate computational approaches (including comparative methods), manual annotation and targeted experimental verification.
For a detailed description of the methods and references used, see Harrow et al. (2006).
See Harrow et al. (2006) for information on verification techniques.
Selected transcript models are verified experimentally by RT-PCR amplification followed by sequencing. Those experiments can be found at GEO:
This GENCODE release is the result of a collaborative effort among the following laboratories (contact: GENCODE at the Sanger Institute):
Lab/Institution | Contributors |
---|---|
HAVANA annotation group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Adam Frankish, James Gilbert, Jennifer Harrow, Felix Kokocinski, Stephen Trevanion, Tim Hubbard (GENCODE Principal Investigator) |
Genome Bioinformatics Lab (CRG), Barcelona, Spain | Thomas Derrien, Tyler Alioto, Andrea Tanzer, Roderic Guigó |
Genome Bioinformatics, University of California Santa Cruz (UCSC), USA | Rachel Harte, Mark Diekhans, Robert Baertsch, David Haussler |
Comp. Genomics Lab, Washington University St. Louis (WUSTL), USA | Jeltje van Baren, Charlie Comstock, David Lu, Michael Brent |
Computer Science and Artificial Intelligence Lab, Broad Institute of MIT and Harvard, USA | Mike Lin, Manolis Kellis |
Computational Biology and Bioinformatics, Yale University (Yale), USA | Philip Cayting, Suganthi Balasubramanian, Baikang Pei, Cristina Sisu, Mark Gerstein |
Center for Integrative Genomics, University of Lausanne, Switzerland | Cedric Howald, Alexandre Reymond |
ENSEMBL genebuild group, Wellcome Trust Sanger Institute (WTSI), Hinxton, UK | Steve Searle, Bronwen Aken, Amonida Zadissa, Daniel Barrell |
Structural Computational Biology Group, Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain | José Manuel Rodríguez, Michael Tress, Alfonso Valencia |
Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006;7 Suppl 1:S4.1-9.
Flicek P et al. Ensembl 2011. Nucleic Acids Res. 2011;39(Database issue):D800-D806.
GENCODE data are available for use without restrictions. The full data release policy for ENCODE is available here.