Description

The UCSC Known Genes track shows known protein-coding genes based on protein data from SWISS-PROT, TrEMBL and TrEMBL-NEW and their corresponding mRNAs from GenBank.

Display Conventions and Configuration

This track follows the display conventions for gene prediction tracks. Black coloring indicates features that have corresponding entries in the Protein Data Bank (PDB). Blue indicates features associated with mRNAs from NCBI RefSeq; dark blue is also used for items that have associated proteins in the SWISS-PROT database. The shade of blue used for RefSeq items reflects the level of review the RefSeq record has undergone: predicted (light), provisional (medium), or reviewed (dark).
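The coloring rules above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name, its parameters, and the precedence order (PDB first, then SWISS-PROT, then RefSeq status) are assumptions, since the text does not say which color wins when more than one rule applies.

```python
def track_color(has_pdb_entry, has_swissprot_protein, refseq_status=None):
    """Return a color name for a Known Genes item (hypothetical helper).

    The precedence used here is an assumption; the track description
    does not state how overlapping rules are resolved.
    """
    if has_pdb_entry:
        return "black"
    if has_swissprot_protein:
        return "dark blue"
    # RefSeq review level controls the shade of blue.
    shades = {
        "predicted": "light blue",
        "provisional": "medium blue",
        "reviewed": "dark blue",
    }
    # Items not covered by any stated rule yield None.
    return shades.get(refseq_status)
```

For example, under these assumptions `track_color(False, False, "provisional")` gives the medium shade, while any item with a PDB entry is black regardless of its RefSeq status.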

This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the "genomic codons" option from the "Color track by codons" pull-down menu. See the "Coloring Gene Predictions and Annotations by Codon" page for more information about this feature.

Methods

mRNA sequences were aligned against the $organism genome using blat. When a single mRNA aligned in multiple places, only alignments having at least 98% base identity with the genomic sequence were kept. This set of mRNA alignments was further reduced by keeping only those mRNAs referenced by a protein in SWISS-PROT, TrEMBL or TrEMBL-NEW.
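The two filtering steps in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the pipeline UCSC actually ran: the `Alignment` record, its field names, and the `protein_linked_mrnas` set are hypothetical, and the sketch reads the text literally in applying the 98% threshold only to mRNAs that aligned in more than one place.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_IDENTITY = 0.98  # threshold stated in the text


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one blat alignment of an mRNA."""
    mrna_acc: str     # GenBank mRNA accession
    chrom: str
    start: int
    end: int
    identity: float   # fraction of matching bases, 0.0 to 1.0


def filter_alignments(alignments, protein_linked_mrnas):
    """Keep alignments that pass the identity and protein-reference filters.

    protein_linked_mrnas: set of mRNA accessions referenced by a protein
    in SWISS-PROT, TrEMBL or TrEMBL-NEW (assumed to be precomputed).
    """
    by_mrna = defaultdict(list)
    for aln in alignments:
        by_mrna[aln.mrna_acc].append(aln)

    kept = []
    for acc, alns in by_mrna.items():
        # Drop mRNAs that no protein record refers to.
        if acc not in protein_linked_mrnas:
            continue
        # The identity threshold applies when an mRNA aligned in several places.
        if len(alns) > 1:
            alns = [a for a in alns if a.identity >= MIN_IDENTITY]
        kept.extend(alns)
    return kept
```

The two filters are independent, so checking the protein reference first only avoids unnecessary work; it does not change the result.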

Among multiple mRNAs referenced by a single protein, the best mRNA was selected, based on a quality score derived from its length, the level of the match between its translation and the protein sequence, and its release date. The resulting mRNA and protein pairs were further filtered by removing short invalid entries and consolidating entries with identical CDS regions.
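The selection and consolidation steps might look like the Python sketch below. The text does not give the actual quality score formula, so `quality_key` is a stand-in: it ranks candidates by translation match first, then length, then release date, and it assumes that a later release date is preferred. The `Candidate` record and its fields are likewise hypothetical, and the removal of short invalid entries is not shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Candidate:
    """Hypothetical mRNA/protein pairing with the attributes named in the text."""
    protein_acc: str
    mrna_acc: str
    length: int             # mRNA length in bases
    match_fraction: float   # agreement between translation and protein, 0.0 to 1.0
    release_date: date
    cds: tuple              # hashable description of the CDS region


def quality_key(c):
    # Stand-in for the real quality score, which is not specified here.
    return (c.match_fraction, c.length, c.release_date)


def best_mrna_per_protein(candidates):
    """Pick the highest-ranked mRNA for each protein."""
    best = {}
    for c in candidates:
        current = best.get(c.protein_acc)
        if current is None or quality_key(c) > quality_key(current):
            best[c.protein_acc] = c
    return best


def consolidate_identical_cds(pairs):
    """Keep the first entry seen for each distinct CDS region."""
    seen = {}
    for c in pairs:
        seen.setdefault(c.cds, c)
    return list(seen.values())
```

A typical use would be `consolidate_identical_cds(best_mrna_per_protein(candidates).values())`, which returns one entry per distinct CDS region.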

Finally, RefSeq entries derived from DNA sequences instead of mRNA sequences were added to produce the final data set shown in this track. Disease annotations were obtained from SWISS-PROT.

Credits

The Known Genes track was produced at UCSC based primarily on cross-references between proteins from SWISS-PROT (including TrEMBL and TrEMBL-NEW) and mRNAs from GenBank contributed by scientists worldwide. NCBI RefSeq data were also included in this track.

Data Use Restrictions

The UniProt data have the following terms of use, UniProt copyright(c) 2002 - 2004 UniProt consortium:

For non-commercial use, all databases and documents in the UniProt FTP directory may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy.

For commercial use, all databases and documents in the UniProt FTP directory except the files

may be copied and redistributed freely, without advance permission, provided that this copyright statement is reproduced with each copy. More information for commercial users can be found at the UniProt License & disclaimer page.

From January 1, 2005, all databases and documents in the UniProt FTP directory may be copied and redistributed freely by all entities, without advance permission, provided that this copyright statement is reproduced with each copy.

References

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32:D23-6.

Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006 May 1;22(9):1036-46.

Kent WJ. BLAT - The BLAST-Like Alignment Tool. Genome Res. 2002 Apr;12(4):656-64.