Description

Using TIGR gene indices clustering tools (Pertea et al. 2003), 249,200 ESTs (170,059 cDNA clones) were clustered, generating 58,713 consensuses and singletons. NIA consensuses and singletons were further clustered with Ensembl transcripts, RIKEN transcripts, and RefSeq transcripts and transcript predictions. Alignments of these sequences to the mouse genome (UCSC February 2002 freeze data) using BLAT helped to avoid false clustering of similar sequences at nonmatching genome locations. Erroneous clusters were reassembled based on analysis of the genome alignments. A total of 94,039 putative transcripts (called NAP) were thus generated and then grouped into 39,678 putative genes (called U-clusters) based on their overlap in the genome on the same chromosome strand and on clone-linking information. Using the criteria of an ORF greater than 100 amino acids or of multiple exons (excluding sequences that are potentially located on the wrong strand), 29,810 mouse genes were identified. Finally, 977 genes unique to the NIA database were identified.
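
The grouping and filtering steps described above can be illustrated with a minimal sketch. This is not the NIA pipeline: the `Transcript` record, its field names, and both function names are hypothetical, coordinates are assumed to be 0-based and half-open, and the sketch groups on genomic overlap alone (clone-linking information and the wrong-strand exclusion are omitted). It also assumes that a putative gene passes if any one of its transcripts meets either criterion, which the description does not state explicitly.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    """Hypothetical record for one aligned putative transcript."""
    name: str
    chrom: str
    strand: str      # "+" or "-"
    start: int       # 0-based, inclusive (assumed convention)
    end: int         # exclusive
    exon_count: int
    orf_aa: int      # length of the longest ORF, in amino acids


def group_by_overlap(transcripts):
    """Group transcripts whose genomic spans overlap on the same chromosome strand."""
    by_strand = defaultdict(list)
    for t in transcripts:
        by_strand[(t.chrom, t.strand)].append(t)

    clusters = []
    for key in sorted(by_strand):
        current, current_end = [], None
        # Sweep left to right, extending the open cluster while spans overlap.
        for t in sorted(by_strand[key], key=lambda t: (t.start, t.end)):
            if current and t.start < current_end:
                current.append(t)
                current_end = max(current_end, t.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [t], t.end
        if current:
            clusters.append(current)
    return clusters


def passes_gene_criteria(cluster, min_orf_aa=100):
    """True if any transcript has an ORF longer than min_orf_aa or more than one exon."""
    return any(t.orf_aa > min_orf_aa or t.exon_count > 1 for t in cluster)
```

Sorting by start position within each chromosome strand lets a single pass merge chains of overlapping transcripts, so two transcripts that do not overlap each other directly still end up in one group when a third transcript bridges them.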

References

NIA Gene Index
Sharov et al. 2003 PLoS Biology 1: 410-419