ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering with a transcript assembly procedure, and it explicitly takes alternative splicing events into account. The positions of splice sites (i.e. exon-intron boundaries) in the genomic alignment serve as the critical information throughout the procedure. Sequences that share splice sites in the genomic alignment are grouped together to define an EST cluster. Transcript assembly, based on graph theory, then produces gene models together with their supporting clone evidence; this step is essentially equivalent to sub-clustering according to splice variants.
For more detailed information, see the ECgene website.
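The clustering rule described above (sequences sharing a splice site fall into the same cluster) can be illustrated with a small union-find sketch. This is not ECgene's implementation; the function name `cluster_by_splice_sites`, the `(chrom, strand, position, kind)` tuple layout, and the treatment of unspliced sequences as singleton clusters are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_splice_sites(alignments):
    """Group sequence IDs that share at least one splice site.

    alignments: dict mapping a sequence ID to the set of splice sites found
    in its genomic alignment. Each site is a hashable tuple such as
    (chrom, strand, position, kind); this layout is hypothetical.
    """
    parent = {seq_id: seq_id for seq_id in alignments}

    def find(x):
        # Walk up to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first sequence seen at each splice site and merge every
    # later sequence using the same site into that sequence's cluster.
    first_seen = {}
    for seq_id, sites in alignments.items():
        for site in sites:
            if site in first_seen:
                union(first_seen[site], seq_id)
            else:
                first_seen[site] = seq_id

    clusters = defaultdict(set)
    for seq_id in alignments:
        clusters[find(seq_id)].add(seq_id)
    # Assumption: a sequence with no splice sites stays in its own cluster.
    return sorted(sorted(members) for members in clusters.values())
```

Sharing is transitive in this sketch: if est1 and est2 share one site and est2 and est3 share another, all three end up in one cluster even though est1 and est3 have no site in common.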
This track follows the display conventions for gene prediction tracks.
The track description page offers the following filter and configuration options:
The following is a brief summary of the ECgene algorithm:
Coding potential of gene models: Peptide sequences are available only for those gene models judged to have good coding potential. The ORF and CDS of each gene model were determined from the number of exons, the ORF length, the presence of a start codon (Met), and the CDS length. ORFs (defined as the region between two adjacent stop codons) were classified into four groups:
Initially, the first group was searched for the ORF with the longest CDS. A coding sequence was accepted if it was longer than 30 amino acids (93 bp) or if it was identical to a SwissProt protein, excluding fragmented entries. If no such ORF could be identified in the first group, the other groups were examined sequentially for the presence of an ORF using the same criteria. Genes lacking an apparent ORF were defined as non-coding RNA genes.
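The sequential search over ORF groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the ECgene code: the `Candidate` class, the function names, and the representation of SwissProt as a set of peptide strings are hypothetical, the groups are taken as already-classified input in priority order, and the sketch interprets the text as "within a group, pick the longest CDS among those that pass the acceptance test".

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

MIN_AA = 30  # "longer than 30 amino acids" is read here as a strict inequality


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one candidate coding sequence."""
    orf_id: str
    peptide: str  # translated CDS, without the stop codon


def is_acceptable(candidate: Candidate, swissprot: Set[str]) -> bool:
    # Accept a CDS that is long enough or exactly matches a known protein.
    # Assumption: `swissprot` already excludes fragment entries.
    return len(candidate.peptide) > MIN_AA or candidate.peptide in swissprot


def select_cds(groups: Sequence[Sequence[Candidate]],
               swissprot: Set[str]) -> Optional[Candidate]:
    """Return the chosen CDS, or None for a putative non-coding RNA gene."""
    for group in groups:  # groups are examined in priority order
        accepted = [c for c in group if is_acceptable(c, swissprot)]
        if accepted:
            return max(accepted, key=lambda c: len(c.peptide))
    return None
```

A return value of `None` corresponds to the case in which no group yields an acceptable ORF, so the gene model would be treated as a non-coding RNA gene.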
This algorithm and the predictions for this track were developed by Professor Sanghyuk Lee's Lab of Bioinformatics at Ewha Womans University, Seoul, Korea.