Description
ECgene (gene prediction by EST clustering) predicts genes by combining
genome-based EST clustering and transcript
assembly methods. The EST clustering is based on genomic alignment of mRNA
and ESTs, similar to the approach of NCBI's UniGene for the human genome. The
transcript assembly procedure yields gene models for each cluster that
include alternative splicing variants. This algorithm was developed by Prof.
Sanghyuk Lee's Lab of Bioinformatics at Ewha Womans University in Seoul,
Korea.
For more detailed information, see the
ECgene website.
Display Conventions
This track follows the display conventions for
gene prediction
tracks.
Methods
The following is a brief summary of the ECgene algorithm:
-
Genomic alignment of mRNA and ESTs: Input sequences are aligned against the
genome using the Blat program developed by Jim Kent. Blat alignments are corrected for
valid splice sites, and the SIM4 program is used for suspicious alignments if necessary.
-
Sequences that share more than one splice site are clustered together. This produces the
primary clusters without unspliced sequences (singletons).
-
The genomic alignment of exons in each spliced sequence is represented as a directed
acyclic graph (DAG), and all possible gene models are derived by the depth-first-search
(DFS) method.
-
Sequences compatible with each gene model are grouped together as sub-clusters. Gene
models without sufficient evidence are discarded at this stage. PolyA tails are
detected sensitively by analyzing the genomic alignments of mRNA and EST sequences,
and are used specifically to determine gene boundaries.
-
Finally, unspliced sequences are added in a way that does not change the splice sites
of the existing gene models.
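The clustering and graph steps summarized above can be illustrated with a minimal Python sketch. It is not the ECgene implementation: the function names, the input layout (a dict of splice-site coordinates per sequence, and exon chains as lists of (start, end) tuples), and the use of union-find for clustering are all assumptions made for illustration. Alignment, splice-site correction, evidence filtering, and polyA detection are omitted.

```python
from itertools import combinations


def cluster_by_shared_splice_sites(splice_sites):
    """Group spliced sequences that share more than one splice site.

    splice_sites: dict mapping a sequence ID to a set of splice-site
    coordinates (hypothetical input layout). Unspliced sequences
    (empty sets) are left out of the primary clusters.
    """
    parent = {seq: seq for seq, sites in splice_sites.items() if sites}

    def find(seq):
        # Union-find lookup with path halving.
        while parent[seq] != seq:
            parent[seq] = parent[parent[seq]]
            seq = parent[seq]
        return seq

    for a, b in combinations(list(parent), 2):
        if len(splice_sites[a] & splice_sites[b]) > 1:
            parent[find(a)] = find(b)

    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), set()).add(seq)
    return list(clusters.values())


def enumerate_gene_models(exon_chains):
    """List every source-to-sink path in the exon graph using DFS.

    exon_chains: one list of (start, end) exons per spliced sequence,
    in genomic order. Consecutive exons in a chain become directed edges,
    so the graph is acyclic as long as each chain is in genomic order.
    """
    edges = {}
    has_incoming = set()
    for chain in exon_chains:
        for exon in chain:
            edges.setdefault(exon, set())
        for upstream, downstream in zip(chain, chain[1:]):
            edges[upstream].add(downstream)
            has_incoming.add(downstream)

    models = []

    def dfs(exon, path):
        path.append(exon)
        if not edges[exon]:
            # A sink exon closes one candidate gene model.
            models.append(tuple(path))
        else:
            for next_exon in sorted(edges[exon]):
                dfs(next_exon, path)
        path.pop()

    for start in sorted(e for e in edges if e not in has_incoming):
        dfs(start, [])
    return models
```

In this sketch, two sequences whose exon chains differ only in a middle exon yield two candidate gene models, one per splice variant. Treating exons as exact (start, end) tuples is a simplification; how ECgene represents exons with differing boundaries is not described in this summary.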
Credits
The predictions for this track were produced by Namshin Kim and Sanghyuk Lee
at Ewha Womans University, Seoul, Korea.