Description

ECgene (gene prediction by EST clustering) predicts genes by combining genome-based EST clustering and transcript assembly methods. The EST clustering is based on genomic alignment of mRNA and ESTs similar to that of NCBI's UniGene for the human genome. The transcript assembly procedure yields gene models for each cluster that include alternative splicing variants. This algorithm was developed by Prof. Sanghyuk Lee's Lab of Bioinformatics at Ewha Womans University in Seoul, Korea.

For more detailed information, see the ECgene website.

Display Conventions

This track follows the display conventions for gene prediction tracks.

Methods

The following is a brief summary of the ECgene algorithm:
  1. Genomic alignment of mRNA and ESTs: Input sequences are aligned against the genome using the Blat program developed by Jim Kent. Blat alignments are corrected for valid splice sites, and the SIM4 program is used for suspicious alignments if necessary.
  2. Sequences that share more than one splice site are clustered together. This produces the primary clusters, which exclude unspliced sequences (singletons).
  3. The genomic alignment of exons in each spliced sequence is represented as a directed acyclic graph (DAG), and all possible gene models are derived by the depth-first-search (DFS) method.
  4. Sequences compatible with each gene model are grouped together as sub-clusters. Gene models without sufficient evidence are discarded at this stage. PolyA tails are detected sensitively by analyzing the genomic alignments of mRNA and EST sequences, and are used specifically to determine gene boundaries.
  5. Finally, unspliced sequences are added so as not to change the splice sites of the existing gene model.
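
Steps 2 and 3 above can be illustrated with a minimal Python sketch. It is not the ECgene implementation: the function names, the min_shared parameter, the pairwise comparison with union-find (which makes clustering transitive), and the representation of exons as (start, end) tuples on a single strand are all assumptions made for illustration. Filtering of weakly supported models, polyA detection, and the handling of unspliced sequences (steps 4 and 5) are not modeled.

```python
from collections import defaultdict
from itertools import combinations


def cluster_by_shared_splice_sites(splice_sites, min_shared=2):
    """Group sequence IDs that share at least `min_shared` splice sites.

    `splice_sites` maps a sequence ID to the set of splice-site
    coordinates seen in its genomic alignment (hypothetical input format).
    Clustering is transitive: if A joins B and B joins C, all three merge.
    """
    parent = {seq: seq for seq in splice_sites}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(splice_sites, 2):
        if len(splice_sites[a] & splice_sites[b]) >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for seq in splice_sites:
        clusters[find(seq)].add(seq)
    return sorted(sorted(members) for members in clusters.values())


def enumerate_gene_models(exon_chains):
    """List every source-to-sink path in the exon graph using DFS.

    `exon_chains` is a list of exon tuples, one chain per spliced
    sequence, ordered along the genome. Consecutive exons in a chain
    become a directed edge, so the graph is acyclic as long as each
    chain is in increasing coordinate order (an assumption here).
    """
    successors = defaultdict(set)
    has_incoming = set()
    exons = set()
    for chain in exon_chains:
        exons.update(chain)
        for upstream, downstream in zip(chain, chain[1:]):
            successors[upstream].add(downstream)
            has_incoming.add(downstream)

    models = []

    def dfs(exon, path):
        path.append(exon)
        next_exons = successors.get(exon, ())
        if not next_exons:
            # Reached a terminal exon: the current path is one gene model.
            models.append(tuple(path))
        else:
            for nxt in sorted(next_exons):
                dfs(nxt, path)
        path.pop()

    # Start a search from every exon that has no upstream neighbor.
    for start in sorted(exons - has_incoming):
        dfs(start, [])
    return models
```

In this sketch, two sequences whose alignments use a common first and last exon but different middle exons yield two gene models, one per alternative path through the graph. Exhaustive path enumeration can grow quickly for loci with many alternative exons, which is one reason step 4 discards models that lack sufficient supporting evidence.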

Credits

The predictions for this track were produced by Namshin Kim and Sanghyuk Lee at Ewha Womans University, Seoul, Korea.