Description

This track shows gene predictions submitted for the ENCODE Gene Annotation Assessment Project (EGASP) Gene Prediction Workshop 2005 that cover only a partial set of the 44 ENCODE regions. The partial set excludes the 13 ENCODE regions for which high-quality annotations were released in late 2004, leaving 31 regions. The following gene predictions are included: ACEScan, Augustus, GeneZilla, and SAGA.

The EGASP Full companion track shows original gene prediction submissions for the full set of 44 ENCODE regions, produced with gene prediction algorithms other than those shown here; the EGASP Update track shows updated versions of some of the submitted predictions.

Display Conventions and Configuration

Data for each gene prediction method within this composite annotation track is displayed in a separate subtrack. See the top of the track description page for a complete list of the subtracks available for this annotation. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide.

The individual subtracks within this annotation follow the display conventions for gene prediction tracks. The track description page offers the option to color and label codons in a zoomed-in display of the subtracks to facilitate validation and comparison of gene predictions. To enable this feature, select the genomic codons option from the "Color track by codons" menu. Click the Help on codon coloring link for more information about this feature.
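As a rough illustration of what a codon-level view makes visible, the following minimal sketch splits a coding sequence into codons and labels each one. The function name `label_codons` and the label strings are hypothetical and chosen only for this example; the sketch does not reproduce the browser's own coloring logic, and it assumes the sequence is already in the correct reading frame.

```python
# Illustrative only: names and labels below are assumptions for this sketch,
# not part of the Genome Browser.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
MET_CODON = "ATG"                    # methionine, also the usual start codon


def label_codons(cds):
    """Split an in-frame coding sequence into codons and label each one.

    Returns a list of (codon, label) pairs, where label is one of
    'met', 'stop', 'other', or 'partial' (a trailing fragment shorter
    than three bases).
    """
    cds = cds.upper()
    labeled = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if len(codon) < 3:
            label = "partial"
        elif codon == MET_CODON:
            label = "met"
        elif codon in STOP_CODONS:
            label = "stop"
        else:
            label = "other"
        labeled.append((codon, label))
    return labeled


if __name__ == "__main__":
    for codon, label in label_codons("ATGGCCTAAG"):
        print(codon, label)
```

Comparing such labels across two predictions of the same exon is one simple way to see whether they agree on reading frame and on the position of the stop codon.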

Color differences among the subtracks are arbitrary. They provide a visual cue for distinguishing the different gene prediction methods.

Methods

ACEScan

ACEScan (Alternative Conserved Exons Scan) indicates alternative splicing that is evolutionarily conserved in human and mouse/rat. The Conserved Alternative Exon Predictions subtrack shows predicted alternative conserved exons. The Unconserved Alternative and Constitutive Exon Predictions subtrack shows exons that are predicted to be constitutive or may have species-specific alternative splicing.

Augustus

Augustus uses a generalized hidden Markov model (GHMM) that models coding and non-coding sequence, splice sites, the branch point region, translation start and end, and lengths of exons and introns. The track contains four different sets of predictions. Ab initio single genome predictions are based solely on the input sequence. EST and protein evidence predictions were generated using AGRIPPA hints based on alignments of human sequence from the dbEST and nr databases. Mouse homology gene predictions were produced using mouse genomic sequence only; BLAST, CHAOS, and DIALIGN were used to generate the hints for Augustus. The combined EST/protein evidence and mouse homology gene predictions were created using human sequence from the dbEST and nr databases together with mouse genomic sequence to generate hints for Augustus. Additional predictions and methods for this subtrack are available in the EGASP Update track.

GeneZilla

GeneZilla is a program for the computational prediction of protein-coding genes in eukaryotic DNA, based on the generalized hidden Markov model (GHMM) framework. These predictions were generated using GeneZilla and IsoScan, which uses a four-state hidden Markov model to predict isochores (regions of homogeneous G+C content) in genomic DNA.
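To make the idea of a four-state hidden Markov model over G+C content concrete, here is a minimal sketch that assigns each base of a sequence to one of four GC-content classes using Viterbi decoding. It is not IsoScan's implementation: the state names, the GC levels in `GC_LEVEL`, the `STAY_PROB` value, and the uniform start distribution are all assumptions made for illustration.

```python
import math

# Hypothetical GC-content classes. The names, GC levels, and transition
# probability are illustrative assumptions, not IsoScan's parameters.
STATES = ["low", "mid_low", "mid_high", "high"]
GC_LEVEL = {"low": 0.35, "mid_low": 0.43, "mid_high": 0.51, "high": 0.60}
STAY_PROB = 0.999  # assumed probability of remaining in the same state


def emission_logp(state, base):
    """Log-probability of emitting `base` from `state` (G/C versus A/T)."""
    gc = GC_LEVEL[state]
    p = gc / 2 if base in "GC" else (1 - gc) / 2
    return math.log(p)


def transition_logp(prev, cur):
    """Log-probability of moving from state `prev` to state `cur`."""
    if prev == cur:
        return math.log(STAY_PROB)
    return math.log((1 - STAY_PROB) / (len(STATES) - 1))


def viterbi(seq):
    """Return the most probable state path for an uppercase ACGT sequence."""
    if not seq:
        return []
    # Uniform start distribution over the four states.
    score = {s: math.log(1 / len(STATES)) + emission_logp(s, seq[0])
             for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for cur in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + transition_logp(p, cur))
            new_score[cur] = (score[best_prev]
                              + transition_logp(best_prev, cur)
                              + emission_logp(cur, base))
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (start, end, state) half-open intervals."""
    out = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            out.append((start, i, path[start]))
            start = i
    return out


if __name__ == "__main__":
    toy = "AT" * 500 + "GC" * 500  # AT-rich block followed by a GC-rich block
    print(segments(viterbi(toy)))
```

The high self-transition probability is what makes the model favor long homogeneous segments over frequent switching, which is the behavior one wants when looking for isochore-like regions. A gene finder could then, in principle, choose parameters per segment according to its GC class.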

SAGA

SAGA is an ab initio multiple-species gene-finding program based on the Gibbs sampling method described in Chatterji et al. (2004). In addition to sampling parameters, SAGA uses a phylo-HMM-based model to boost the scores, similar to the method described in Siepel et al. (2004).

Credits

The gene prediction data sets were submitted by the following individuals and institutions:

References

Chatterji, S. and Pachter, L. Multiple organism gene finding by collapsed Gibbs sampling. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 187-193 (2004).

Siepel, A. and Haussler, D. Computational identification of evolutionarily conserved exons. Proc. 8th Int'l Conf. on Research in Computational Molecular Biology, 177-186 (2004).