Description

This track shows predicted "novel" single-exon genes (SEGs) from several sources, i.e., predicted SEGs that do not overlap known genes or spliced mRNAs/ESTs.

Display Conventions and Configuration

This is a composite track, with sub-tracks for the source gene predictions. The sub-tracks follow the ordinary display conventions for gene prediction tracks. At the bottom are two additional sub-tracks that summarize information across sources. The first indicates "loci" (maximal intervals spanned by overlapping predictions on the same strand) supported by predictions from two or more independent sources. The second indicates loci containing predictions that have passed all validation filters (see below).
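The locus definition above can be illustrated with a short Python sketch. It is not the code used to build the track; the Prediction and Locus records, the function names, and the use of 0-based half-open coordinates (as in BED files) are assumptions made for illustration.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical records; coordinates are assumed 0-based, half-open.
Prediction = namedtuple("Prediction", "chrom start end strand source")
Locus = namedtuple("Locus", "chrom start end strand sources")


def build_loci(predictions):
    """Merge overlapping same-strand predictions into maximal intervals."""
    ordered = sorted(predictions, key=lambda p: (p.chrom, p.strand, p.start, p.end))
    loci = []
    for (chrom, strand), group in groupby(ordered, key=lambda p: (p.chrom, p.strand)):
        start = end = None
        sources = set()
        for p in group:
            if end is not None and p.start < end:
                # Overlaps the locus being built: extend it and record the source.
                end = max(end, p.end)
                sources.add(p.source)
            else:
                # No overlap: close the previous locus (if any) and start a new one.
                if end is not None:
                    loci.append(Locus(chrom, start, end, strand, frozenset(sources)))
                start, end = p.start, p.end
                sources = {p.source}
        if end is not None:
            loci.append(Locus(chrom, start, end, strand, frozenset(sources)))
    return loci


def multi_source_loci(loci, min_sources=2):
    """Keep loci supported by predictions from at least min_sources sources."""
    return [locus for locus in loci if len(locus.sources) >= min_sources]


example = [
    Prediction("chr1", 100, 200, "+", "nscan"),
    Prediction("chr1", 150, 300, "+", "sgp"),
    Prediction("chr1", 150, 300, "-", "ensembl"),
]
shared = multi_source_loci(build_loci(example))  # one locus: chr1:100-300 on "+"
```

Because sources are collected in a set, two overlapping predictions from the same source count as a single source, which matches the requirement that support come from independent sources.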

Methods

For each source, we extracted all predictions of genes having single CDS exons, then discarded those overlapping genes from the RefSeq, Vega, UCSC Known Genes, and MGC sets, and those overlapping spliced mRNAs or ESTs from GenBank. Predictions that fell in introns of known genes on the same strand were also discarded (because they are more likely to be alternative exons than separate genes), as were predictions that overlapped predicted processed pseudogenes. The predictions labeled "best" additionally satisfy four validation filters: (1) at least 80% overlap with the syntenic nets between human and mouse, rat, and dog; (2) at least 40% of their CDS bases in "most conserved" regions according to phastCons; (3) "end-to-end" homology with a known vertebrate gene (i.e., a BLASTP alignment covering >85% of both query and target); and (4) no frame-shift indels or nonsense mutations in human/mouse/rat/dog alignments.
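The four criteria for a "best" prediction can be written as a simple predicate. The Python sketch below is illustrative only, not the pipeline used for this track: the Evidence record and its field names are hypothetical, all values are assumed to be precomputed fractions between 0 and 1, and criterion (1) is interpreted here as requiring at least 80% overlap with the net for each of mouse, rat, and dog, which the description above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Dict

# Thresholds taken from the Methods text.
MIN_NET_OVERLAP = 0.80         # criterion 1: overlap with syntenic nets
MIN_CONSERVED_FRACTION = 0.40  # criterion 2: CDS bases in phastCons "most conserved"
MIN_BLASTP_COVERAGE = 0.85     # criterion 3: must be exceeded on query and target
REQUIRED_SPECIES = ("mouse", "rat", "dog")


@dataclass
class Evidence:
    """Hypothetical precomputed evidence for one single-exon prediction."""
    net_overlap: Dict[str, float]   # species -> fraction overlapping the syntenic net
    conserved_cds_fraction: float   # fraction of CDS bases in "most conserved" regions
    blastp_query_coverage: float    # fraction of the query covered by the alignment
    blastp_target_coverage: float   # fraction of the target covered by the alignment
    has_disabling_mutation: bool    # frame-shift indel or nonsense mutation observed


def passes_best_filters(ev: Evidence) -> bool:
    """Return True only if all four validation filters are satisfied."""
    # Assumption: every listed species must reach the threshold; a missing
    # species is treated as zero overlap.
    synteny_ok = all(
        ev.net_overlap.get(species, 0.0) >= MIN_NET_OVERLAP
        for species in REQUIRED_SPECIES
    )
    conserved_ok = ev.conserved_cds_fraction >= MIN_CONSERVED_FRACTION
    homology_ok = (
        ev.blastp_query_coverage > MIN_BLASTP_COVERAGE
        and ev.blastp_target_coverage > MIN_BLASTP_COVERAGE
    )
    intact_ok = not ev.has_disabling_mutation
    return synteny_ok and conserved_ok and homology_ok and intact_ok
```

Note that criteria (1) and (2) are inclusive ("at least"), while criterion (3) is strict (">85%"), so an alignment covering exactly 85% of the query or target fails the filter in this sketch.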

Credits

Thanks to Michael Brent's Computational Genomics Group at Washington University in St. Louis for providing the N-SCAN and Twinscan predictions, the Ensembl project for providing their predictions, and the Grup de Recerca en Informàtica Biomèdica (GRIB) at Institut Municipal d'Investigació Mèdica (IMIM) in Barcelona for providing the SGP predictions. TransMap was developed by Mark Diekhans at UC Santa Cruz, and ExoniPhy was developed by Adam Siepel at UC Santa Cruz (now at Cornell).

References

Gross SS, Brent MR. Using multiple alignments to improve gene prediction. In Proc. 9th Int'l Conf. on Research in Computational Molecular Biology (RECOMB '05):374-388 and J Comput Biol. 2006 Mar;13(2):379-93.

Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41.

Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics. 2001 Jun 1;17 Suppl 1:S140-8.

Parra G et al. Comparative gene prediction in human and mouse. Genome Res. 2003;13:108-117.

Siepel A, Haussler D. Computational identification of evolutionarily conserved exons. RECOMB '04 (2004).