Description

This track shows pseudogenes identified by the Yale Pseudogene Pipeline in the ENCODE regions. In this analysis, pseudogenes are defined as genomic sequences that are similar to known genes but carry inactivating disablements (e.g. premature stop codons or frameshifts) in their putative protein-coding regions. Each pseudogene is flagged as recently processed, recently duplicated, or of uncertain origin (either an ancient fragment or derived from a single-exon parent).
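To make the two disablement types concrete, here is a minimal Python sketch. It is not the Yale pipeline's code. It assumes the candidate's nucleotide sequence has already been placed in the parent gene's reading frame, and that the lengths of insertions and deletions relative to the parent come from a separate alignment step. The names premature_stops, frameshifts and is_disabled are hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons before the last codon.

    Assumes `cds` is already in the parent gene's reading frame (hypothetical input).
    The last complete codon is skipped because a stop there is the normal terminator.
    """
    seq = cds.upper()
    n_codons = len(seq) // 3
    return [i for i in range(n_codons - 1) if seq[3 * i:3 * i + 3] in STOP_CODONS]

def frameshifts(indel_lengths: list[int]) -> int:
    """Count indels whose length is not a multiple of 3, i.e. that shift the frame."""
    return sum(1 for n in indel_lengths if n % 3 != 0)

def is_disabled(cds: str, indel_lengths: list[int]) -> bool:
    """A candidate counts as disabled if it has any premature stop or frameshift."""
    return bool(premature_stops(cds)) or frameshifts(indel_lengths) > 0
```

For example, premature_stops("ATGAAATGAGGGTAA") reports the in-frame TGA at codon index 2, and a 1-bp deletion (frameshifts([1])) is enough to flag an otherwise intact sequence.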

Methods

Briefly, the protein sequences of known human genes (as annotated by Ensembl) were used to search the genome for similar sequences that do not overlap known genes. Each matching sequence was judged to be a disabled copy of a gene if it contained premature stop codons or frameshifts. The intron-exon structure of the parent functional gene was then used to infer whether a pseudogene was recently duplicated or processed. A duplicated pseudogene retains the intron-exon structure of its parent gene, whereas a processed pseudogene shows evidence that the introns have been spliced out. Short pseudogene sequences that cannot be confidently assigned to either the processed or the duplicated category may be ancient fragments. Further details are in the references below.
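The processed/duplicated distinction can be sketched as a simple rule on the sequence found between consecutive matched parent exons. The following Python sketch is only an illustration under stated assumptions, not the pipeline's actual classifier. It assumes each hit is summarized by the genomic gap (in bp) at each parent exon-exon junction it covers. The class PseudogeneHit, the function classify, and both thresholds (MAX_PROCESSED_GAP, MIN_INTRON_GAP) are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
MAX_PROCESSED_GAP = 10  # bp tolerated where a parent intron was spliced out
MIN_INTRON_GAP = 60     # bp; gaps at least this long are treated as retained introns

@dataclass
class PseudogeneHit:
    # Gap (bp) at the pseudogene locus between consecutive matched parent exons,
    # one entry per parent exon-exon junction covered by the hit.
    junction_gaps: list[int]

def classify(hit: PseudogeneHit) -> str:
    gaps = hit.junction_gaps
    if not gaps:
        # Single-exon parent, or a fragment covering only one exon:
        # the intron-exon structure gives no evidence either way.
        return "uncertain"
    if all(g <= MAX_PROCESSED_GAP for g in gaps):
        return "processed"   # exons are joined, so introns were spliced out
    if all(g >= MIN_INTRON_GAP for g in gaps):
        return "duplicated"  # intron-sized gaps, so the parent structure is retained
    return "uncertain"       # mixed evidence
```

Usage: classify(PseudogeneHit([0, 3])) gives "processed", classify(PseudogeneHit([850, 1200])) gives "duplicated", and a hit with no junctions gives "uncertain". A real classifier would also weigh alignment quality and fragment length. This sketch keeps only the structural signal described above.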

Verification

All pseudogenes in this track have been manually checked.

Credits

These data were generated by the pseudogene annotation group in the Gerstein Lab at Yale University, in particular Deyou Zheng.

References

More information is available from Pseudogene.org.

Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Z Zhang, PM Harrison, Y Liu, M Gerstein (2003) Genome Res 13: 2541-58.

Integrated pseudogene annotation for human chromosome 22: evidence for transcription.
D Zheng, Z Zhang, PM Harrison, J Karro, N Carriero, M Gerstein (2005) J Mol Biol 349: 27-45.