Description

The polyA_DB database is a set of $organism mRNA polyadenylation sites based on EST/cDNA evidence. A site is a single base denoting the beginning of a poly(A) tail in a nascent mRNA transcript and is typically 10-30 nucleotides downstream of a polyadenylation signal (most commonly AAUAAA). The polyA_DB web server is found at http://exon.umdnj.edu/polya_db.
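The relationship between a poly(A) site and its upstream signal can be illustrated with a short sketch. This is not part of polyA_DB; the function name, the 0-based coordinates, and the choice to measure distance from the first base of the hexamer to the site are all assumptions made for illustration.

```python
def find_upstream_signals(seq, site, signal="AAUAAA", min_dist=10, max_dist=30):
    """Return 0-based start positions of `signal` whose first base lies
    min_dist..max_dist bases upstream of `site` (hypothetical helper).

    RNA input is accepted: U is converted to T before matching.
    """
    seq = seq.upper().replace("U", "T")
    signal = signal.upper().replace("U", "T")
    hits = []
    # Candidate starts run from (site - max_dist) to (site - min_dist), inclusive.
    for start in range(max(0, site - max_dist), site - min_dist + 1):
        if seq[start:start + len(signal)] == signal:
            hits.append(start)
    return hits


# Toy sequence: the hexamer starts at index 20 and the poly(A) tail at index 40.
toy = "C" * 20 + "AATAAA" + "C" * 14 + "A" * 10
upstream_hits = find_upstream_signals(toy, site=40)
```

In this toy sequence the hexamer begins 20 bases upstream of the site, which falls inside the 10-30 nucleotide range described above.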

The Poly(A) composite track consists of two subtracks: a polyA_DB subtrack that displays reported poly(A) sites, and a poly(A) prediction subtrack that displays poly(A) sites predicted using a support vector machine (SVM).

The poly(A) predictions are made using 1500-base DNA sequences centered at the end of each RefSeq gene. The sequences serve as input to the SVM described in Cheng et al., 2006. The SVM scores each base using a model derived from 15 different cis-elements and reports, for a region of DNA, an E-value between 0 (excellent) and 0.5 (worst). This E-value is then normalized to an integer score between 0 (worst) and 1000 (excellent). High-scoring regions are highlighted, with the highest-scoring base indicated by a thicker line. The median length of these regions is 48 bases.
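The window selection and score normalization described above can be sketched as follows. The exact normalization formula is not stated here, so the linear mapping below is an assumption, as are the function names, the 0-based coordinates, and the clamping of the window at the start of the sequence. Strand handling (which end of a minus-strand gene counts as its end) is left out.

```python
def centered_window(gene_end, length=1500):
    """Return (start, end) of a window of `length` bases centered on gene_end.

    Hypothetical helper: the start is clamped at 0 so the window never
    extends past the beginning of the sequence.
    """
    half = length // 2
    return max(0, gene_end - half), gene_end + half


def normalize_evalue(evalue, worst=0.5, top=1000):
    """Map an E-value in [0, worst] to an integer score in [0, top].

    Assumes a linear rescaling in which lower E-values give higher scores;
    the actual formula used for the track may differ.
    """
    if not 0 <= evalue <= worst:
        raise ValueError("E-value must lie between 0 and %s" % worst)
    return round(top * (1 - evalue / worst))


window = centered_window(10000)
scores = [normalize_evalue(e) for e in (0.0, 0.25, 0.5)]
```

Under this assumed mapping, an E-value of 0 becomes a score of 1000 and an E-value of 0.5 becomes a score of 0, matching the endpoints given in the description.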

References

Cheng Y, Miura RM, Tian B. Prediction of mRNA polyadenylation sites by support vector machine. Bioinformatics. 2006 Oct 1;22(19):2320-5.

Zhang H, Hu J, Recce M, Tian B. PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D116-120.