wgEncodeRnaSeqSuper.html

Description

RNA sequencing, or RNA-Seq, is a method for mapping and quantifying the transcriptome of any organism that has a genomic DNA sequence assembly. Compared to microarrays, RNA-Seq is especially well-suited for de novo discovery of RNA splicing patterns, and for determining unequivocally the presence or absence of lower-abundance classes of RNA.

RNA-Seq is performed by reverse-transcribing an RNA sample into cDNA, followed by high-throughput DNA sequencing. Most data is produced in one of two formats: single reads, each of which comes from one end of a randomly primed cDNA molecule; and paired-end reads, which are obtained as pairs from both ends of cDNAs resulting from random priming. The resulting sequence reads are then informatically mapped onto the genome sequence (Alignments). Reads that do not map to the genome are mapped to known RNA splice junctions (Splice Sites).

These mapped reads are then counted to determine their frequency of occurrence at known gene models.
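The counting step can be illustrated with a minimal Python sketch. It is not the ENCODE pipeline: the function name `count_reads_per_gene`, the tuple layout, and the 0-based half-open coordinates are all assumptions made for the example, and real pipelines typically handle strand, multi-mapping reads, and exon structure as well.

```python
def count_reads_per_gene(reads, genes):
    """Count mapped reads that overlap each gene model.

    reads: iterable of (chrom, start, end) tuples, 0-based half-open (assumed layout)
    genes: dict mapping gene name -> (chrom, start, end), same coordinate system
    """
    counts = {name: 0 for name in genes}
    for r_chrom, r_start, r_end in reads:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if r_chrom == g_chrom and r_start < g_end and g_start < r_end:
                counts[name] += 1
    return counts


# Hypothetical gene models and read alignments, for illustration only.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 500, 600)}
reads = [
    ("chr1", 90, 110),   # overlaps geneA
    ("chr1", 150, 180),  # overlaps geneA
    ("chr1", 200, 230),  # starts exactly where geneA ends: no overlap
    ("chr2", 150, 180),  # different chromosome
    ("chr1", 590, 620),  # overlaps geneB
]
print(count_reads_per_gene(reads, genes))
```

The nested loop keeps the sketch short; for genome-scale data an interval index would be the more usual choice.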

Some RNA-Seq protocols do not specify the coding strand. As a result, there can be ambiguity at loci where both strands are transcribed.

Display Conventions

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track:

Signal, Raw Signal, or RPKM

Density graph (wiggle) of signal enrichment based on a normalized aligned read density, indicating RNA abundance. In some tracks, this is divided further by strand, to indicate the abundance of RNA transcribed on each strand.
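This page names RPKM as a view but does not define it. As commonly defined, RPKM is reads per kilobase of feature length per million mapped reads. The Python sketch below shows that standard formula only; the function name `rpkm` and its parameters are illustrative, and the exact normalization applied to any given subtrack may differ.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    RPKM = read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp feature, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by feature length and by sequencing depth is what makes values comparable across features of different sizes and across experiments with different read totals.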

Splice Sites

Reads mapped to the genome that do not map in one contiguous block. Often, extra steps are taken to ensure that these reads represent splice sites, such as ensuring that they align to some sequence in a catalog of spliced RNAs.
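As an illustration of what "not in one contiguous block" means, the Python sketch below assumes the alignments are available in SAM/BAM form, where an `N` operation in the CIGAR string marks a region of the reference skipped by the read (typically an intron). The helper names `alignment_blocks` and `is_spliced` are hypothetical, and this is not necessarily how the Splice Sites views were produced.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_blocks(pos, cigar):
    """Return the reference blocks covered by a read as (start, end) pairs.

    pos is the 0-based leftmost reference position of the alignment.
    Blocks are half-open and are split wherever the CIGAR has an N operation.
    """
    blocks = []
    block_start = pos
    ref = pos
    for length, op in _CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            # These operations consume reference bases within the current block.
            ref += length
        elif op == "N":
            # Skipped reference region: close the current block, jump the gap.
            if ref > block_start:
                blocks.append((block_start, ref))
            ref += length
            block_start = ref
        # I, S, H and P do not consume reference bases, so ref is unchanged.
    if ref > block_start:
        blocks.append((block_start, ref))
    return blocks


def is_spliced(cigar):
    """True when the alignment covers more than one reference block."""
    return len(alignment_blocks(0, cigar)) > 1


print(alignment_blocks(100, "30M200N45M"))  # [(100, 130), (330, 375)]
print(is_spliced("75M"))                    # False
```

A read flagged this way is only a candidate junction; the additional checks described above, such as comparison against a catalog of spliced RNAs, are still needed.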

Alignments

Reads mapped to the genome. For some tracks, these reads are available for viewing. For others, they cannot be viewed but are available as downloadable files.

Credits

These data were generated and analyzed as part of the ENCODE project, a genome-wide consortium project with the aim of cataloging all functional elements in the human genome. This effort includes collecting a variety of data over a specific set of cell types. Consequently, data related to these tracks may be available under ENCODE tracks.

References

Morozova O, Hirst M, Marra MA. Applications of new sequencing technologies for transcriptome analysis. Annual Review of Genomics and Human Genetics. 2009;10:135-51.

Metzker ML. Sequencing technologies - the next generation. Nature Reviews Genetics. 2010 Jan;11(1):31-46.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.