Description
RNA sequencing, or RNA-Seq, is a method for mapping and quantifying the
transcriptome of any organism that has a genomic DNA sequence
assembly. Compared to microarrays, RNA-Seq is especially well-suited for
de novo discovery of RNA splicing patterns and for determining
unequivocally the presence or absence of low-abundance classes of RNA.
RNA-Seq is performed by reverse-transcribing an RNA sample into
cDNA, followed by high-throughput DNA sequencing. Most data are
produced in one of two formats: single reads, each of which comes from
one end of a randomly primed cDNA molecule, and paired-end reads, which
are obtained as pairs from both ends of cDNAs resulting from random
priming. The resulting sequence reads are then computationally mapped
onto the genome sequence (Alignments).
Those that don't map to the genome are mapped to known RNA splice
junctions (Splice Sites).
These mapped reads are then counted to determine their frequency of
occurrence at known gene models.
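The counting step can be pictured as an interval-overlap tally. The sketch below assumes each mapped read and each gene model is reduced to a single (chrom, start, end) span in 0-based half-open coordinates; the names `count_reads_per_gene`, `geneA` and `geneB` are hypothetical and this is not the pipeline used to build these tracks.

```python
def count_reads_per_gene(reads, genes):
    """Count mapped reads overlapping each gene model (illustrative sketch).

    reads: iterable of (chrom, start, end) alignment spans, 0-based half-open.
    genes: dict of gene name -> (chrom, start, end), 0-based half-open.
    A read overlapping several genes is counted once for each of them here;
    production pipelines often handle such ambiguous reads differently.
    """
    counts = {name: 0 for name in genes}
    for r_chrom, r_start, r_end in reads:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two half-open intervals overlap iff each starts before the other ends.
            if r_chrom == g_chrom and r_start < g_end and g_start < r_end:
                counts[name] += 1
    return counts


# Hypothetical gene models and read spans.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 300, 400)}
reads = [
    ("chr1", 150, 175),  # inside geneA
    ("chr1", 190, 210),  # overlaps the 3' end of geneA
    ("chr1", 350, 360),  # inside geneB
    ("chr2", 150, 160),  # different chromosome
    ("chr1", 250, 260),  # intergenic
]
print(count_reads_per_gene(reads, genes))  # {'geneA': 2, 'geneB': 1}
```

The nested loop keeps the logic obvious; for genome-scale data an interval index or a sorted sweep would replace it.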
Some RNA-Seq protocols do not specify the coding strand. As a result,
there can be ambiguity at loci where both strands are transcribed.
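A small sketch of why strand information matters when two genes on opposite strands overlap. It assumes that, for a stranded protocol, the read's strand equals the transcript's strand (some protocols produce the opposite orientation); `Gene`, `candidate_genes` and the gene names are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def candidate_genes(chrom, start, end, read_strand, genes, stranded) -> List[str]:
    """Return the genes a read could have come from.

    Without strand information, every overlapping gene is a candidate.
    With a stranded protocol (assumed here: read strand == transcript strand),
    genes on the opposite strand are excluded.
    """
    hits = []
    for g in genes:
        overlaps = g.chrom == chrom and start < g.end and g.start < end
        if not overlaps:
            continue
        if stranded and g.strand != read_strand:
            continue
        hits.append(g.name)
    return hits


# Two hypothetical genes transcribed from opposite strands of the same locus.
genes = [Gene("geneA", "chr1", 100, 200, "+"), Gene("geneB", "chr1", 150, 250, "-")]
print(candidate_genes("chr1", 160, 180, "+", genes, stranded=False))  # ['geneA', 'geneB']
print(candidate_genes("chr1", 160, 180, "+", genes, stranded=True))   # ['geneA']
```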
Display Conventions
This track is a multi-view composite track that contains multiple data
types (views). For each view, there are multiple subtracks that display
individually on the browser. Instructions for configuring multi-view
tracks are here. The following views are in this track:
Signal, Raw Signal, or RPKM
Density graph (wiggle) of signal
enrichment based on a normalized aligned read density, indicating RNA
abundance. In some tracks, this is divided further by strand, to
indicate the abundance of RNA transcribed on each strand.
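RPKM is not defined on this page. The commonly used definition is reads per kilobase of exon model per million mapped reads, RPKM = C * 10^9 / (L * N), where C is the number of reads assigned to a gene, L its exon length in bases and N the total number of mapped reads. The sketch below implements only that formula; the exact normalization behind each subtrack may differ, and `rpkm` is a hypothetical helper name.

```python
def rpkm(gene_reads, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = gene_reads / ((exon_length_bp / 1e3) * (total_mapped_reads / 1e6))
         = gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# 500 reads on a 2 kb exon model in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
# The same read count on a gene twice as long yields half the value.
print(rpkm(500, 4000, 10_000_000))  # 12.5
```

Dividing by both length and library size is what lets values be compared across genes of different lengths and across samples sequenced to different depths.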
Splice Sites
Reads mapped to the genome that do not map in one contiguous block.
Often, extra steps are taken to ensure that these reads represent splice
sites, such as requiring that they align to a sequence in a catalog of
spliced RNAs.
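If the alignments are stored in SAM/BAM format (an assumption; the page does not name the format), a read that does not map in one contiguous block carries an `N` (skipped region) operation in its CIGAR string. This sketch recovers the skipped intervals, which are the candidate introns; `introns_from_cigar` is a hypothetical helper, and it raises on an unavailable CIGAR (`*`).

```python
import re

# CIGAR operations that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return the skipped-region (N) intervals of one alignment.

    pos: 1-based leftmost mapping position (SAM POS field).
    cigar: CIGAR string such as "50M200N50M".
    Returns (start, end) tuples in 0-based half-open genome coordinates.
    """
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    ref = pos - 1  # convert SAM's 1-based POS to 0-based
    introns = []
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            introns.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return introns


print(introns_from_cigar(1001, "50M200N50M"))    # [(1050, 1250)]
print(introns_from_cigar(1001, "5S45M300N50M"))  # [(1045, 1345)]
print(introns_from_cigar(1001, "100M"))          # []
```

Soft clips (`S`) and insertions (`I`) consume only read bases, so they do not shift the genomic coordinate.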
Alignments
Reads mapped to the genome. For some
tracks, these reads are available for viewing. For others, they
cannot be viewed but are available as downloadable files.
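The page does not state the format of the downloadable alignment files. Assuming they are, or have been converted to, text SAM, this sketch reads the mandatory columns and uses the standard FLAG bits 0x4 (unmapped) and 0x10 (reverse strand) to keep only mapped reads. `SamRecord`, `parse_sam_line` and `mapped_records` are hypothetical names, and the two input lines are made up.

```python
from typing import NamedTuple

FLAG_UNMAPPED = 0x4  # segment unmapped
FLAG_REVERSE = 0x10  # SEQ is reverse complemented


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position
    mapq: int
    cigar: str

    @property
    def is_unmapped(self):
        return bool(self.flag & FLAG_UNMAPPED)

    @property
    def is_reverse(self):
        return bool(self.flag & FLAG_REVERSE)


def parse_sam_line(line):
    """Parse the first six of the 11 mandatory tab-separated SAM columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def mapped_records(lines):
    """Yield records for mapped reads, skipping '@' header lines."""
    for line in lines:
        if line.startswith("@"):
            continue
        record = parse_sam_line(line)
        if not record.is_unmapped:
            yield record


sam_text = (
    "@HD\tVN:1.6\n"
    "read1\t16\tchr1\t1001\t255\t50M200N50M\t*\t0\t0\t*\t*\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n"
)
records = list(mapped_records(sam_text.splitlines()))
print([(r.qname, r.pos, r.is_reverse) for r in records])  # [('read1', 1001, True)]
```

Binary BAM files would first need converting to SAM or reading with a BAM-aware library.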
Credits
These data were generated and analyzed as part of the ENCODE project, a
genome-wide consortium project with the aim of cataloging all
functional elements in the human genome. This effort includes
collecting a variety of data over a specific set of cell types.
Consequently, data related to these tracks may also be available under
the ENCODE tracks.
Data Release Policy
Data users may freely use ENCODE data, but may not, without prior
consent, submit publications that use an unpublished ENCODE dataset
until nine months following the release of the dataset. This date is
listed in the Restricted Until column on the track configuration page
and the download page. The full data release policy for ENCODE is
available here.
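The authoritative date is the one in the Restricted Until column. As an illustration of the nine-month arithmetic only, this sketch shifts a release date by whole calendar months. Clamping the day to the end of a shorter month (e.g. 31 May to 28 February) is an assumption about how such dates are computed, and `add_months` and `restricted_until` are hypothetical names.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def restricted_until(release_date, embargo_months=9):
    """Approximate end of the publication restriction for a dataset."""
    return add_months(release_date, embargo_months)


print(restricted_until(date(2010, 1, 15)))  # 2010-10-15
print(restricted_until(date(2010, 5, 31)))  # 2011-02-28
```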