Description

This track shows 5' cap analysis of gene expression (CAGE) tags and clusters from RNA extracted from different subcellular compartments of multiple cell lines. A CAGE cluster is a region of overlapping tags with an assigned value that represents the expression level. The data in this track were produced as part of the ENCODE Transcriptome Project.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). Each view contains multiple subtracks that can be displayed individually in the browser. Instructions for configuring multi-view tracks are here.

To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide.

This track contains the following views:

Plus and Minus Clusters: These views display clusters of overlapping read mappings on the forward and reverse genomic strands.
Alignments: The Alignments view shows the individual tags (read mappings), with mismatches from the genomic reference highlighted.
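To illustrate what the Plus and Minus Clusters views represent, the following minimal Python sketch groups overlapping tag mappings into clusters, keeping chromosomes and strands separate. It is not the pipeline used to build this track: the `Tag` and `Cluster` types and the `cluster_tags` function are hypothetical, coordinates are assumed to be 0-based and half-open, and "overlapping" is assumed to mean sharing at least one base. The production clustering may apply additional rules (for example, minimum tag counts or merging distances) that this description does not specify.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Tag(NamedTuple):
    """Hypothetical record for one tag mapping (0-based, half-open coordinates)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # inclusive
    end: int     # exclusive


class Cluster(NamedTuple):
    """Hypothetical record for one cluster of overlapping tags."""
    chrom: str
    strand: str
    start: int
    end: int
    tag_count: int


def cluster_tags(tags: List[Tag]) -> List[Cluster]:
    """Merge tags that share at least one base on the same chromosome and strand."""
    # Plus- and minus-strand tags are never merged with each other.
    by_strand = defaultdict(list)
    for tag in tags:
        by_strand[(tag.chrom, tag.strand)].append(tag)

    clusters = []
    for (chrom, strand), group in sorted(by_strand.items()):
        group.sort(key=lambda t: t.start)
        cur_start, cur_end, count = group[0].start, group[0].end, 1
        for tag in group[1:]:
            if tag.start < cur_end:
                # Overlaps the running cluster: extend it and count the tag.
                cur_end = max(cur_end, tag.end)
                count += 1
            else:
                # Gap found: close the running cluster and start a new one.
                clusters.append(Cluster(chrom, strand, cur_start, cur_end, count))
                cur_start, cur_end, count = tag.start, tag.end, 1
        clusters.append(Cluster(chrom, strand, cur_start, cur_end, count))
    return clusters
```

For example, two plus-strand tags on chr1 at 100-127 and 110-137 merge into one cluster spanning 100-137 with a tag count of 2, while a minus-strand tag at 105-132 forms its own cluster even though its coordinates overlap them, because each strand is clustered separately.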

Color differences among subtracks are used as a visual cue to distinguish between the different cell types, and between annotations on the plus and minus strands.

Methods

Cells were grown according to the approved ENCODE cell culture protocols. RNA molecules longer than 200 nt, isolated from each subcellular compartment, were fractionated into polyA+ and polyA- fractions as described in these protocols. The CAGE tags were sequenced from the 5' ends of cap-trapped cDNAs produced using RIKEN CAGE technology (Kodzius et al. 2006; Valen et al. 2009). To create the tags, a linker was attached to the 5' end of cDNAs that had been reverse-transcribed from polyA+ or polyA- RNA and selected by cap trapping (Carninci et al. 1996). The first 27 bp of each cDNA were then cleaved off using class II restriction enzymes, and a linker was attached to the 3' end of the cDNA.

After PCR amplification, the tags were sequenced (36 bp single reads) using ABI SOLiD technology (polyA- RNA from the cytosol and nucleus of the K562 cell line, and from whole prostate cells) or Illumina/Solexa GA (all other data). Tags were mapped to the human genome (NCBI Build 36, hg18) using the program nexalign (T. Lassmann, manuscript in preparation). SOLiD CAGE sequences were mapped with up to 3 mismatches, and Solexa CAGE sequences with up to 2 mismatches. Only alignments of sequences mapping to 10 or fewer genomic locations were retained. The expression level of a cluster was computed as the number of reads making up the cluster, divided by the total number of reads sequenced, multiplied by 1 million.
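The mapping filters and the expression formula described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the nexalign pipeline: the `Alignment` type, the platform labels, and the helper functions are hypothetical, and each alignment is assumed to already carry its mismatch count and its number of genomic hits. Following the text, the denominator is the total number of reads sequenced.

```python
from typing import NamedTuple

# Mismatch limits stated in the Methods text; the platform labels are hypothetical.
MAX_MISMATCHES = {"SOLiD": 3, "Solexa": 2}
# Sequences mapping to more than this many genomic locations are discarded.
MAX_GENOMIC_HITS = 10


class Alignment(NamedTuple):
    """Hypothetical summary of one read's alignment."""
    read_id: str
    mismatches: int  # mismatches against the genomic reference
    n_hits: int      # number of genomic locations the read maps to


def passes_filters(aln: Alignment, platform: str) -> bool:
    """Keep an alignment only if it meets both the mismatch and the multi-mapping limits."""
    return aln.mismatches <= MAX_MISMATCHES[platform] and aln.n_hits <= MAX_GENOMIC_HITS


def cluster_expression(reads_in_cluster: int, total_reads: int) -> float:
    """Reads in the cluster divided by total reads sequenced, multiplied by one million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply before dividing so whole-number results stay exact in floating point.
    return reads_in_cluster * 1_000_000 / total_reads
```

For a hypothetical cluster of 45 reads in a library of 15,000,000 sequenced reads, `cluster_expression(45, 15_000_000)` returns 3.0. How multi-mapping reads are apportioned among clusters is not specified in this description, so the sketch leaves that question open.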

Release Notes

This is Release 2 of this track. This release adds data for eight new cell-type/compartment combinations (GM12878 nucleus, H1-hESC whole cell, HepG2 cytosol/nucleus/nucleolus, HUVEC cytosol, and NHEK cytosol/nucleus).

Credits

These data were generated and analyzed by Timo Lassmann, Phil Kapranov, Hazuki Takahashi, Yoshihide Hayashizaki, Carrie Davis, Tom Gingeras, and Piero Carninci.

Contact: Piero Carninci at RIKEN Omics Science Center

References

Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M, et al. CAGE: cap analysis of gene expression. Nat Methods. 2006 March 1; 3(3):211-222.

Valen E, Pascarella G, Chalk A, Maeda N, Kojima M, Kawazu C, Murata M, Nishiyori H, Lazarevic D, Motti D, et al. Genome-wide detection and analysis of hippocampus core promoters using DeepCAGE. Genome Res. 2009 February; 19(2):255-265.

Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, et al. High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics. 1996 November 1; 37(3):327-336.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.