Description

This track is produced as part of the ENCODE Project. A major goal of the ENCODE project is the complete characterization and understanding of the regulation and dynamics of the human transcriptome. High-throughput sequencing of reverse-transcribed RNA molecules (RNA-seq) has revolutionized work in this area by providing a genome-wide digital readout of transcription at base-pair resolution. However, such measurements are typically performed on populations consisting of a large number of cells, so the readout is the average of the transcriptome profiles of the individual cells. This averaging masks potentially important cell-to-cell variation in transcript abundance, allele expression bias, dynamic response to external stimuli, etc. To address these issues, we have generated single-cell RNA-seq measurements for individual cells from ENCODE cell lines.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The following views are in this track:

Raw Signal
Density graph (wiggle) of read coverage.

Downloadable Files

The following files can be found on the downloads page.

Raw Reads, Alignments and Coverage Tracks:

*.fastq - raw sequence files in fastq format with phred33 quality scores
*.BigWig - read density files (RawSignal view, BigWig format; "plus" and "minus" versions provided for stranded data)
*.bam - all alignments in SAM/BAM format (Alignments view)
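
As an illustration of the phred33 encoding mentioned for the fastq files, the following minimal sketch decodes a quality string into per-base quality scores and error probabilities. The function names and the example quality string are hypothetical and are not part of the ENCODE pipeline; the only assumptions are the standard Phred+33 convention (ASCII code minus 33) and the usual relation P = 10^(-Q/10).

```python
def phred33_to_scores(quality_line):
    """Convert a FASTQ quality string (Phred+33) to integer quality scores."""
    # Each character encodes one base: score = ASCII code - 33.
    return [ord(ch) - 33 for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# Hypothetical quality string for a 5-base read.
scores = phred33_to_scores("II5+!")
probabilities = [error_probability(q) for q in scores]
print(scores)
print(probabilities)
```

Under this convention "I" corresponds to a score of 40 and "!" to a score of 0, so a read dominated by low-ASCII characters is of low quality.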

Expression Estimates and Transcript Models (Cufflinks):

*.genes.FPKM - expression level estimates for GENCODE GRCh37.v7 genes in FPKM (Fragments Per Kilobase per Million fragments)
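
To make the FPKM unit concrete, here is a minimal sketch of the arithmetic behind its definition. It is only the textbook normalization, not the Cufflinks estimation procedure, which fits a statistical model over transcripts; the function name and the example numbers are hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene length per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions)


# Hypothetical gene: 500 fragments on a 2,000 bp gene,
# in a library of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
```

Dividing by gene length and library size makes values comparable across genes of different lengths and across libraries of different sequencing depth.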

Methods

Experimental Procedures

Cells were grown according to the approved ENCODE cell culture protocols. Single GM12878 cells from suspension culture are aspirated with a micropipette. The cell is deposited into a cell lysis/dT priming solution under visual observation to confirm the presence of a single cell. The lysate is then frozen on dry ice and stored at -80 °C until processing. After denaturation at 70 °C, first-strand reverse transcription is performed with a template-switching enzyme (Clontech). 18 cycles of PCR are used to amplify the cDNA into a double-stranded library. The amplified library is fragmented using transposome-mediated “tagmentation” (Epicentre), which simultaneously fragments the cDNA and attaches Illumina sequencing primer sequences to it. The fragmented cDNA (average size 300 bp) is amplified for an additional 9 cycles using primers that incorporate bridge PCR sequences and multiplex bar codes. The final product is cleaned up using SPRI beads and submitted for bridge PCR cluster formation on an Illumina flow cell.

Data Processing and Analysis

Verification

Credits

Wold Group: Ali Mortazavi, Brian Williams, Georgi Marinov, Diane Trout, Brandon King, Ken McCue, Lorian Schaeffer.

Myers Group: Norma Neff, Florencia Pauli, Fan Zhang, Tim Reddy, Rami Rauch, Chris Partridge.

Illumina gene expression group: Gary Schroth, Shujun Luo, Eric Vermaas.

TopHat/Cufflinks development: Cole Trapnell, Lior Pachter, Steven Salzberg.

Contacts: Georgi Marinov (data coordination/informatics/experimental), Diane Trout (informatics), and Brian Williams (experimental).

References

Mortazavi A, Williams BA, McCue K, Schaeffer L, and Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods. 2008 Jul; 5(7):621-628.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.