Description

This track is produced as part of the ENCODE Project. In these experiments, expression of a transcription factor (TF) is knocked down by RNA interference using short hairpin RNA (shRNA). The resulting knockdown sublines may show differential expression of genes located near binding sites for the targeted TF, which adds evidence that those genes are potential genomic targets of that TF.

K562, a non-adherent human erythromyeloblastoid leukemia cell line, was transduced with lentiviral vectors carrying an inducible shRNA that targets a specific TF to reduce its expression. Cells with stable integration of shRNA constructs were selected with puromycin. Doxycycline was used to induce shRNA expression, which reduced expression of the targeted gene (measured by qPCR) by at least 70% relative to cells not treated with doxycycline.
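As a rough illustration of the "at least 70%" criterion, the sketch below converts qPCR measurements into a percent knockdown. The source does not say how the qPCR data were normalized, so the comparative Ct (2^-ΔΔCt) method, the reference-gene Ct values and the function names are all assumptions made for illustration.

```python
def relative_expression_ddct(ct_target_dox: float, ct_ref_dox: float,
                             ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Target-gene expression in doxycycline-treated cells relative to untreated
    cells, using the comparative Ct (2^-ddCt) method (an assumption here)."""
    delta_dox = ct_target_dox - ct_ref_dox      # normalize to a reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # same normalization, no doxycycline
    return 2.0 ** -(delta_dox - delta_ctrl)


def percent_knockdown(relative_expression: float) -> float:
    """Percent reduction in expression relative to the no-doxycycline control."""
    return 100.0 * (1.0 - relative_expression)


def passes_knockdown(relative_expression: float, threshold: float = 70.0) -> bool:
    """True if the knockdown meets the 'at least 70%' criterion."""
    return percent_knockdown(relative_expression) >= threshold


# Hypothetical Ct values: the target gene crosses threshold 2 cycles later with doxycycline.
rel = relative_expression_ddct(27.0, 17.0, 25.0, 17.0)
print(rel, percent_knockdown(rel), passes_knockdown(rel))  # 0.25 75.0 True
```

A 2-cycle delay corresponds to a fourfold drop in expression, i.e. a 75% knockdown, which would clear the 70% threshold.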

The cell lines are named shX_K562, where X is the TF targeted by the shRNA and K562 denotes the parent cell line. For example, shATF3_K562 cells are K562-derived cells selected for stable integration of an shRNA targeting the ATF3 gene; after doxycycline induction they showed at least a 70% reduction in ATF3 expression.
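When processing many subtracks, the naming convention can be split back into its parts. This is a minimal sketch with a hypothetical function name; it assumes that neither the TF symbol nor the parent cell line name contains an underscore.

```python
import re
from typing import Tuple

# shX_<parent>: "sh" + targeted TF symbol, an underscore, then the parent cell line.
# Assumes neither part contains an underscore.
CELL_LINE_RE = re.compile(r"^sh(?P<tf>[^_]+)_(?P<parent>[^_]+)$")


def parse_cell_line(name: str) -> Tuple[str, str]:
    """Split a knockdown subline name into (targeted TF, parent cell line)."""
    match = CELL_LINE_RE.match(name)
    if match is None:
        raise ValueError(f"not an shX_<parent> cell line name: {name!r}")
    return match.group("tf"), match.group("parent")


print(parse_cell_line("shATF3_K562"))  # ('ATF3', 'K562')
```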

RNA-seq (Mortazavi et al., 2008) was used to quantify the transcriptome of each cell line both with doxycycline treatment (shRNA induced) and without doxycycline (uninduced control). The cDNA was sequenced on an Illumina Genome Analyzer (GA I or GA IIx). Biological replicates were performed for each experiment.

Display Conventions and Configuration

This is a composite track that contains multiple data types (views). Each view contains multiple subtracks (cell lines, replicates and growth conditions) that are displayed individually in the browser. Instructions for configuring multi-view tracks are here. The following views are in this track:

FPKM
Gencode.v3c (Harrow et al., 2006) gene models are shaded in grayscale according to a score calculated from the FPKM value (Fragments Per Kilobase of exon per Million reads; Cufflinks v0.9.3, Roberts et al., 2011). The score equals 100 x log2(FPKM + 1) and is capped at 1000. The shading becomes darker as the score and FPKM increase, which helps in visualizing the relative amount of a given transcript across multiple samples.
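The score formula can be written out as a short function. This is a sketch of the formula stated above; rounding the score to an integer is an assumption for illustration, since the text does not say how fractional scores are handled.

```python
import math


def fpkm_to_score(fpkm: float) -> int:
    """Grayscale score from FPKM: 100 * log2(FPKM + 1), capped at 1000.

    Rounding to an integer is an assumption for illustration."""
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    return min(1000, round(100.0 * math.log2(fpkm + 1.0)))


for value in (0, 1, 3, 1023, 5000):
    print(value, fpkm_to_score(value))
```

Because 100 x log2(1024) = 1000, any transcript with an FPKM of 1023 or more receives the maximum score, so very highly expressed transcripts all share the darkest shade.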

Alignments
The Alignments view shows reads mapped to the genome.

Raw Signal
Density graph of signal enrichment based on normalized aligned read density (Reads Per Million, RPM). RPM is reported in the score field and equals the number of reads at that position divided by the total number of reads in millions. The Raw Signal view displays dense, continuous data as a graph, and the RPM normalization helps in visualizing the relative amount of a given transcript across multiple samples. Raw Signals are colored by cell type.
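To make the RPM normalization concrete, the sketch below computes RPM and writes per-base read counts as bedGraph lines (0-based, half-open intervals), the intermediate format from which the bigWig files were built. The function names and the run-merging strategy are illustrative assumptions, not the production pipeline; converting bedGraph to bigWig would require a separate tool such as UCSC's bedGraphToBigWig.

```python
from typing import Iterable, List


def rpm(reads_at_position: int, total_reads: int) -> float:
    """Reads Per Million: reads at a position divided by total reads in millions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return reads_at_position / (total_reads / 1_000_000)


def coverage_to_bedgraph(chrom: str, counts: Iterable[int], total_reads: int,
                         offset: int = 0) -> List[str]:
    """Convert per-base read counts into bedGraph lines of RPM values.

    Adjacent bases with the same RPM are merged into one interval and
    zero-coverage bases are omitted. `offset` is the 0-based start of counts[0]."""
    lines = []
    run_start, run_value = 0, 0.0
    for i, count in enumerate(list(counts) + [0]):  # trailing 0 closes the last run
        value = rpm(count, total_reads)
        if value != run_value:
            if run_value:  # emit the run that just ended, if it had coverage
                lines.append(f"{chrom}\t{offset + run_start}\t{offset + i}\t{run_value:g}")
            run_start, run_value = i, value
    return lines


print(rpm(50, 20_000_000))  # 2.5
print("\n".join(coverage_to_bedgraph("chr1", [0, 2, 2, 4, 0], 2_000_000)))
```

With 2 million total reads, a position covered by 2 reads has an RPM of 1 and a position covered by 4 reads has an RPM of 2, regardless of how deeply other samples were sequenced.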

Methods

Experimental Procedures

Cells were grown according to the approved ENCODE cell culture protocols. Messenger RNA was isolated, reverse transcribed to cDNA and sequenced according to the protocol of Mortazavi et al. (2008). The Genome Analyzer flowcell was used according to the protocol for the ChIP-Seq DNA genomic DNA kit (Illumina). The sequencing libraries were size-selected around 225 bp and amplified with 15 cycles of PCR. Libraries were sequenced on an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Single-end reads 36 nt in length were obtained.

Data Processing and Analysis

Fastq files were made from qseq files generated by the Illumina pipeline (Casava 1.7). Casava export files were aligned to the NCBI Build 37 (hg19) version of the human genome with ELAND (Illumina), generating SAM files, which were converted to BAM with SAMtools (Li et al., 2009). The Raw Signal files (bigWig) were generated from bedGraph files in which the score is the number of reads at each position divided by the total number of reads in millions (RPM).

The first 10 bases of each read show a weak characteristic nucleotide bias of unknown origin. This RNA-seq protocol is not strand-specific: it does not record which strand was transcribed, so reads are ambiguous at loci where both strands are transcribed.
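One way to see the early-cycle nucleotide bias in a FASTQ file is to tabulate base composition at each read position. This is a minimal sketch with hypothetical function names; it assumes uncompressed FASTQ with four lines per record and is not part of the described processing pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield read sequences from FASTQ text (four lines per record)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected FASTQ header, got {header!r}")
        try:
            seq = next(it).strip()
            next(it)  # '+' separator line
            next(it)  # quality line
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield seq


def base_composition_by_cycle(sequences: Iterable[str],
                              n_cycles: int = 10) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T at each of the first n_cycles read positions.

    Fractions are taken over all bases seen at a position (including N), so the
    four values can sum to slightly less than 1. Positions no read reaches give {}."""
    counts = [Counter() for _ in range(n_cycles)]
    for seq in sequences:
        for i, base in enumerate(seq[:n_cycles]):
            counts[i][base.upper()] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: c[b] / total for b in "ACGT"} if total else {})
    return result


records = ["@r1", "ACGT", "+", "IIII", "@r2", "AAGT", "+", "IIII"]
for cycle, freqs in enumerate(base_composition_by_cycle(fastq_sequences(records), 4), 1):
    print(cycle, freqs)
```

On real data, a bias of this kind would appear as composition that drifts over roughly the first 10 cycles and then levels off.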

RNA-seq reads were aligned to Gencode.v3c (Harrow et al., 2006) gene models, and gene expression was measured in Fragments Per Kilobase of exon per Million reads (FPKM) using Cufflinks v0.9.3 (Roberts et al., 2011). FPKM is calculated by dividing the total number of fragments that align to the gene model by the length of the spliced transcript (exons) in kilobases, and then dividing by the total number of reads in the experiment in millions. FPKM is reported in the last (attributes) column of the GTF (TranscriptGencV3c) files.
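The definition of FPKM, and how the value might be read back out of the downloadable GTF files, can be sketched as follows. Cufflinks estimates how many fragments come from each transcript with a statistical model (including the fragment bias correction of Roberts et al., 2011), so this count-based formula illustrates the definition rather than reproducing Cufflinks output. The function names and the example GTF record (with IDs "g1" and "t1") are hypothetical; the sketch assumes FPKM appears as an `FPKM "value";` pair in the attributes column.

```python
import re
from typing import Optional


def fpkm(fragments: float, exon_length_bp: int, total_reads: int) -> float:
    """Fragments per kilobase of spliced transcript per million reads."""
    if exon_length_bp <= 0 or total_reads <= 0:
        raise ValueError("exon length and total reads must be positive")
    return fragments / (exon_length_bp / 1_000) / (total_reads / 1_000_000)


# Assumed layout: FPKM stored as a key "value" pair in GTF column 9 (attributes).
FPKM_ATTR = re.compile(r'\bFPKM "([^"]+)"')


def fpkm_from_gtf_line(line: str) -> Optional[float]:
    """Return the FPKM attribute of one GTF record, or None if absent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return None  # header, comment, or malformed line
    match = FPKM_ATTR.search(fields[8])
    return float(match.group(1)) if match else None


# 400 fragments on a 2 kb spliced transcript in a 20-million-read experiment.
print(fpkm(400, 2_000, 20_000_000))  # 10.0
record = ('chr1\tCufflinks\ttranscript\t11869\t14409\t1000\t+\t.\t'
          'gene_id "g1"; transcript_id "t1"; FPKM "10.5";')
print(fpkm_from_gtf_line(record))  # 10.5
```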

RawData (fastq), RawSignal (bigWig), Alignments (bam) and TranscriptGencV3c (gtf) files are available from the Downloads page.

Verification

Credits

These data were produced by the lab of Dr. Richard Myers at the HudsonAlpha Institute for Biotechnology.

Contact: Dr. Florencia Pauli.

References

Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, Rossier C, Ucla C, Hubbard T, Antonarakis SE, Guigó R. GENCODE: producing a reference annotation for ENCODE. Genome Biology. 2006;7 Suppl 1:S4.1-9.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9.

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold BJ. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008 Jul;5(7):621-628.

Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology. 2011 Mar;12:R22.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.