Description

This track displays binding sites of the specified transcription factors in the given cell types as identified by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq; see Johnson DS, et al., 2007 and Fields S, 2007).

The ChIP-Seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP was size-selected (~225 bp) and sequenced. Short reads of 25-36 bp were mapped to the human reference genome, and enriched regions of high read density relative to total input chromatin control reads were identified.

The sequence reads with quality scores (fastq files) and alignment coordinates (BAM files) from these experiments are available for download.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The subtracks in this track are grouped by transcription factor targeted antibody and by cell type. For each experiment (cell type vs. antibody), the following views are included:

Peaks: Sites with the greatest evidence of transcription factor binding, calculated as enriched regions of high read density in the ChIP experiment relative to total input chromatin control reads by MACS (Zhang Y, et al., 2008).
Raw Signal: A continuous signal that indicates the density of aligned reads. The sequence reads were extended to the size-selected length (225 bp), and the read density was computed as reads per million.
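
The Raw Signal computation described above can be sketched as follows. This is only an illustration, not the pipeline used to build the track: the function names, the 0-based half-open coordinates, and the choice to scale by the total number of reads passed in are all assumptions made for the example.

```python
from collections import Counter

FRAGMENT_LENGTH = 225  # size-selected fragment length stated in the description


def extend_read(start, end, strand, fragment_length=FRAGMENT_LENGTH):
    """Extend a read (0-based, half-open) to the fragment length, 5' to 3'."""
    if strand == "+":
        return start, start + fragment_length
    # Minus-strand reads have their 5' end at `end`, so extend toward lower coordinates.
    return max(0, end - fragment_length), end


def raw_signal(reads, fragment_length=FRAGMENT_LENGTH):
    """Per-base density of extended reads, scaled to reads per million.

    `reads` is a list of (start, end, strand) tuples on one chromosome.
    Assumption: the scaling denominator is the number of reads supplied.
    """
    if not reads:
        return {}
    depth = Counter()
    for start, end, strand in reads:
        ext_start, ext_end = extend_read(start, end, strand, fragment_length)
        for pos in range(ext_start, ext_end):
            depth[pos] += 1
    scale = 1_000_000 / len(reads)
    return {pos: count * scale for pos, count in depth.items()}
```

A per-base dictionary keeps the sketch short; a real implementation would more likely use interval arithmetic or an array per chromosome.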

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

Cells were grown according to the approved ENCODE cell culture protocols. Cross-linked chromatin was immunoprecipitated with an antibody. The protein:DNA cross-links were then reversed, and the DNA fragments were recovered and sequenced. Please see protocol notes below and go here for the most current version of the protocol. Biological replicates from each experiment were completed.

Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build37 (hg19) using the integrated Eland software; 32 nt of the sequence reads were used for alignment; up to two mismatches were tolerated; reads that mapped to multiple sites in the genome were discarded.
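
The alignment criteria in the paragraph above amount to a simple filter, sketched below. This is not Eland itself: the record layout (`mismatches`, `hit_count`) and the function names are hypothetical, and taking the 32 nt from the start of each read is an assumption.

```python
ALIGN_LENGTH = 32     # nucleotides of each read used for alignment (from the text)
MAX_MISMATCHES = 2    # mismatches tolerated (from the text)


def trim_for_alignment(sequence, length=ALIGN_LENGTH):
    """Return the portion of the read used for alignment.

    Assumption: the 32 nt are taken from the start of the read.
    """
    return sequence[:length]


def keep_alignment(mismatches, hit_count):
    """Keep reads that map to exactly one site with at most two mismatches."""
    return hit_count == 1 and mismatches <= MAX_MISMATCHES


def filter_alignments(records):
    """Filter hypothetical alignment records (dicts with the two keys below)."""
    return [r for r in records if keep_alignment(r["mismatches"], r["hit_count"])]
```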

To identify likely binding sites, peak calling was applied to the aligned sequence data sets using Model-based Analysis of ChIP-Seq (MACS) (Zhang Y, et al., 2008). MACS models the shift size of ChIP-Seq tags empirically and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome, allowing for more robust predictions (Zhang Y, et al., 2008).
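
The idea behind the dynamic Poisson model can be illustrated with a simplified sketch. It is not the MACS implementation. Following the description in Zhang Y, et al., 2008, the expected tag count for a candidate region is taken as the largest of a genome-wide background rate and rates estimated from control tags in windows of increasing size around the region; the function names, the window sizes in the usage note, and the input layout are assumptions made for the example.

```python
import math


def local_lambda(control_counts, region_width, genome_lambda):
    """Expected tag count for a region under a locally adjusted Poisson model.

    control_counts maps a window size in bp to the number of control tags
    observed in that window around the region. Each count is rescaled to the
    region width, and the largest rate (including the genome-wide one) wins.
    """
    rates = [genome_lambda]
    for window, count in control_counts.items():
        rates.append(count * region_width / window)
    return max(rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # pmf at 0
    cdf = term
    for i in range(1, k):
        term *= lam / i     # pmf at i from pmf at i - 1
        cdf += term
    return max(0.0, 1.0 - cdf)
```

For example, `local_lambda({1000: 10, 5000: 20, 10000: 30}, 200, 0.5)` selects the 1 kb estimate because it gives the highest rate, so a region sitting in a locally noisy part of the control needs more ChIP tags to reach the same p-value. Subtracting the lower tail from 1 loses precision for very small p-values; a production tool would compute the upper tail directly or in log space.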

Protocol Notes

Several changes and improvements were made to the original ChIP-Seq protocol (Johnson et al., 2007). The major differences between protocols are the number of cells and magnetic beads used for IP, the method of sonication used to fragment DNA, and the number of cycles of PCR used to amplify the sequencing library. The most current protocol used by the Myers lab can be found here. The protocol field for each file denotes the version of the protocol used as being PCR1x, PCR2x or a version number (for example, v041610.1).

The sequencing libraries labeled as PCR2x were made with two rounds of amplification (25 and 15 cycles) and those labeled as PCR1x were made with one 15-cycle round of amplification. These experiments were completed prior to January 2010 and were originally aligned to NCBI Build36 (hg18). They have been re-aligned to NCBI Build37 (hg19) with the Bowtie software (Langmead, et al., 2009) for this data release. The libraries labeled with a protocol version number were completed after January 2010 and were only aligned to NCBI Build37 (hg19). Please refer to the Myers Lab website for details on each protocol version.

Verification

The MACS peak caller was used to call significant peaks on the individual replicates of a ChIP-Seq experiment. Afterwards, the irreproducible discovery rate (IDR) method, developed by Li et al. (submitted), was used to quantify the consistency between pairs of ranked peak lists from replicates. The IDR method uses a model that assumes that the ranked lists of peaks in a pair of replicates consist of two groups: a reproducible group and an irreproducible group. In general, the signals in the reproducible group are more consistent (i.e. with a larger rank correlation coefficient) and are ranked higher than those in the irreproducible group. The proportion of peaks that belong to the irreproducible component and the correlation of the reproducible component are estimated adaptively from the data. The model also provides an IDR score for each peak, which reflects the posterior probability of the peak belonging to the irreproducible group. The aligned reads were pooled from all replicates and the MACS peak caller was used to call significant peaks on the pooled data. Only datasets containing at least 100 peaks passing the IDR threshold are considered valid and submitted for release.
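
The final validity rule can be expressed as a short check, sketched below. The IDR threshold is not stated in this description, so it is a required argument rather than a default; treating a peak as passing when its score is less than or equal to the threshold is an assumption, as are the function and parameter names.

```python
MIN_PASSING_PEAKS = 100  # minimum number of passing peaks stated in the text


def count_passing_peaks(idr_scores, idr_threshold):
    """Count peaks whose IDR score is at or below the threshold (assumed rule)."""
    return sum(1 for score in idr_scores if score <= idr_threshold)


def dataset_is_valid(idr_scores, idr_threshold, min_peaks=MIN_PASSING_PEAKS):
    """Return True when enough peaks pass the IDR threshold for release."""
    return count_passing_peaks(idr_scores, idr_threshold) >= min_peaks
```

Lower IDR scores indicate more reproducible peaks, which is why the comparison runs in that direction.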

Credits

These data were provided by the Myers Lab at the HudsonAlpha Institute for Biotechnology.

Contact: Flo Pauli.

References

Fields S. Molecular biology: Site-seeing by sequencing. Science. 2007 Jun 8;316(5830):1441-2.

Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.

Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009;10:R25.

Li Q, Brown JB, Huang H, Bickel PJ. Measuring Reproducibility of High-throughput experiments (Submitted to the Annals of Applied Statistics).

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein B, Nussbaum C, Myers RM, Brown M, Li W, Liu S. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 2008 Sep;9:R137. Epub 2008 Sep 17.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.