This track displays binding sites of the specified transcription factors in the given cell types as identified by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq; see Johnson DS, et al., 2007 and Fields S, 2007).

The ChIP-seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP was size-selected (~225 bp) and sequenced. Short reads of 25-35 nt were mapped to the human reference genome, and regions enriched in read density relative to total input chromatin control reads were identified.

Included for each cell type is a control signal, which represents the control condition where the protein:DNA crosslinks were reversed and DNA fragments were sequenced with no immunoprecipitation (IP).

The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The subtracks in this track are grouped by transcription factor targeted with an antibody for ChIP and by cell type. For each experiment (cell type vs. antibody), the following views are included:

Peaks: Sites with the greatest evidence of transcription factor binding.
Raw Signal: A continuous signal which indicates density of aligned reads. The sequence reads were extended to the size-selected length (225 bp), and the read density computed using the bedItemOverlapCount utility. This annotation was generated by the ENCODE Data Coordination Center at UCSC.
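
The Raw Signal computation described above can be illustrated with a minimal Python sketch. It is not the bedItemOverlapCount implementation. It assumes BED-style 0-based, half-open coordinates and assumes that each read is extended in its own strand direction; the description states only that reads were extended to 225 bp. The function names and the input tuple layout are hypothetical.

```python
from collections import defaultdict

FRAGMENT_LENGTH = 225  # size-selected length stated in the track description


def extend_read(start, end, strand, fragment_length=FRAGMENT_LENGTH):
    """Extend an aligned read (0-based, half-open) to the fragment length.

    Assumption: forward reads extend toward higher coordinates and
    reverse reads toward lower coordinates (clamped at 0).
    """
    if strand == "+":
        return start, start + fragment_length
    return max(0, end - fragment_length), end


def read_density(reads, fragment_length=FRAGMENT_LENGTH):
    """Return {chrom: [(start, end, depth), ...]} for the extended reads.

    `reads` is an iterable of (chrom, start, end, strand) tuples.
    """
    # Record +1 where an extended read starts and -1 where it ends.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end, strand in reads:
        ext_start, ext_end = extend_read(start, end, strand, fragment_length)
        deltas[chrom][ext_start] += 1
        deltas[chrom][ext_end] -= 1

    density = {}
    for chrom, chrom_deltas in deltas.items():
        positions = sorted(chrom_deltas)
        depth = 0
        segments = []
        # Sweep left to right, emitting one segment per run of constant depth.
        for pos, next_pos in zip(positions, positions[1:]):
            depth += chrom_deltas[pos]
            if depth > 0:
                segments.append((pos, next_pos, depth))
        density[chrom] = segments
    return density
```

Each output tuple corresponds to one bedGraph-style interval: the number of extended reads overlapping every base in that interval.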

Methods

Cells were grown according to the approved ENCODE cell culture protocols. Briefly, cross-linked chromatin was immunoprecipitated with antibody, the protein:DNA crosslinks were reversed and the DNA fragments were recovered and sequenced. Because these experiments were carried out over the course of two years, several changes and improvements were made to the original protocol (see Johnson DS, et al., 2007). The major differences between protocols are the number of cells and magnetic beads used for IP, the method of sonication used to fragment DNA, and the number of cycles of PCR used to amplify the sequencing library. The most current protocol used by the Myers Lab can be found here. The sequencing libraries labeled as PCR2x were made with two rounds of amplification (25 and 15 cycles) and those labeled as PCR1x were made with one 15-cycle round of amplification. Biological replicates were completed for each experiment. For specific details on the protocol used for a ChIP of interest (number of cells, DNA fragmentation and sequencing library construction), please contact the Myers Lab at the contact information provided below.

Libraries were sequenced with an Illumina Genome Analyzer I or an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build36 (hg18) using the integrated Eland software. Between 25 and 36 bp of each sequence read were used for alignment, up to two mismatches were tolerated, and reads that mapped to multiple sites in the genome were discarded.
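
The alignment filter described above (at most two mismatches, uniquely mapped reads only) can be sketched in Python as follows. The `Alignment` record and its field names are hypothetical and do not reflect the Eland output format; the sketch only illustrates the two stated criteria.

```python
from dataclasses import dataclass

MAX_MISMATCHES = 2  # "up to two mismatches were tolerated"


@dataclass
class Alignment:
    """Hypothetical alignment record; not the Eland output format."""
    read_id: str
    chrom: str
    start: int
    strand: str
    mismatches: int
    num_hits: int  # number of genomic sites the read maps to


def keep_alignment(aln, max_mismatches=MAX_MISMATCHES):
    """Keep reads that map to exactly one site with few enough mismatches."""
    return aln.num_hits == 1 and aln.mismatches <= max_mismatches


def filter_alignments(alignments, max_mismatches=MAX_MISMATCHES):
    """Return the alignments that pass both criteria, in input order."""
    return [a for a in alignments if keep_alignment(a, max_mismatches)]
```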

To identify likely binding sites, peak calling was applied to the aligned sequence data sets using either Quantitative Enrichment of Sequence Tags (QuEST, see Valouev A, et al., 2008) or Model-based Analysis of ChIP-Seq (MACS, see Zhang Y, et al., 2008). Experiments for which peak calling was completed using MACS are labeled as "softwareVersion: MACS" in the list above; this label can be found by clicking on the metadata link "..." for the Peaks subtrack. Experiments for which QuEST was used do not have a software version annotated. QuEST is based on the kernel density estimation approach, which uses ChIP-seq data to determine positions where protein complexes contact DNA. QuEST uses data in the form of genome coordinates ('tags') obtained from mapping several million sequencing reads to a reference genome. Tags from forward and reverse reads cluster on opposite sides of the transcription factor binding site. QuEST first constructs two separate profiles, one for forward and one for reverse tags. It then merges them into a combined density profile (CDP) and identifies candidate peaks as positions in the reference genome corresponding to local maxima of the CDP with sufficient enrichment compared to the control data. MACS empirically models the shift size of ChIP-seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome, allowing for more robust predictions (see Zhang Y, et al., 2008).
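
The kernel density idea behind QuEST can be illustrated with a small Python sketch. This is not the QuEST implementation: the Gaussian kernel, the fixed `half_shift` parameter, the enrichment test, and all function names are assumptions made for illustration. Forward and reverse tag profiles are shifted toward each other and summed into a combined density profile, and local maxima with sufficient enrichment over the control are reported.

```python
import math


def kernel_density(tags, positions, bandwidth):
    """Gaussian kernel density of tag coordinates, evaluated at each position."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((p - t) / bandwidth) ** 2) for t in tags)
        for p in positions
    ]


def combined_density(forward_tags, reverse_tags, positions, bandwidth, half_shift):
    """Shift forward tags downstream and reverse tags upstream, then sum.

    Assumption: `half_shift` is half the distance between the forward and
    reverse tag clusters and is supplied by the caller.
    """
    fwd = kernel_density([t + half_shift for t in forward_tags], positions, bandwidth)
    rev = kernel_density([t - half_shift for t in reverse_tags], positions, bandwidth)
    return [f + r for f, r in zip(fwd, rev)]


def candidate_peaks(positions, chip_cdp, control_cdp, min_enrichment, pseudo=1e-9):
    """Return positions that are local maxima of the ChIP profile and are
    enriched at least `min_enrichment`-fold over the control profile."""
    peaks = []
    for i in range(1, len(positions) - 1):
        is_local_max = chip_cdp[i] > chip_cdp[i - 1] and chip_cdp[i] >= chip_cdp[i + 1]
        enrichment = chip_cdp[i] / (control_cdp[i] + pseudo)
        if is_local_max and enrichment >= min_enrichment:
            peaks.append(positions[i])
    return peaks
```

With forward tags clustered 50 bp upstream and reverse tags 50 bp downstream of a site, a `half_shift` of 50 places both shifted profiles on the site itself, so the combined profile has a single maximum there.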

Verification

Quantitative polymerase chain reaction (qPCR) assays can be used to validate the transcription factor binding sites found using ChIP-Seq. Each region of enriched read density identified by QuEST is reported as a single genomic coordinate (peak). These peaks are ranked according to the ChIP-Seq enrichment ratio, and qPCR assays are used to validate the set of peaks shared between replicates for each cell line. qPCR primer pairs were designed to interrogate the ordered list of peaks in common between replicates. Amplicons were 60-100 bp in length and were completely contained within 250 bp of either side of the peak coordinate. For each primer pair, qPCR assays were performed on biological replicate ChIP samples from both cell lines, and total input chromatin DNA was recovered. Enrichment was calculated as the ratio of the amount of target DNA to the average of a pair of negative control primers. An assay was considered positive when the enrichment averaged across qPCR replicates was two-fold or greater (see Valouev A, et al., 2008).
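
The validation arithmetic described above can be written out as a short Python sketch. The function names and inputs are hypothetical, and the amplicon check assumes 0-based, half-open coordinates; only the 60-100 bp length, the 250 bp window, the negative-control average, and the two-fold threshold come from the description.

```python
def amplicon_ok(peak, amplicon_start, amplicon_end, window=250):
    """Check that an amplicon is 60-100 bp long and lies entirely within
    `window` bp on either side of the peak coordinate."""
    length = amplicon_end - amplicon_start
    within_window = amplicon_start >= peak - window and amplicon_end <= peak + window
    return 60 <= length <= 100 and within_window


def fold_enrichment(target_amount, negative_control_amounts):
    """Ratio of target DNA to the average of the negative control primers."""
    control_mean = sum(negative_control_amounts) / len(negative_control_amounts)
    return target_amount / control_mean


def is_positive(replicate_enrichments, threshold=2.0):
    """An assay is positive when the mean enrichment across qPCR
    replicates is at least `threshold`-fold."""
    mean_enrichment = sum(replicate_enrichments) / len(replicate_enrichments)
    return mean_enrichment >= threshold
```

For example, a target amount of 8.0 against negative controls of 1.0 and 3.0 gives an enrichment of 4.0, and replicate enrichments of 1.5 and 3.0 average to 2.25, which passes the two-fold threshold.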

Notes:
Protocol pA: Unless otherwise noted, datasets were generated using a protocol that involved a single round of PCR (15 cycles) to prepare DNA fragment libraries. Certain earlier datasets, however, were generated using a protocol with two rounds of PCR (25 + 15 cycles). These datasets contain "PCR2x" in their label and metadata. Peaks for these experiments were called using "Input (PCR2x)" for background.

Release Notes

This is Release 3 (July 2010) of this track, which includes the revoking of a number of data sets. In addition, experiments previously identified as antibody Input are now identified as RevXlinkChromatin. FTP site.

Credits

These data were provided by the Myers Lab at the HudsonAlpha Institute for Biotechnology.

Contact: Flo Pauli.

References

Fields S. Molecular biology. Site-seeing by sequencing. Science. 2007 Jun 8;316(5830):1441-2.

Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.

Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, Myers RM, Sidow A. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods. 2008 Sep;5(9):829-34.

Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.