Description

This track displays binding sites of the specified transcription factors in the given cell types as identified by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq; see Johnson DS, et al., 2007 and Fields S, 2007).

The ChIP-Seq method was used to assay chromatin fragments bound by specific or general transcription factors as described below. DNA isolated by ChIP was size-selected (~225 bp) and sequenced. Short reads of 25-35 bp were mapped to the human reference genome, generating peaks, or enriched regions of high read density, relative to reads from a total input chromatin control.

Included for each cell type is a control signal, which represents the control condition where no antibody targeting was performed.

The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

Track Conventions

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. The subtracks in this track are grouped by the antibody used to target each transcription factor and by cell type. For each experiment (cell type vs. antibody), the following views are included:

Peaks: Sites with the greatest evidence of transcription factor binding.
Raw Signal: A continuous signal which indicates density of aligned reads. The sequence reads were extended to the size-selected length (225 bp), and the read density computed using the bedItemOverlapCount utility. This annotation was generated by the ENCODE Data Coordination Center at UCSC.
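The read extension and density computation described for the Raw Signal view can be sketched as follows. This is a minimal illustration, not the bedItemOverlapCount implementation: the function names, the 0-based half-open coordinate convention, and the (start, end, strand) input tuples are assumptions made for the example. Only the 225 bp fragment length comes from the text.

```python
from collections import defaultdict

FRAGMENT_LEN = 225  # size-selected fragment length stated in the text


def extend_read(start, end, strand, fragment_len=FRAGMENT_LEN):
    """Extend a read (0-based, half-open) to fragment_len along its strand."""
    if strand == "+":
        return start, start + fragment_len
    # Reverse-strand reads extend toward lower coordinates; clamp at 0.
    return max(0, end - fragment_len), end


def coverage(reads, fragment_len=FRAGMENT_LEN):
    """Return sorted (start, end, depth) runs with non-zero read depth.

    Adjacent runs are not merged, so two neighbouring runs may share a depth.
    """
    deltas = defaultdict(int)
    for start, end, strand in reads:
        s, e = extend_read(start, end, strand, fragment_len)
        deltas[s] += 1   # depth rises where an extended read begins
        deltas[e] -= 1   # and falls where it ends
    runs, depth, prev = [], 0, None
    for pos in sorted(deltas):
        if depth > 0 and prev is not None and pos > prev:
            runs.append((prev, pos, depth))
        depth += deltas[pos]
        prev = pos
    return runs


if __name__ == "__main__":
    # Two hypothetical 25 bp forward reads whose extended fragments overlap.
    for run in coverage([(0, 25, "+"), (100, 125, "+")]):
        print(run)
```

The sweep over start and end events keeps the work proportional to the number of reads rather than to the chromosome length, which is one common way to compute this kind of signal.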

Methods

Cells were grown according to the approved ENCODE cell culture protocols. Cross-linked chromatin was immunoprecipitated with antibody according to the protocol posted here. Biological replicates from each cell line were tested. Libraries were generated from DNA fragments recovered from immunoprecipitation and total input chromatin according to the protocol posted here.

Libraries were sequenced with an Illumina Genome Analyzer (GA1) according to the manufacturer's recommendations. Sequence data produced by the Illumina data pipeline software were quality filtered and then mapped to NCBI Build 36 (hg18) using the integrated Eland software. Only the first 25 bp of each read were used for alignment, up to two mismatches were tolerated, and reads that mapped to more than one location were discarded.
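The three alignment rules above (25 bp seed, at most two mismatches, unique placement only) can be illustrated with a toy brute-force matcher. This is not how Eland works internally. It is a sketch that assumes a single short forward-strand reference string and interprets "multiply mapped" as more than one location within the mismatch limit. The function names are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def unique_hit(read, reference, seed_len=25, max_mismatches=2):
    """Return the single reference offset matching the read's seed, else None.

    Only the first seed_len bases of the read are compared. Reads with no
    placement, or with more than one placement within max_mismatches, are
    discarded (None).
    """
    seed = read[:seed_len]
    hits = [
        i
        for i in range(len(reference) - len(seed) + 1)
        if hamming(seed, reference[i:i + len(seed)]) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    seed = "CGTACGGATCCTAGGCTAACGTTGC"          # hypothetical 25 bp sequence
    reference = "A" * 30 + seed + "T" * 30
    # The bases after position 25 of the read are ignored during alignment.
    print(unique_hit(seed + "GGGGGGGG", reference))
```

A real aligner indexes the genome, searches both strands, and scales to millions of reads. The sketch only shows the filtering logic.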

To identify likely binding sites, peak calling was applied to the aligned sequence data sets using quantitative enrichment of sequence tags (QuEST; see Valouev A, et al., 2008). QuEST is based on the kernel density estimation approach, which uses ChIP-Seq data to determine positions where protein complexes contact DNA. QuEST uses data in the form of genome coordinates ('tags') obtained from mapping several million sequencing reads to a reference genome. Tags from forward and reverse reads cluster on opposite sides of the transcription factor binding site. QuEST first constructs two separate density profiles, one for forward and one for reverse tags. The two profiles are then shifted toward each other and summed to form a combined density profile (CDP). QuEST identifies candidate peaks as positions in the reference genome that correspond to local maxima of the CDP and show sufficient enrichment compared to the control data.
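The kernel density idea behind this kind of peak calling can be sketched in a few lines. This is not the QuEST code. It is a simplified illustration that assumes a Gaussian kernel, a fixed shift applied to forward and reverse tags before summing, and hypothetical parameter values (bandwidth, shift, min_density, min_fold) chosen only for the example.

```python
import math


def kernel_density(tags, positions, bandwidth=30.0):
    """Gaussian kernel density of tag coordinates, evaluated at each position."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((p - t) / bandwidth) ** 2) for t in tags)
        for p in positions
    ]


def combined_density(fwd_tags, rev_tags, positions, shift, bandwidth=30.0):
    """Shift forward tags downstream and reverse tags upstream, then sum."""
    fwd = kernel_density([t + shift for t in fwd_tags], positions, bandwidth)
    rev = kernel_density([t - shift for t in rev_tags], positions, bandwidth)
    return [f + r for f, r in zip(fwd, rev)]


def call_peaks(chip_cdp, control_cdp, positions, min_density, min_fold):
    """Return positions that are local maxima of the ChIP profile and are
    at least min_fold times the control profile at the same position."""
    peaks = []
    for i in range(1, len(positions) - 1):
        c = chip_cdp[i]
        is_local_max = c > chip_cdp[i - 1] and c >= chip_cdp[i + 1]
        if is_local_max and c >= min_density and c >= min_fold * control_cdp[i]:
            peaks.append(positions[i])
    return peaks


if __name__ == "__main__":
    positions = list(range(800, 1201))
    # Hypothetical tags: forward cluster upstream, reverse cluster downstream.
    cdp = combined_density([940, 950, 960], [1040, 1050, 1060], positions, shift=50)
    control = [0.0] * len(positions)
    print(call_peaks(cdp, control, positions, min_density=0.01, min_fold=2.0))
```

Shifting the two strand-specific profiles toward each other places their maxima at the same coordinate, so the summed profile peaks near the presumed binding site rather than at the flanking tag clusters.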

Validation

Quantitative polymerase chain reaction (qPCR) assays can be used to validate the transcription factor binding sites found using ChIP-Seq. QuEST reports each region of enriched read density as a single genomic coordinate (peak). These peaks were ranked according to the ChIP-Seq enrichment ratio, and qPCR assays were used to validate the set of peaks that overlapped between replicates for each cell line. qPCR primer pairs were designed to interrogate the ordered list of peaks in common between replicates. Amplicons were 60-100 bp in length and were completely contained within 250 bp on either side of the peak coordinate. For each primer pair, qPCR assays were performed on biological replicate ChIP samples from both cell lines and on recovered total input chromatin DNA. Enrichment was calculated as the ratio of the amount of target DNA to the average of a pair of negative control primers. An assay was considered positive when the average enrichment across qPCR replicates was two-fold or greater (see Valouev A, et al., 2008).
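The amplicon constraints and the enrichment arithmetic described above can be written out as a short sketch. The function names and input values are hypothetical, and the code assumes that DNA amounts have already been quantified from the qPCR runs. Only the 60-100 bp length range, the 250 bp window, the negative control averaging, and the two-fold threshold come from the text.

```python
def amplicon_ok(amp_start, amp_end, peak, min_len=60, max_len=100, window=250):
    """Check amplicon length and containment within `window` bp of the peak."""
    length = amp_end - amp_start
    in_window = peak - window <= amp_start and amp_end <= peak + window
    return min_len <= length <= max_len and in_window


def fold_enrichment(target_amount, negative_control_amounts):
    """Ratio of target DNA amount to the mean of the negative control primers."""
    baseline = sum(negative_control_amounts) / len(negative_control_amounts)
    return target_amount / baseline


def is_positive(replicate_enrichments, threshold=2.0):
    """An assay is positive when the mean enrichment reaches the threshold."""
    mean = sum(replicate_enrichments) / len(replicate_enrichments)
    return mean >= threshold


if __name__ == "__main__":
    # Hypothetical amounts for one primer pair in two biological replicates.
    rep1 = fold_enrichment(8.0, [1.0, 3.0])
    rep2 = fold_enrichment(3.0, [1.5, 2.5])
    print(rep1, rep2, is_positive([rep1, rep2]))
```

Averaging over a pair of negative control primers, rather than using a single one, reduces the influence of any one control region with unusually high or low background.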

Credits

These data were provided by the Myers Lab at the HudsonAlpha Institute for Biotechnology.

Contact: Rami Rauch

References

Fields S. Molecular biology. Site-seeing by sequencing. Science. 2007 Jun 8;316(5830):1441-2.

Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007 Jun 8;316(5830):1497-502.

Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, Myers RM, Sidow A. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods. 2008 Sep;5(9):829-34.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.