Description

This track shows STAT1 binding sites as determined by chromatin immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing.

The PET sequences in this track are derived from 327,838 STAT1 ChIP fragments of interferon gamma-stimulated HeLa cells and 263,901 STAT1 ChIP fragments of unstimulated HeLa cells. Of these, 3,180 PETs from the stimulated cells and 4,007 PETs from the unstimulated cells mapped to the ENCODE regions. The data from the unstimulated cells were used as the negative control.

Only PETs mapped to the ENCODE regions are shown in this track.

Display Conventions and Configuration

In the graphical display, PET sequences are shown as two blocks, representing the ends of the pair, connected by a thin, arrowed line. Overlapping PET clusters (PET fragments that overlap one another) originating from the ChIP enrichment process define the genomic loci that are potential transcription factor binding sites (TFBSs). PET singletons, from non-specific ChIP fragments that did not cluster, are not shown.
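The clustering idea described above can be illustrated with a minimal sketch. It assumes each PET is reduced to a half-open (start, end) interval on a single chromosome; the function names and the example coordinates are hypothetical and are not taken from the pipeline that produced this track.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, one chromosome (assumption)


def cluster_pets(pets: List[Interval]) -> List[List[Interval]]:
    """Group PET fragments into clusters of chained overlaps."""
    clusters: List[List[Interval]] = []
    cluster_end = None
    for start, end in sorted(pets):
        if cluster_end is not None and start < cluster_end:
            # Overlaps the running cluster: extend it.
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            # No overlap: open a new cluster.
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def max_overlap_depth(cluster: List[Interval]) -> int:
    """Largest number of PETs covering any single position in a cluster."""
    events = []
    for start, end in cluster:
        events.append((start, 1))
        events.append((end, -1))
    depth = best = 0
    # At equal positions, -1 sorts before +1, so touching intervals
    # are not counted as overlapping.
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best


def drop_singletons(clusters: List[List[Interval]]) -> List[List[Interval]]:
    """Keep only clusters with at least two PETs, mirroring the display rule."""
    return [c for c in clusters if len(c) > 1]


# Hypothetical coordinates, for illustration only.
example = [(100, 600), (300, 900), (550, 1200), (5000, 5500)]
kept = drop_singletons(cluster_pets(example))
```

Here the first three fragments form one cluster and the fourth is a singleton that would be hidden. Whether the displayed count corresponds to cluster size or to maximal overlap depth is not stated in this section, so both are exposed separately.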

In full and packed display modes, the arrowheads on the horizontal line represent the orientation of the PET sequence, and an ID of the format XXXXX-M is shown to the left of each PET, where X is the unique ID for each PET and M is the number of PET sequences at this location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray is used for three sequences (score = 800) and black indicates four or more PET sequences (score = 1000) at the location.
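As a small illustration of the convention above, the sketch below parses an ID of the form XXXXX-M and maps M to the score and shade listed in this section. The function names and the sample ID are hypothetical; only the thresholds and scores come from the text.

```python
from typing import Tuple


def parse_pet_id(name: str) -> Tuple[str, int]:
    """Split an ID such as '12345-3' into (unique PET ID, count M)."""
    pet_id, _, count = name.rpartition("-")
    if not pet_id or not count.isdigit():
        raise ValueError(f"not in XXXXX-M format: {name!r}")
    return pet_id, int(count)


def score_for_count(m: int) -> int:
    """Score assigned to a location with M PET sequences."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333
    if m == 3:
        return 800
    return 1000


def shade_for_count(m: int) -> str:
    """Display shade that corresponds to the score for M."""
    return {333: "light gray", 800: "dark gray", 1000: "black"}[score_for_count(m)]


# Hypothetical ID, for illustration only.
pet_id, m = parse_pet_id("12345-3")
```

For the sample ID, M is 3, which corresponds to a score of 800 and a dark gray item.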

Methods

The STAT1 chromatin immuno-precipitated DNA fragments from stimulated and non-stimulated control cells were end-polished and cloned into the plasmid vector, pGIS3. pGIS3 contains two MmeI recognition sites that flank the cloning site, which were used to produce a 36 bp PET from the original ChIP DNA fragments (18 bp from each of the 5' and 3' ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for sequencing, where each sequence read can generate 10-15 PETs. The PET sequences were extracted from raw sequence reads and mapped to the genome, defining the boundaries of each ChIP DNA fragment. The following specific mapping criteria were used:

Because MmeI slippage (+/- 1 bp) can introduce ambiguity at the PET signature boundaries, a minimum match of 17 bp was required for each 18 bp signature. Only PETs that mapped specifically to one location in the genome were considered. PETs that mapped to multiple locations may represent low-complexity or repetitive sequences and were therefore excluded from further analysis.
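The two criteria can be sketched as a simple filter. This is an illustration under stated assumptions, not the mapping software used for the track: the TagHit and PetHit records are hypothetical, the alignment step that would produce them is omitted, and the strand of the 3' signature within the 36 bp PET is not specified in this section, so the split below just takes the two halves as they appear.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TAG_LEN = 18    # length of each PET signature, from the text
MIN_MATCH = 17  # minimum matched bases per signature, from the text


def split_pet(pet: str) -> Tuple[str, str]:
    """Split a 36 bp PET into its two 18 bp signatures."""
    if len(pet) != 2 * TAG_LEN:
        raise ValueError(f"expected {2 * TAG_LEN} bp, got {len(pet)}")
    return pet[:TAG_LEN], pet[TAG_LEN:]


@dataclass
class TagHit:
    """Hypothetical record for one signature aligned to the genome."""
    chrom: str
    start: int
    matched_bp: int


@dataclass
class PetHit:
    """Hypothetical record for one candidate genomic placement of a PET."""
    five_prime: TagHit
    three_prime: TagHit


def passes_match_length(hit: PetHit) -> bool:
    """Both signatures must match over at least MIN_MATCH bases."""
    return (hit.five_prime.matched_bp >= MIN_MATCH
            and hit.three_prime.matched_bp >= MIN_MATCH)


def unique_mapping(candidates: List[PetHit]) -> Optional[PetHit]:
    """Return the single acceptable placement, or None if there are zero or several."""
    good = [h for h in candidates if passes_match_length(h)]
    return good[0] if len(good) == 1 else None
```

A PET whose candidate list yields exactly one placement passing the length check is kept; one with several passing placements is treated as multi-mapping and discarded.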

Verification

Statistical and experimental verification exercises have shown that the overlapping PET clusters result from ChIP enrichment events.

Monte Carlo simulation using the STAT1 ChIP-PET data from the interferon gamma-stimulated dataset estimated that random chance accounted for about 58% of PET-3 clusters (the number denotes the maximal number of PETs within the overlap region of a cluster), 21% of PET-4 clusters, and less than 0.5% of PET-5+ clusters (clusters with five or more overlapping members). This suggests that the PET-5+ clusters represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 30% of the PET-4 clusters and over 90% of the PET-5+ clusters are true ChIP enrichment sites.
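The general shape of such a simulation can be sketched as follows. This is a generic null model and not the procedure used for the published estimates: it assumes fragments of one fixed length placed uniformly at random on a single linear region, and the parameters n_fragments, fragment_len and region_len are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open


def depth_per_cluster(pets: List[Interval]) -> List[int]:
    """Maximal overlap depth of each cluster of chained overlapping intervals."""
    depths: List[int] = []
    cluster_end = None
    active_ends: List[int] = []  # ends of fragments still covering the sweep position
    best = 0
    for start, end in sorted(pets):
        if cluster_end is None or start >= cluster_end:
            if cluster_end is not None:
                depths.append(best)  # close the previous cluster
            active_ends, best, cluster_end = [], 0, end
        active_ends = [e for e in active_ends if e > start]
        active_ends.append(end)
        best = max(best, len(active_ends))
        cluster_end = max(cluster_end, end)
    if cluster_end is not None:
        depths.append(best)
    return depths


def simulate_depth_counts(n_fragments: int, fragment_len: int, region_len: int,
                          n_trials: int, seed: int = 0) -> Counter:
    """Total number of clusters at each depth over n_trials random placements."""
    rng = random.Random(seed)
    totals: Counter = Counter()
    for _ in range(n_trials):
        pets = []
        for _ in range(n_fragments):
            start = rng.randrange(region_len - fragment_len + 1)
            pets.append((start, start + fragment_len))
        totals.update(depth_per_cluster(pets))
    return totals


def chance_fraction(observed: Counter, simulated: Counter,
                    n_trials: int, depth: int) -> float:
    """Expected random clusters at this depth per trial, relative to observed (capped at 1)."""
    if observed.get(depth, 0) == 0:
        return 0.0
    expected = simulated.get(depth, 0) / n_trials
    return min(1.0, expected / observed[depth])
```

Comparing the simulated per-trial counts with the counts observed in the real data gives a rough fraction of clusters at each depth that could be explained by chance. A realistic model would also need the empirical fragment-length distribution and the mappable portion of the genome.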

In addition to these statistical analyses, 9 out of 14 genomic locations (64%) identified by PET-5+ clusters in the ENCODE regions were supported by ChIP-chip data from Yale using the same ChIP DNA as hybridization material.

Credits

The ChIP fragment prep was provided by Ghia Euskirchen from Michael Snyder's lab at Yale. The ChIP-PET library and sequence data were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore and the Bioinformatics Institute, Singapore.

References

Ng P, Wei CL, Sung WK, Chiu KP, Lipovich L, Ang CC, Gupta S, Shahab A, Ridwan A, Wong CH, et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods. 2005 Feb;2(2):105-11.