Description

This track shows genome-wide p53 binding sites as determined by chromatin immunoprecipitation (ChIP) and paired-end di-tag (PET) sequencing. The p53 protein is a transcription factor involved in the control of cell growth that is often expressed at high levels in cancer cells. See the Methods section below for more information about ChIP and PET.

The PET sequences in this track are derived from 65,572 individual p53 ChIP fragments from HCT116 cells stimulated with 5-fluorouracil (5FU). More datasets will be submitted in the future, including STAT1, TAF250, and E2F1.

Display Conventions and Configuration

In the graphical display, PET sequences are shown as two blocks, representing the ends of the pair, connected by a thin arrowed line. Overlapping PET clusters (PET fragments that overlap one another) originating from the ChIP enrichment process define the genomic loci that are potential transcription factor binding sites (TFBSs). PET singletons, from non-specific ChIP fragments that did not cluster, are not shown.
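The clustering idea described above can be illustrated with a minimal sketch. It assumes each PET fragment is a half-open (start, end) interval on a single chromosome and that fragments are chained into one cluster whenever they overlap the running cluster span; the exact cluster definition used to build the track may differ, and the function names here are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, on one chromosome


def cluster_pets(pets: List[Interval]) -> List[List[Interval]]:
    """Group PET fragments into clusters of overlapping intervals.

    Assumption: a fragment joins the current cluster if it starts before
    the furthest end seen so far in that cluster (chained overlap).
    """
    clusters: List[List[Interval]] = []
    cluster_end = 0
    for start, end in sorted(pets):
        if clusters and start < cluster_end:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def drop_singletons(clusters: List[List[Interval]]) -> List[List[Interval]]:
    """Keep only clusters with two or more members, as in the track display."""
    return [c for c in clusters if len(c) >= 2]


if __name__ == "__main__":
    pets = [(100, 600), (500, 1000), (2000, 2500), (900, 1400)]
    for cluster in drop_singletons(cluster_pets(pets)):
        print(len(cluster), cluster)
```

In this example the fragment at 2000-2500 overlaps nothing else, so it is treated as a singleton and dropped.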

In full and packed display modes, the arrowheads on the horizontal line represent the orientation of the PET sequence, and an ID of the format XXXXX-M is shown to the left of each PET, where X is the unique ID for each PET and M is the number of PET sequences at this location. The track coloring reflects the value of M: light gray indicates one or two sequences (score = 333), dark gray is used for three sequences (score = 800) and black indicates four or more PET sequences (score = 1000) at the location.
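As a small illustration of the naming and coloring scheme just described, the following sketch parses an ID of the form XXXXX-M and maps M to the stated score. The function names are hypothetical, and the sketch assumes the final hyphen in the ID separates the PET identifier from the count.

```python
from typing import Tuple


def parse_pet_id(name: str) -> Tuple[str, int]:
    """Split an ID such as '12345-3' into (unique PET ID, M)."""
    pet_id, sep, count = name.rpartition("-")
    if not sep or not pet_id:
        raise ValueError(f"not in XXXXX-M format: {name!r}")
    return pet_id, int(count)


def score_for_count(m: int) -> int:
    """Return the track score for M PET sequences at a location."""
    if m < 1:
        raise ValueError("M must be at least 1")
    if m <= 2:
        return 333   # light gray: one or two sequences
    if m == 3:
        return 800   # dark gray: three sequences
    return 1000      # black: four or more sequences


if __name__ == "__main__":
    pet_id, m = parse_pet_id("12345-3")
    print(pet_id, m, score_for_count(m))
```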

Methods

HCT116 cells were treated with 5FU for six hours. The cross-linked chromatin was sheared and precipitated with a high-affinity antibody. The DNA fragments were end-polished and cloned into the plasmid vector pGIS3. This vector contains two MmeI recognition sites flanking the cloning site, which were used to produce a 36 bp PET from each original ChIP DNA fragment (18 bp from each of the 5' and 3' ends). Multiple 36 bp PETs were concatenated and cloned into pZero-1 for sequencing, so that each sequence read can yield 10-15 PETs. The PET sequences were extracted from the raw sequence reads and mapped to the genome, defining the boundaries of each ChIP DNA fragment. The following specific mapping criteria were used:

Because MmeI slippage (+/- 1 bp) is known to cause ambiguities at the PET signature boundaries, a minimum match of 17 bp was required for each 18 bp signature. The total count of PET sequences mapped to the same locus but with slight nucleotide differences may reflect the expression level of the transcripts. Only PETs that mapped specifically to a single location in the genome were considered. PETs that mapped to multiple locations may represent low-complexity or repetitive sequences and were therefore excluded from further analysis.
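The mapping criteria above can be expressed as a simple filter. This is only a sketch under stated assumptions: it presumes some aligner has already produced candidate hits for a PET, and the Hit record, its field names, and the function names are hypothetical rather than part of the actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """Hypothetical record for one candidate genomic placement of a PET."""
    chrom: str
    start: int
    end: int
    match_5p: int  # matched bases in the 18 bp 5' signature
    match_3p: int  # matched bases in the 18 bp 3' signature


def split_pet(pet: str) -> Tuple[str, str]:
    """Split a 36 bp PET into its 5' and 3' 18 bp signatures."""
    if len(pet) != 36:
        raise ValueError("expected a 36 bp PET")
    return pet[:18], pet[18:]


def unique_location(hits: List[Hit], min_match: int = 17) -> Optional[Hit]:
    """Return the single acceptable hit, or None if there are zero or several.

    A hit is acceptable when both signatures match at least min_match bases.
    """
    good = [h for h in hits
            if h.match_5p >= min_match and h.match_3p >= min_match]
    return good[0] if len(good) == 1 else None


if __name__ == "__main__":
    hits = [Hit("chr1", 1000, 1800, 18, 17), Hit("chr5", 40, 900, 16, 18)]
    print(unique_location(hits))
```

Here the second hit fails the 17 bp threshold on its 5' signature, so the PET is kept with a single location on chr1.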

Verification

Statistical and experimental verification exercises have shown that the overlapping PET clusters result from ChIP enrichment events.

Monte Carlo simulation using the p53 ChIP-PET data estimated that about 27% of PET-2 clusters (PET clusters with two overlapping members), 3% of PET-3 clusters (PET clusters with three overlapping members), and less than 0.0001% of PET clusters with more than three overlapping members were due to random chance. This suggests that the PET clusters most likely represent real ChIP enrichment events and that a higher number of overlapping fragments correlates with a higher probability of a real ChIP enrichment event. Furthermore, based on goodness-of-fit analysis for assessing the reliability of PET clusters, it was estimated that less than 36% of the PET-2 clusters and over 99% of the PET-3+ clusters (clusters with three or more overlapping members) are true ChIP enrichment sites. Thus, the verification rate is nearly 100% for PET-3+ ChIP clusters, whereas the PET-2 clusters contain significant noise.
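To give a sense of how a random-chance baseline of this kind can be estimated, the sketch below places fragments uniformly at random on a linear genome and tallies the resulting cluster sizes. It is not the simulation used for this track: the uniform placement, the fixed fragment length, and all parameter values are assumptions chosen for illustration.

```python
import random
from collections import Counter


def simulate_cluster_sizes(n_fragments: int, genome_size: int,
                           fragment_len: int, rng: random.Random) -> Counter:
    """Place fragments at random and count clusters by number of members.

    Returns a Counter mapping cluster size -> number of clusters of that size.
    """
    starts = sorted(rng.randrange(genome_size - fragment_len + 1)
                    for _ in range(n_fragments))
    sizes: Counter = Counter()
    current = 0        # members in the cluster being built
    cluster_end = -1   # end of the most recent fragment in that cluster
    for s in starts:
        if current and s < cluster_end:
            current += 1
        else:
            if current:
                sizes[current] += 1
            current = 1
        # Starts are sorted and lengths are equal, so this is the furthest end.
        cluster_end = s + fragment_len
    if current:
        sizes[current] += 1
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative parameters only; not taken from the p53 dataset.
    sizes = simulate_cluster_sizes(10_000, 10_000_000, 500, rng)
    for size in sorted(sizes):
        print(f"clusters with {size} member(s): {sizes[size]}")
```

Comparing the random cluster-size counts with the observed counts gives a rough estimate of the fraction of clusters of each size that could arise by chance.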

In addition to these statistical analyses, 40 genomic locations identified by PET-3+ clusters were randomly selected and analyzed by quantitative real-time PCR. The relative enrichment of candidate regions compared to control GST ChIP DNA was determined, and all 40 regions (100%) were confirmed to show significant p53 ChIP enrichment.

Credits

The p53 ChIP-PET library and sequence data were produced at the Genome Institute of Singapore. The data were mapped and analyzed by scientists from the Genome Institute of Singapore, the Bioinformatics Institute, Singapore, and Boston University.

References

Ng, P. et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nature Methods 2, 105-111 (2005).