Description

This track contains chromatin interaction data from the University of Washington ENCODE group generated using 5C (Chromosome Conformation Capture Carbon Copy). The 5C method is used here to define short- and long-range interactions between transcription start sites (TSS) and DNase I hypersensitive sites (DHS) or other genomic features. The 5C method is summarized below.

Transcription factors bind to promoter-associated proteins, bringing the associated DNA sequences into close proximity with each other. Cross-linking the DNA and proteins immobilizes these interactions and thus maintains their close proximity. Cleavage of the sample with a restriction endonuclease followed by ligation results in hybrid molecules in which a fragment with a regulatory element is physically joined to a fragment containing a TSS. The interactions are then detected by oligonucleotide-dependent, ligation-mediated assays, where one set of primers is complementary to the ends of fragments with a TSS and the second set of primers is complementary to fragments with a feature. Primers are designed to the forward strand of the feature and the reverse strand of the TSS so that ligation occurs only between a TSS and a feature, not between different features. Specific interactions are detected by massively parallel sequencing.

The data in this track comprise two different experiment types, each focusing on targeted regions:

Gene-targeted project

Analysis of DNase I hypersensitive sites reveals many genes with multiple sites that are restricted to the cell type in which the corresponding protein is observed to be expressed. These sites potentially identify regulatory sites for the gene. This set of experiments attempts to observe interactions between these DHSs and transcription starts in 25 regions selected based on genes expressed in GM06990 (B-lymphocyte), BJ (foreskin fibroblast), HepG2 (liver cancer cell line), or SK-N-SH_RA (neuroblastoma cell line, SKNSH, differentiated with retinoic acid).

Myc project

Genome-wide association studies have identified SNPs linked to prostate, colon, and breast cancer in the gene desert region upstream of the myc gene. 5C of HindIII fragments interacting with those containing RefSeq txStarts in this region was performed in 5 cell types: GM12878 (B-lymphocyte), CaCo2 (colon cancer cell line), LNCaP (prostate cancer cell line), MCF7 (breast cancer cell line), and K562 (erythroleukemia cell line).

File Conventions

The following types of data are available for download:

Matrix
Interaction files are in a matrix format indicating interaction strength, with "reverse primer name | genome version | reverse HindIII fragment coordinates" in the top row and "forward primer name | genome version | forward primer fragment coordinates" in the first column. Each cell of the matrix holds the number of sequences mapped to that interaction. To interpret the Matrix data, you must download the associated primer data file.
Primer
Primer data files include the sequences of the primers used in the experiments and sequences for control sites in the ENCODE pilot ENr313 gene desert region on chr16. These files are available for download in the supplemental materials.
Raw Data
Sequencing files are provided in FASTQ format.
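
The Matrix layout described above can be read with a few lines of Python. The following is a minimal sketch, not the group's own tooling: it assumes the file is tab-delimited, that the top-left corner cell is empty, and that fragment coordinates are written as chrom:start-end. The primer names, genome version, and coordinates in the example are made up for illustration; check them against a real file and its primer data file before relying on this.

```python
import csv
import io


def parse_label(label):
    """Split 'primer name | genome version | chrom:start-end' into its parts.

    The chrom:start-end coordinate form is an assumption, not confirmed by the track description.
    """
    name, genome, coords = [part.strip() for part in label.split("|")]
    chrom, span = coords.split(":")
    start, end = span.split("-")
    return {"primer": name, "genome": genome, "chrom": chrom,
            "start": int(start), "end": int(end)}


def read_matrix(handle, delimiter="\t"):
    """Return (forward_labels, reverse_labels, counts) from an interaction matrix.

    Top row: reverse primer labels. First column: forward primer labels.
    Remaining cells: number of sequences mapped to each interaction.
    """
    rows = list(csv.reader(handle, delimiter=delimiter))
    reverse = [parse_label(cell) for cell in rows[0][1:]]
    forward = [parse_label(row[0]) for row in rows[1:]]
    counts = [[int(cell) for cell in row[1:]] for row in rows[1:]]
    return forward, reverse, counts


# Hypothetical two-by-two matrix used only to exercise the parser.
example = (
    "\tREV_1|hg19|chr8:100-200\tREV_2|hg19|chr8:300-400\n"
    "FOR_1|hg19|chr8:500-600\t3\t0\n"
    "FOR_2|hg19|chr8:700-800\t1\t7\n"
)
forward, reverse, counts = read_matrix(io.StringIO(example))
```

Here counts[i][j] is the number of sequences supporting the interaction between forward[i] and reverse[j].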

Methods

Cells were grown according to the approved ENCODE cell culture protocols. The isolated nuclei were formaldehyde cross-linked. The DNA isolated from the nuclei was cleaved with restriction enzyme, ligated, and the cross-links removed to create a 3C library (Dekker et al., 2002). Primers complementary to the TSS and feature were added, annealed, and ligated to produce a 5C library (Dostie et al., 2006). The DNA fragments generated in the ligation-mediated reactions were partially digested with DNase I, end-repaired, and ligated to adapters before sequencing. The sequencing reads generated were mapped to the predicted ligation products. The number of sequences mapping to each predicted junction fragment was tabulated from the sequencing runs. The number of times a sequence was detected for a given interaction between a TSS and a feature indicates the relative strength of the interaction.
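
The final tabulation step can be illustrated with a short Python sketch. It assumes, for illustration only, that read mapping has already reduced each sequence to a (forward primer, reverse primer) pair naming the predicted ligation product it matched; the mapping step itself is not shown, and the primer names are hypothetical.

```python
from collections import Counter


def tabulate(pairs, forward_names, reverse_names):
    """Count reads per (forward, reverse) primer pair and lay them out as a matrix.

    Rows follow forward_names, columns follow reverse_names; pairs with no
    supporting reads get a count of zero.
    """
    counts = Counter(pairs)
    return [[counts[(f, r)] for r in reverse_names] for f in forward_names]


# Hypothetical mapped reads, one (forward primer, reverse primer) pair per read.
mapped_reads = [
    ("FOR_1", "REV_1"),
    ("FOR_1", "REV_1"),
    ("FOR_2", "REV_2"),
]
matrix = tabulate(mapped_reads, ["FOR_1", "FOR_2"], ["REV_1", "REV_2"])
```

A larger count in a cell corresponds to a relatively stronger interaction between that TSS and feature.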

Gene-targeted project

Forward primers were designed to HindIII sites in a 230-415 kb sequence centered on the DNase I hypersensitive sites of interest. Reverse primers were designed to HindIII sites for all transcription starts extending 1 Mb on either side of the region targeted by the forward primer set. Matrix files are labeled by the coordinates of the region covered by the forward primer set. These experiments were done in a multiplex manner with the forward and reverse primers for all 25 regions mixed together in a single reaction. Two replicates were performed for 4 cell lines for 25 regions. High-throughput sequencing was performed on an ABI SOLiD instrument collecting 50 bp reads. The interaction files provided map all the reads in the output sequence without a mismatch threshold.

Myc project

Forward primers were designed to HindIII fragments of a 4.29 Mb section of human chromosome 8 centered on the gene desert 5′ of the myc gene. Reverse primers were designed to all HindIII fragments containing RefSeq txStarts in a 7.6 Mb region extending > 2 Mb on either side of the forward primer set. High-throughput sequencing was performed on an ABI SOLiD instrument collecting 50 bp reads. The interaction files provided map all the reads in the output sequence without a mismatch threshold.

Verification

Data were verified by sequencing biological replicates that displayed a correlation coefficient > 0.9.
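
A replicate comparison of this kind can be sketched in Python as follows. The track description does not say which correlation coefficient was used or whether counts were transformed first, so this example assumes a Pearson correlation over the raw, flattened count matrices; the replicate values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(matrix_a, matrix_b):
    """Correlate two count matrices cell by cell (same primer order assumed)."""
    flat_a = [cell for row in matrix_a for cell in row]
    flat_b = [cell for row in matrix_b for cell in row]
    return pearson(flat_a, flat_b)


# Hypothetical replicate matrices with identical primer ordering.
replicate_1 = [[3, 0], [1, 7]]
replicate_2 = [[6, 0], [2, 14]]
r = replicate_correlation(replicate_1, replicate_2)
passes = r > 0.9  # threshold stated in the Verification section
```

Both matrices must list forward and reverse primers in the same order for the cell-by-cell comparison to be meaningful.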

Credits

These data were generated by the University of Washington ENCODE Group.

Contact: Richard Sandstrom

References

Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science 2002 Feb 15;295(5558):1306-11.

Dostie J, Richmond TA, Arnaout RA, Selzer RR, Lee WL, Honan TA, Rubio ED, Krumm A, Lamb J, Nusbaum C et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res 2006 Oct;16(10):1299-309.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.