Description

This track is produced as part of the ENCODE project. The track reports the percentage of DNA molecules that exhibit cytosine methylation at specific CpG dinucleotides. In general, DNA methylation within a gene's promoter is associated with gene silencing, and DNA methylation within the exons and introns of a gene is associated with gene expression. Proper regulation of DNA methylation is essential during development and aberrant DNA methylation is a hallmark of cancer. DNA methylation status is assayed at more than 500,000 CpG dinucleotides in the genome using Reduced Representation Bisulfite Sequencing (RRBS). Genomic DNA is digested with the methyl-insensitive restriction enzyme MspI, small genomic DNA fragments are purified by gel electrophoresis, and then used to construct an Illumina sequencing library. The library fragments are treated with sodium bisulfite and amplified by PCR to convert every unmethylated cytosine to a thymidine while leaving methylated cytosines intact. The sequenced fragments are aligned to a customized reference genome sequence. For each assayed CpG the number of sequencing reads covering that CpG and the percentage of those reads that are methylated were reported.

Display Conventions and Configuration

Methylation status is represented with an 11-color gradient using the following convention:

The score in this track reports the number of sequencing reads obtained for each CpG, which is often called 'coverage'. Score is capped at 1000, so any CpGs that were covered by more than 1000 sequencing reads will have a score of 1000. The bed files available for download contain two extra columns, one with the uncapped coverage (number of reads at that site), and one with the percentage of those reads that show methylation. High reproducibility is obtained, with correlation coefficients greater than 0.9 between biological replicates, when only considering CpGs represented by at least 10 sequencing reads (10x coverage, score=10). Therefore, the default view for this track is set to 10X coverage, or a score of 10.

Methods

DNA methylation at CpG sites was assayed with a modified version of Reduced Representation Bisulfite Sequencing (RRBS; Meissner et al., 2008). RRBS was performed on cell lines grown by many ENCODE production groups. The production group that grew the cells and isolated genomic DNA is indicated in the "obtainedBy" field of the metadata. When a cell type was provided by more than one lab, the data from only one lab are available for immediate display. However, the data for every cell type from every lab is available from the Downloads page. RRBS was also performed on genomic DNA from tissue samples provided by BioChain (denoted as "BC" in the sample name). The replicates for the BioChain tissues are technical beginning at the bisulfite treatment step. RRBS was carried out by the Myers production group at the HudsonAlpha Institute for Biotechnology.

Isolation of genomic DNA

Genomic DNA is isolated from biological replicates of each cell line using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. DNA concentrations for each genomic DNA preparation are determined using fluorescent DNA binding dye and a fluorometer (Invitrogen Quant-iT dsDNA High Sensitivity Kit and Qubit Fluorometer). Typically, 1 µg of DNA is used to make an RRBS library; however, there has been success in making libraries with 200 ng genomic DNA from rare or precious samples.

RRBS library construction and sequencing

RRBS library construction starts with MspI digestion of genomic DNA , which cuts at every CCGG regardless of methylation status. Klenow exo- DNA Polymerase is then used to fill in the recessed end of the genomic DNA and add an adenosine as a 3prime overhang. Next, a methylated version of the Illumina paired-end adapters is ligated onto the DNA. Adapter ligated genomic DNA fragments between 105 and 185 base pairs are selected using agarose gel electrophoresis and Qiagen Qiaquick Gel Extraction Kit. The selected adapter-ligated fragments are treated with sodium bisulfite using the Zymo Research EZ DNA Methylation Gold Kit, which converts unmethylated cytosines to uracils and leaves methylated cytosines unchanged. Bisulfite treated DNA is amplified in a final PCR reaction which has been optimized to uniformly amplify diverse fragment sizes and sequence contexts in the same reaction. During this final PCR reaction uracils are copied as thymines resulting in a thymine in the PCR products wherever an unmethylated cytosine existed in the genomic DNA. The sample is now ready for sequencing on the Illumina sequencing platform. These libraries were sequenced with an Illumina Genome Analyzer IIx according to the manufacturer's recommendations. The full RRBS protocol can be found here.

Data analysis

To analyze the sequence data, a reference genome is created that contains only the 36 base pairs adjacent to every MspI site and every C in those sequences is changed to T. A converted sequence read file is then created by changing each C in the original sequence reads to a T. The converted sequence reads are aligned to the converted reference genome and only reads that map uniquely to the reference genome are kept. Once reads are aligned, the percent methylation is calculated for each CpG using the original sequence reads. The percent methylation and number of reads is reported for each CpG.

Release Notes

This is release 2 (September 2011) of this track and includes an additional 44 cell types. Please refer to the cell line descriptions and the Open Chromatin(Crawford) protocols for details on the HSMM, HSMM_tube, HSMM_FSHD and HSMMtube_FSHD samples. There are also additional data in the Downloads section. Fastq files, which contain the raw DNA sequencing reads of the bisulfite treated DNA are now available for every experiment. GEO accession numbers are available for the data from release 1.

Please note: The subtrack for HSMM Rep 3 was replaced with the correct data for that replicate.

Credits

These data were produced by the Dr. Richard Myers Lab

at the HudsonAlpha Institute for Biotechnology.

Cells were grown by the Myers Lab and other ENCODE production groups.

Contact: Dr. Florencia Pauli.

References

Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008 Aug 7;454(7205):766-70.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.