Description

This track was produced as part of the ENCODE project. It reports the percentage of DNA molecules that exhibit cytosine methylation. In general, DNA methylation within a gene's promoter is associated with gene silencing, and DNA methylation within the exons and introns of a gene is associated with gene expression. Proper regulation of DNA methylation is essential during development and aberrant DNA methylation is a hallmark of cancer. DNA methylation status was assayed with Whole Genome Bisulfite Sequencing (WGBS). Genomic DNA was sheared by sonication, end-repaired and then ligated to methylated sequencing adapters. The library fragments were treated with sodium bisulfite and amplified by PCR to convert every unmethylated cytosine to a thymine while leaving methylated cytosines intact. The sequenced fragments were aligned to a bisulfite-converted reference genome. For each assayed cytosine, the number of sequencing reads covering that C and the percentage of those reads that were methylated were reported.

Display Conventions and Configuration

Methylation status is represented with an 11-color gradient using the following convention:

Methylation is reported in two tracks: CpG and non-CpG (cytosines that are not followed by a guanine). The score in this track reports the number of sequencing reads obtained for each C, which is often called 'coverage'. Score is capped at 1000 and any Cs that are covered by more than 1000 sequencing reads have a score of 1000. The bed files available for download contain two extra columns: one with the uncapped coverage (number of reads that cover that site) and one with the percentage of those reads that show methylation. We have found that with bisulfite sequencing, high reproducibility is obtained when considering CpGs represented by at least 10 sequencing reads (10x coverage, score=10). Therefore, the default view for this track is set to 10X coverage, or a score of 10.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

DNA methylation at cytosines across the genome was assayed with Whole Genome Bisulfite Sequencing (WGBS). WGBS was performed on cell lines grown by ENCODE production groups. WGBS was carried out by the Myers production group at the HudsonAlpha Institute for Biotechnology.

Isolation of Genomic DNA

Genomic DNA was isolated from each cell line using the QIAGEN DNeasy Blood & Tissue Kit according to the instructions provided by the manufacturer. DNA concentrations for each genomic DNA preparation were determined using fluorescent DNA-binding dye and a fluorometer (Invitrogen Quant-iT dsDNA High Sensitivity Kit and Qubit Fluorometer). Typically, 2 µg of genomic DNA is used to make WGBS libraries.

WGBS Library Construction and Sequencing

WGBS library construction started with sonication of genomic DNA on a Covaris S2 instrument. Sheared ends were then repaired and blunted with DNA polymerase I, T4 DNA polymerase and T4 polynucleotide kinase in the presence of dATP, dGTP and dTTP. After end repair, Klenow exo- DNA Polymerase was used to add an adenosine as a 3' overhang. Next, a methylated version of the Illumina paired-end adapters was ligated onto the DNA. Adapter-ligated 400 bp genomic DNA fragments were selected using a 2% agarose SizeSelect E-gel. The selected adapter-ligated fragments were treated with sodium bisulfite using the Zymo Research EZ DNA Methylation Gold Kit, which converts unmethylated cytosines to uracils and leaves methylated cytosines unchanged. Bisulfite-treated DNA was amplified in a final PCR reaction which was optimized to uniformly amplify diverse fragment sizes and sequence contexts in the same reaction. During this final PCR reaction, uracils were copied as thymines, resulting in a thymine in the PCR products wherever an unmethylated cytosine existed in the genomic DNA. These libraries were then sequenced with an Illumina HiSeq 2000 according to the manufacturer's recommendations as paired-end 50 bp reads. Libraries were sequenced to a depth of 600 million aligned reads.

Data Analysis

To analyze the sequence data, Bismark (Krueger and Andrews, 2011) was used to align sequences reads. Generally, each read went through a conversion of Cs to Ts and was then aligned to fully converted plus and minus strands of the hg19 build of the human genome. A few custom refinements were made to the Bismark program. Since these libraries were made in a directional orientation with the first read always being C-poor, we skipped unnecessary alignments to impossible orientations. We also implemented a more stringent uniqueness filter, only allowing reads that have one acceptable alignment (based on default Bowtie parameters) across both strands. Once reads were aligned, the percent methylation was calculated for each cytosine using the original sequence reads. The percent methylation and number of reads is reported for each CpG in the wgEncodeHaibMethylWgbsXXXXCpg.bigBed file and for each non CpG cytosine in the wgEncodeHaibMethylWgbsXXXXNoncpg.bigBed file.

Release Notes

The Whole Genome Bisulfite Sequencing data, alignment and browser display were produced as a test of the Whole Genome Bisulfite Sequencing method for ENCODE. Due to the complexity of aligning bisulfite-converted DNA to the whole genome, this method is still being perfected. Therefore, this track will not be released to public but will remain on the preview browser until more tests can be performed to confirm the alignment. These are still very valuable data (alignment and fastq files), however, and the data will be made available on GEO.

Credits

These data were produced by the Dr. Richard Myers Lab at the HudsonAlpha Institute for Biotechnology.

Cells were grown by the Myers Lab and other ENCODE production groups.

Contact: Dr. Florencia Pauli

References

Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011 Jun 1;27(11):1571-2.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.