Description

CpG islands are associated with genes, particularly housekeeping genes, in vertebrates. They are common near transcription start sites and may be associated with promoter regions. Normally a C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the Cs in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over evolutionary time methylated Cs tend to turn into Ts because of spontaneous deamination. As a result, CpGs are relatively rare unless there is selective pressure to keep them or a region is not methylated for some reason, perhaps having to do with the regulation of gene expression. CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.

The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length of the island, expressed as a percentage.
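
As a minimal sketch of these two definitions, the values could be computed from an island's sequence in Python as shown here. The function names cpg_count and percent_cpg are illustrative assumptions and are not part of the software used to build the track.

```python
def cpg_count(seq: str) -> int:
    """Count CG dinucleotides in seq (case-insensitive)."""
    s = seq.upper()
    # "CG" cannot overlap with itself, so str.count gives the exact
    # number of CG dinucleotides.
    return s.count("CG")


def percent_cpg(seq: str) -> float:
    """Percentage of bases in seq that belong to a CG dinucleotide."""
    if not seq:
        return 0.0
    # Each CpG contributes two bases, hence the factor of 2.
    return 100.0 * 2 * cpg_count(seq) / len(seq)
```

For example, the 8-base sequence ACGTCGCG contains three CG dinucleotides, so six of its eight bases are CpG bases.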

Methods

The genome sequence was masked using the output of RepeatMasker and Tandem Repeats Finder (period ≤ 12). A sliding-window search was then performed on the set of CpG locations in the masked sequence to find the longest spans that met the criteria of Gardiner-Garden and Frommer (1987; see the References section below): GC content of 50% or greater, length greater than 200 bp, and a ratio of observed to expected CpGs greater than 0.6.

The ratio of observed to expected CpGs is calculated as follows:

Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)

where N is the length of the sequence.
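
The formula can be illustrated with a short Python sketch. It assumes N is the length of the sequence in bases, and the function name obs_exp_cpg is hypothetical. Returning 0.0 when the sequence has no C or no G is a choice made here to avoid division by zero and is not taken from the track's software.

```python
def obs_exp_cpg(seq: str) -> float:
    """Observed/expected CpG ratio for seq (case-insensitive)."""
    s = seq.upper()
    n = len(s)             # N: sequence length (assumed meaning of N)
    num_c = s.count("C")   # Number of C
    num_g = s.count("G")   # Number of G
    num_cpg = s.count("CG")  # Number of CpG
    if num_c == 0 or num_g == 0:
        # No CpG is possible; avoid dividing by zero (sketch assumption).
        return 0.0
    return num_cpg * n / (num_c * num_g)
```

For the sequence CCGG, N = 4 with two Cs, two Gs, and one CpG, so the ratio is 1 * 4 / (2 * 2) = 1.0.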

Credits

This track was generated using a program written by Andy Law (Roslin Institute) with minor modifications by Angie Hinrichs (UCSC).

References

Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J. Mol. Biol. 1987 Jul 20;196(2):261-82.