Description

This track displays the measured changes in the activity of human promoters when computationally identified transcription factor binding sites (TFBS) were mutated. The mutations were computationally selected to have the greatest negative effect on the predicted TF binding affinity.

The effects of the selected mutations on promoter activity were measured experimentally by transient transfection reporter assays in the following cell lines: HT1080, T98G, HCT116, HepG2 and 293.

File Conventions

The bed files available for download contain the following information:

chromosome: reference sequence or scaffold
start site: start position in chromosome
end site: end position in chromosome
tfname: transcription factor
gene: associated gene
wt sequence: wild type DNA sequence
mut sequence: mutant DNA sequence
avg wt score: average promoter activity assay score for wild type sequence
var wt score: variance in promoter activity assay score for wild type sequence
avg pa score: log2(avg mut score/avg wt score)
var pa score: variance in pa score
pvalue: p-value from hypothesis testing
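
As an illustration of the field list above, the following is a minimal Python sketch for reading one record and recomputing the pa score. It assumes the twelve fields are tab-separated and appear in the order listed; the names TfbsRecord, parse_record and pa_score are hypothetical, and any values passed to them in practice should come from the downloaded files.

```python
import math
from typing import NamedTuple


class TfbsRecord(NamedTuple):
    """One line of the bed file, assuming the field order listed above."""
    chrom: str
    start: int
    end: int
    tf_name: str
    gene: str
    wt_seq: str
    mut_seq: str
    avg_wt_score: float
    var_wt_score: float
    avg_pa_score: float
    var_pa_score: float
    pvalue: float


def parse_record(line: str) -> TfbsRecord:
    """Split one line into typed fields (assumes tab-separated columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    return TfbsRecord(
        fields[0],
        int(fields[1]),
        int(fields[2]),
        *fields[3:7],
        *(float(x) for x in fields[7:]),
    )


def pa_score(avg_mut_score: float, avg_wt_score: float) -> float:
    """avg pa score = log2(avg mut score / avg wt score)."""
    return math.log2(avg_mut_score / avg_wt_score)
```

Under this definition, a negative pa score means the mutant promoter was less active than the wild type, and a score of -1 corresponds to a two-fold reduction in activity.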

Methods

The reported data were generated by a two-step process. First, TFBS were predicted computationally and, for each predicted site, the single-point mutation expected to most strongly disrupt transcription factor (TF) binding was selected. These predictions were then tested experimentally by transient transfection reporter assays, which measure promoter activity for both the wild type and mutant promoters.

To identify the 6-10 bp TF footprints along the entire length of a promoter, the POSSUM score of a given position-specific scoring matrix (PSSM) was computed at every position along the approximately 1 kb length of each promoter. The POSSUM score is a log likelihood ratio that compares the probability of observing the sequence at a given position under the nucleotide frequencies of the corresponding PSSM with the probability of observing it under the nucleotide frequencies of a background model.

Promoter activity assays were performed on 4575 human promoters in eight cell lines. The results of these experiments were used to train a machine learning algorithm (a support vector machine, SVM) to predict the activity of novel promoters, and only promoters predicted to be active were searched for TF binding sites.

A simple strategy would be to select the strongest TFBS (according to the PSSM) on each promoter. However, the additional filtering criteria listed below were also imposed.
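
The position-by-position log likelihood ratio scoring described above can be sketched as follows. This is a minimal Python illustration, not the POSSUM program itself: it assumes a PSSM given as per-position nucleotide frequencies with no zero entries, a fixed background frequency table, base-2 logarithms, and a single strand. The function names llr_score and scan_promoter are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Frequencies = Dict[str, float]


def llr_score(window: str, pssm: Sequence[Frequencies],
              background: Frequencies) -> float:
    """Log likelihood ratio of a window under the PSSM versus the background."""
    return sum(
        math.log2(column[base] / background[base])
        for base, column in zip(window, pssm)
    )


def scan_promoter(sequence: str, pssm: Sequence[Frequencies],
                  background: Frequencies) -> List[float]:
    """Score every window of PSSM width along the sequence (one strand only)."""
    width = len(pssm)
    return [
        llr_score(sequence[i:i + width], pssm, background)
        for i in range(len(sequence) - width + 1)
    ]
```

A positive score means the window is more likely under the PSSM than under the background model; the highest-scoring position is the candidate for the strongest site on that promoter.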

Filtering criterion 1: Each TFBS must be solitary. A TFBS can appear multiple times along a promoter, so that a secondary TFBS may compensate if the primary site is damaged or mutated. Because we mutated one TFBS at a time, this criterion reduces the chance that a TF binds to a secondary site after the primary site has been mutated. The criterion requires that the difference between the highest POSSUM score and the second-highest POSSUM score on a promoter be in the top 20% of a background distribution. The background distribution was calculated by surveying POSSUM score differences among 246 promoters in the ENCODE region.
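
One way to express filtering criterion 1 in code is shown below. This is a sketch under stated assumptions: the gap is taken between the two highest per-position scores without any handling of overlapping windows, and "top 20%" is read as at least 80% of the background gaps being strictly smaller. The names top_two_gap and is_solitary are hypothetical.

```python
from typing import Sequence


def top_two_gap(scores: Sequence[float]) -> float:
    """Difference between the highest and second-highest score on a promoter."""
    if len(scores) < 2:
        raise ValueError("need at least two scored positions")
    best, second = sorted(scores, reverse=True)[:2]
    return best - second


def is_solitary(scores: Sequence[float], background_gaps: Sequence[float],
                top_fraction: float = 0.20) -> bool:
    """True if this promoter's gap falls in the top fraction of background gaps."""
    gap = top_two_gap(scores)
    n_below = sum(1 for g in background_gaps if g < gap)
    return n_below / len(background_gaps) >= 1.0 - top_fraction
```

Here background_gaps would hold one top-two gap per background promoter, computed with the same scoring as the promoter under test.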

Filtering criterion 2: The highest-scoring TFBS must be statistically significant. To generate background distributions, POSSUM scores were computed at every position along our training set of 4575 promoters for 21 preliminarily selected transcription factors. Filtering criterion 2 requires that binding sites selected for mutagenesis have POSSUM scores beyond the 99.95th percentile of their respective background distributions.

Wild type and mutant promoter sequences were placed in plasmid constructs for use in transient transfection reporter assays. Reporter assays were carried out following the protocols prescribed by SwitchGear Genomics (see http://switchgeargenomics.com/resources/protocols/transfection-protocol/).
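
The percentile cutoff in filtering criterion 2 can be sketched in Python as follows. The sketch assumes an empirical percentile defined as the share of background scores strictly below the candidate score, and one background distribution per transcription factor; the names background_percentile and passes_criterion_2 are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def background_percentile(score: float,
                          sorted_background: Sequence[float]) -> float:
    """Percentage of background scores strictly below the given score.

    The background scores must already be sorted in ascending order.
    """
    n_below = bisect_left(sorted_background, score)
    return 100.0 * n_below / len(sorted_background)


def passes_criterion_2(score: float, sorted_background: Sequence[float],
                       cutoff: float = 99.95) -> bool:
    """True if the site score lies beyond the cutoff percentile."""
    return background_percentile(score, sorted_background) > cutoff
```

With a cutoff of 99.95, only about 5 in every 10,000 background positions would score as high as an accepted site.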

Verification

The promoter activity assays described in the methods section were carried out with 3 biological replicates and 2 additional technical replicates for a total of 5 replicates per sequence.

Credits

Computational prediction of transcription factor binding sites, selection of mutations and analysis of promoter activity data were carried out in the Weng lab at the University of Massachusetts Medical School. Promoter activity assays were carried out in the Myers lab at the HudsonAlpha Institute, and preparation of wild type and mutant plasmids was done by SwitchGear Genomics. The following people contributed: Jane Landolin, Troy Whitfield, Zhiping Weng, Christopher Partridge, Richard Myers, Nathan Trinklein and Patrick Collins.