This track displays the measured changes in the activity of human promoters when computationally identified transcription factor binding sites (TFBS) were mutated. The mutations were computationally selected to have the greatest negative effect on the predicted TF binding affinity.
The effects of the selected mutations on promoter activity were measured experimentally by transient transfection reporter assays in the following cell lines: HT1080, T98G, HCT116, HepG2 and 293.
The BED files available for download contain the following information:
The reported data were generated by a two-step process. First, TFBS were predicted computationally, and for each predicted site the single-point mutation expected to disrupt transcription factor (TF) binding most strongly was selected. Second, these predictions were verified experimentally by transient transfection reporter assays, which measure promoter activity for both the wild-type and the mutant promoters.
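As a rough illustration of the mutation-selection step, the sketch below tries every single-base substitution within a predicted site and keeps the one that lowers a log-odds score the most. The toy matrix, the uniform background, the log base and the function names are assumptions made for illustration only; the actual selection was based on POSSUM scores and may differ in detail.

```python
import math

def site_score(site, pssm, background=0.25):
    """Log-odds score of a site: sum over positions of log2(p_motif / p_background)."""
    return sum(math.log2(pssm[i][base] / background) for i, base in enumerate(site))

def best_disrupting_mutation(site, pssm, background=0.25):
    """Return (position, new_base, score_drop) for the single substitution
    that lowers the site score the most."""
    wild_type = site_score(site, pssm, background)
    best = None
    for pos, original in enumerate(site):
        for base in "ACGT":
            if base == original:
                continue  # not a mutation
            mutant = site[:pos] + base + site[pos + 1:]
            drop = wild_type - site_score(mutant, pssm, background)
            if best is None or drop > best[2]:
                best = (pos, base, drop)
    return best

# Hypothetical 3-bp motif (per-position nucleotide frequencies), for illustration only.
TOY_PSSM = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.90, "G": 0.04, "T": 0.01},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
]

position, new_base, score_drop = best_disrupting_mutation("ACG", TOY_PSSM)
```

In this toy case the selected substitution falls on the most constrained position of the motif, replacing the preferred base with the rarest one.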
To identify the 6-10 bp footprint along the entire length of a promoter, the POSSUM score of a given PSSM was computed for every position along the approximately 1 kb length of our promoters. The POSSUM score is a log-likelihood ratio that compares the probability of observing a TFBS given the nucleotide frequencies in the corresponding PSSM with the probability of observing a TFBS given the nucleotide frequencies in a background model. A simple strategy would be to select the strongest TFBS on the promoter according to the PSSM. However, additional filtering criteria (listed below) were imposed.
Promoter activity assays were performed on 4575 human promoters in eight cell lines. The results of these experiments were used to train a machine learning algorithm (SVM) to predict the activity of novel promoters, and only promoters that were predicted to show activity were searched for TF binding sites.
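The position-by-position scan described above can be sketched as a sliding-window log-likelihood ratio. This is a minimal illustration, not the POSSUM implementation: the uniform background, the base-2 logarithm, the forward-strand-only scan, the toy matrix and the function names are all assumptions.

```python
import math

# Assumed uniform background model; a real background would be estimated from sequence data.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def log_odds_score(window, pssm, background=BACKGROUND):
    """Sum over positions of log2(p_motif(base) / p_background(base))."""
    return sum(math.log2(pssm[i][base] / background[base])
               for i, base in enumerate(window))

def scan_promoter(sequence, pssm, background=BACKGROUND):
    """Score every window of motif width; returns a list of (start, score) pairs."""
    width = len(pssm)
    return [(start, log_odds_score(sequence[start:start + width], pssm, background))
            for start in range(len(sequence) - width + 1)]

# Hypothetical 3-bp motif with no zero frequencies (zeros would need pseudocounts).
TOY_PSSM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

scores = scan_promoter("TTACGTT", TOY_PSSM)
best_start, best_score = max(scores, key=lambda pair: pair[1])
```

Selecting the maximum of `scores` corresponds to the "simple strategy" of taking the strongest site; the filtering criteria restrict that choice further.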
Filtering criterion 1: Each TFBS must be solitary. A TFBS can appear multiple times along a promoter, so that a secondary TFBS may compensate if the primary site is damaged or mutated. Because we mutated one TFBS at a time, this criterion reduces the chance that a TF binds to a secondary site after the primary site has been mutated. The criterion requires that the difference between the highest POSSUM score and the second-highest POSSUM score on a promoter be in the top 20% of a background distribution. The background distribution was calculated by surveying POSSUM score differences among 246 promoters in the ENCODE region.
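A minimal sketch of how filtering criterion 1 might be checked follows. It assumes that "top 20%" means fewer than 20% of the background score differences exceed the observed difference; the function name and the toy numbers are hypothetical, and the way the background differences were pooled is not specified here.

```python
def is_solitary(site_scores, background_gaps, top_fraction=0.20):
    """True if the gap between the best and second-best site score lies in the
    top `top_fraction` of the background gap distribution."""
    if len(site_scores) < 2:
        raise ValueError("need at least two scored positions to compute a gap")
    best, second = sorted(site_scores, reverse=True)[:2]
    gap = best - second
    # Fraction of background gaps that are strictly larger than the observed gap.
    larger = sum(1 for g in background_gaps if g > gap) / len(background_gaps)
    return larger < top_fraction

# Toy background of score differences (illustrative values, not real data).
toy_background = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

clear_winner = is_solitary([9.0, 4.2, 3.0], toy_background)    # gap of 4.8
close_runner_up = is_solitary([9.0, 8.0, 3.0], toy_background)  # gap of 1.0
```

A promoter with a close runner-up site is rejected because the second site could plausibly compensate once the primary site is mutated.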
Filtering criterion 2: The highest-scoring TFBS must be statistically significant. To generate background distributions, the POSSUM scores of every position along our training set of 4575 promoters were computed for 21 preliminarily selected transcription factors. Filtering criterion 2 requires that binding sites selected for mutagenesis have POSSUM scores beyond the 99.95th percentile of their respective background distributions.
Wild-type and mutant promoter sequences were placed in plasmid constructs for use in transient transfection reporter assays. Reporter assays were carried out following the protocols prescribed by SwitchGear Genomics (see http://switchgeargenomics.com/resources/protocols/transfection-protocol/).
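The significance test of filtering criterion 2 can be sketched as an empirical upper-tail check against a background score distribution. This is an illustrative assumption about how "beyond the 99.95th percentile" could be evaluated; the function name and the synthetic background are hypothetical.

```python
def passes_significance(score, background_scores, tail_fraction=0.0005):
    """True if fewer than `tail_fraction` of the background scores are at or above
    `score`, i.e. the score lies beyond the 99.95th percentile by default."""
    at_or_above = sum(1 for s in background_scores if s >= score)
    return at_or_above / len(background_scores) < tail_fraction

# Synthetic background of 10,000 scores (0, 1, ..., 9999), for illustration only.
toy_background = list(range(10000))

strong_site = passes_significance(9995.5, toy_background)  # 4 of 10,000 scores at or above
weak_site = passes_significance(9990.5, toy_background)    # 9 of 10,000 scores at or above
```

In practice each transcription factor would be tested against its own background distribution, as described for the 21 preliminarily selected factors.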
The promoter activity assays described in the methods section were carried out with 3 biological replicates and 2 additional technical replicates for a total of 5 replicates per sequence.