Description

This track displays measured changes in the activity of human promoters when computationally identified transcription factor binding sites (TFBS) were mutated. The mutations were computationally selected to have the greatest negative effect on the predicted TF binding affinity.

The effects of the selected mutations on promoter activity were measured experimentally by transient transfection reporter assays in the following cell lines: HT1080, T98G, HCT116, HepG2 and 293.

Methods

The reported data were generated in two steps. First, a TFBS was predicted computationally, together with the single-point mutation expected to disrupt TF binding most strongly. Second, promoter activity was measured for both the wild-type and mutant promoters in a transient transfection reporter assay.
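The mutation-selection step can be illustrated with a minimal sketch. It assumes that predicted binding affinity is represented by a position-specific log-odds matrix and that the most disruptive substitution is the one that lowers the site's score the most. The function name `best_point_mutation` and the toy matrix values are hypothetical and are not taken from the track's actual pipeline.

```python
def best_point_mutation(site, log_odds):
    """Return (position, new_base, score_drop) for the most disruptive substitution.

    site     -- wild-type binding site sequence, e.g. "AC"
    log_odds -- one dict per motif position mapping base -> log-odds score
    """
    best = None
    for i, current in enumerate(site):
        for base in "ACGT":
            if base == current:
                continue
            # Score lost by replacing the wild-type base at position i.
            drop = log_odds[i][current] - log_odds[i][base]
            if best is None or drop > best[2]:
                best = (i, base, drop)
    return best


# Hypothetical log-odds values for a 2-bp motif (illustration only).
toy_log_odds = [
    {"A": 1.5, "C": -1.0, "G": -0.5, "T": -2.0},
    {"A": -1.0, "C": 1.0, "G": -0.5, "T": -0.8},
]
wild_type = "AC"
position, new_base, drop = best_point_mutation(wild_type, toy_log_odds)
mutant_site = wild_type[:position] + new_base + wild_type[position + 1:]
```

Because each position contributes independently to a matrix score, an exhaustive search over all single-base substitutions needs only three comparisons per position.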

To identify the 6-10-bp footprint along the entire length of a promoter, the POSSUM score of a given position-specific scoring matrix (PSSM) was computed at every position along the approximately 1 kb length of each promoter. The POSSUM score is a log-likelihood ratio that compares the probability of observing a candidate TFBS sequence given the nucleotide frequencies in the corresponding PSSM with the probability of observing it given the nucleotide frequencies in a background model. A simple strategy would be to select the TFBS with the highest PSSM score on each promoter; however, the additional filtering criteria listed below were also imposed.

Promoter activity assays were performed on 4575 human promoters in eight cell lines. The results of these experiments were used to train a machine learning algorithm (a support vector machine, SVM) to predict the activity of novel promoters, and only promoters predicted to be active were searched for TF binding sites.
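The position-by-position scan can be sketched as follows. This is a generic log-likelihood ratio scan, not the POSSUM program itself. It assumes a uniform background model and a frequency table with no zero entries, and the names `log_odds_matrix` and `scan` and the 3-bp toy motif are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(freqs, background=None):
    """Convert per-position base frequencies into log2 likelihood ratios."""
    # Assumption: uniform background unless one is supplied.
    background = background or {b: 0.25 for b in BASES}
    return [
        {b: math.log2(col[b] / background[b]) for b in BASES}
        for col in freqs
    ]


def scan(sequence, log_odds):
    """Score every window of motif width; return a list of (position, score)."""
    width = len(log_odds)
    return [
        (i, sum(log_odds[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


# Hypothetical 3-bp motif; frequencies are smoothed so none is zero.
toy_freqs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
toy_scores = scan("TTACGTT", log_odds_matrix(toy_freqs))
best_position, best_score = max(toy_scores, key=lambda pair: pair[1])
```

On a real promoter of about 1 kb, the same scan would return roughly one score per base pair for each PSSM. Those per-position scores are the input to both filtering criteria.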

Filtering criterion 1: Each TF binding site must be solitary

Transcription factor binding sites can appear many times along a promoter, such that secondary TF binding sites may compensate for a primary TF binding site if the primary site is damaged or mutated. As we mutated one TF binding site at a time, this criterion reduces the chance of having a TF bind to a secondary site after mutating the primary site. This filtering criterion requires that the difference between the site with the highest POSSUM score and the site with the second highest POSSUM score be in the top 20% of a background distribution. The background distribution was calculated by surveying POSSUM score differences among 246 promoters in the ENCODE region.
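A minimal sketch of criterion 1 follows. It assumes the per-position scores for one promoter and the list of background score differences are already available. The function names, the nearest-rank cutoff and the toy numbers are hypothetical, since the exact percentile method used by the authors is not stated.

```python
def top_two_gap(scores):
    """Difference between the highest and second-highest score on a promoter."""
    ordered = sorted(scores, reverse=True)
    return ordered[0] - ordered[1]


def is_solitary(promoter_scores, background_gaps, top_percent=20):
    """True if the promoter's top-two gap falls in the top `top_percent`
    of the background gap distribution (nearest-rank cutoff, an assumption)."""
    ranked = sorted(background_gaps)
    cutoff_index = min(len(ranked) * (100 - top_percent) // 100, len(ranked) - 1)
    return top_two_gap(promoter_scores) >= ranked[cutoff_index]


# Toy background of ten score differences (illustration only).
toy_background = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
clear_winner = is_solitary([12.0, 7.0, 3.0], toy_background)    # gap of 5.0
close_runner_up = is_solitary([12.0, 11.0, 3.0], toy_background)  # gap of 1.0
```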

Filtering criterion 2: The highest-scoring TFBS must be statistically significant

The POSSUM scores of every position along our training set of 4575 promoters were computed for 21 preliminarily selected transcription factors in order to generate background distributions. Filtering criterion 2 requires that binding sites selected for mutagenesis have POSSUM scores beyond the 99.95th percentile in their respective background distributions.

Wild type and mutant promoter sequences were placed in plasmid constructs for use in transient transfection reporter assays. Reporter assays were carried out following the protocols prescribed by SwitchGear Genomics (see http://switchgeargenomics.com/resources/protocols/transfection-protocol/).
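The significance threshold in filtering criterion 2 can be sketched as a percentile cutoff on a background score distribution. This is an illustration under stated assumptions: a nearest-rank percentile and a synthetic background. The names `percentile_cutoff` and `is_significant` are hypothetical.

```python
def percentile_cutoff(background_scores, per_10k=9995):
    """Nearest-rank cutoff; per_10k=9995 corresponds to the 99.95th percentile.

    Integer arithmetic avoids floating-point rounding in the rank.
    """
    ranked = sorted(background_scores)
    index = min(len(ranked) * per_10k // 10000, len(ranked) - 1)
    return ranked[index]


def is_significant(score, background_scores):
    """True if the score lies beyond the 99.95th percentile of the background."""
    return score > percentile_cutoff(background_scores)


# Synthetic background of 10,000 scores (illustration only).
toy_background = [float(i) for i in range(10000)]
cutoff = percentile_cutoff(toy_background)
strong_site = is_significant(9999.0, toy_background)
weak_site = is_significant(9000.0, toy_background)
```

In practice one background distribution would be built per transcription factor, as the text describes, so each PSSM gets its own cutoff.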

Verification

The promoter activity assays described in the Methods section were carried out with 3 biological replicates and 2 additional technical replicates, for a total of 5 replicates per sequence.

Credits

Computational prediction of transcription factor binding sites, selection of mutations and analysis of promoter activity data were carried out in the Weng lab at the University of Massachusetts Medical School. Promoter activity assays were carried out in the Myers lab at the HudsonAlpha Institute, and preparation of wild type and mutant plasmids was done by SwitchGear Genomics. The following people contributed: Jane Landolin, Troy Whitfield, Zhiping Weng, Christopher Partridge, Richard Myers, Nathan Trinklein and Patrick Collins.