MEME-ChIP Tutorial

Purpose

MEME-ChIP performs several motif analysis steps on a set of DNA sequences that you provide. It is especially appropriate for analyzing the bound genomic regions identified in a transcription factor (TF) ChIP-seq experiment. MEME-ChIP can 1) discover novel DNA-binding motifs, 2) analyze them for similarity to known binding motifs, 3) visualize the arrangement of the predicted motif sites in your input sequences, 4) detect very subtly enriched known motifs in your sequences, and 5) provide an estimate of the amount of binding of each novel motif to each of your sequences.

You provide MEME-ChIP with a set of sequences in FASTA format. Ideally the sequences are about 100 base-pairs long and enriched for motifs; the immediate regions around individual ChIP-seq "peaks" from a transcription factor (TF) ChIP-seq experiment work well. There is no limit on the number of sequences you provide, and they may be longer (or shorter) than 100 base-pairs if you desire. (The suggested 100 base-pair size is based on the typical resolution of ChIP-seq peaks.) We recommend that you "repeat mask" your sequences, replacing repeat regions with the "N" character.
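As an illustration of the repeat-masking recommendation, here is a minimal Python sketch. It assumes your sequences are "soft-masked", meaning repeat regions are written in lowercase letters (a common convention in genome downloads, but an assumption about your data). The function names hard_mask and hard_mask_fasta are hypothetical helpers and not part of MEME-ChIP.

```python
def hard_mask(sequence: str) -> str:
    """Replace every lowercase (assumed soft-masked repeat) base with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def hard_mask_fasta(fasta_text: str) -> str:
    """Hard-mask the sequence lines of FASTA text, leaving '>' header lines untouched."""
    masked_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            masked_lines.append(line)  # header line: keep as-is
        else:
            masked_lines.append(hard_mask(line))
    return "\n".join(masked_lines)


if __name__ == "__main__":
    example = ">peak_1\nACGTacgtACGT\n>peak_2\nggccTTAA"
    print(hard_mask_fasta(example))
```

If your sequences are not soft-masked, a dedicated repeat-masking tool is needed first; this sketch only converts existing lowercase masking to "N" characters.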

The overall purpose of MEME-ChIP is to provide you with a number of investigations of your sequences without having to configure a number of tools separately. These investigations are:

  1. Ab initio motif discovery – for which we provide two different analyses, using MEME and DREME:
    • MEME is good for finding your ChIPed TF's motif, and can find wide motifs corresponding to complexes
    • DREME is good for finding shorter monomeric motifs and cofactors
  2. Comparison to known motifs – TOMTOM compares motifs found in your sequences by one of our ab initio tools to known motifs in a database.
  3. Visualization of motifs in the input sequences – MAST shows you where motifs found using an ab initio approach match the sequences
  4. Estimation of binding affinity of input sequences to each motif – AMA measures how strongly a motif is associated with each sequence.
  5. Motif enrichment analysis – AME discovers subtly enriched known binding motifs in your input sequences

Biological questions you can address using MEME-ChIP

MEME-ChIP Examples

We supply data to allow you to run a complete example, based on sequences from a ChIP-seq experiment (Klf1, mouse); you may also wish to study the outputs from a different example which we provide (SCL, also known as Tal1), and compare those outputs with those you obtain from the Klf1 example.

To see how the system works, try submitting the supplied Klf1 sample. Below the button for submitting a file, locate the link to “Sample DNA Input Sequences”. Click on the link, copy the sequence data, use the web browser's back button to return to the data form, and paste the sequences into the box provided below the text “the actual sequences here (Sample DNA Input Sequences):”. This example has 945 sequences, just enough to get answers of reasonably high significance and to exercise all the features of the web service. Do not change any of the settings, and click Start search at the bottom of the form. You should see confirmation that your job has been submitted. If you click the link next to the words “You can view your job results at:” and refresh the screen every now and then, you will see a summary of your results in about an hour. If you have other things to do, you can close your web browser and go away: you will receive an email that echoes the confirmation page and directs you to the results. The primary motif should be a CACC motif, and GATA should feature as a secondary motif. SCL (also called TAL1) may also be associated with this data set, though it is harder to find.

Once outputs are available, here are some options open to you:

Once you have this information, you are in a position to start reasoning about the motifs you have found. Here are some examples of questions you may ask. Is the primary motif found by MEME and DREME essentially the same? Is that primary motif a known motif for the primary transcription factor, or similar to such a known motif? Does AME find a known motif similar to the primary motif with one of AME's lowest p-values? Are any known motifs found by AME likely to be cofactors? How do secondary motifs identified by MEME and DREME compare? How low are the p-values of these secondary motifs as found by these tools? If all the tools find consistent results, this increases your confidence that you have found a motif or motifs consistent with your input sequences. If the tools find inconsistent results, you need to investigate further to understand the cause or causes of the inconsistency.

Once you have worked through this example, study the description below of the tools we use here and the outputs they produce for more insight into how to construct your own examples. To get some idea of how run time scales up, MEME takes about 4 times as long every time you double the size of the data (total number of bases in the sequences). Once you submit more than 600 sequences, time for the other tools increases linearly, so for very large examples, much larger than 600 sequences, you should expect run time to double if you double the size of your input. However, for runs of a few thousand sequences, most of the run time will be taken by MEME, even though we limit MEME to 600 sequences.
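The scaling rules of thumb above can be expressed as a rough calculation. The following Python sketch assumes that "4 times as long for every doubling" corresponds to run time growing with the square of the total number of bases, and that the other tools scale in direct proportion to the number of sequences. Both function names are hypothetical, and only ratios relative to a run you have already timed are meaningful; this is not a model of actual server run times.

```python
def relative_meme_time(total_bases: int, reference_bases: int) -> float:
    """Estimated MEME run time relative to a reference run.

    Assumes run time quadruples each time the total number of bases doubles,
    i.e. time is proportional to the square of the input size.
    """
    return (total_bases / reference_bases) ** 2


def relative_linear_time(num_sequences: int, reference_sequences: int) -> float:
    """Estimated run time of the other tools relative to a reference run.

    Assumes run time grows in direct proportion to the number of sequences.
    """
    return num_sequences / reference_sequences


if __name__ == "__main__":
    # Doubling MEME's input from 30,000 to 60,000 bases: about 4x the time.
    print(relative_meme_time(60_000, 30_000))    # 4.0
    # Doubling the other tools' input from 1,200 to 2,400 sequences: about 2x.
    print(relative_linear_time(2_400, 1_200))    # 2.0
```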

Tools used by MEME-ChIP

How MEME-ChIP pre-processes your input sequences

MEME-ChIP pre-processes your input DNA sequences before running some of the above tools. For input to the motif discovery and enrichment tools (MEME, DREME, AME), any sequences that are longer than 100 base-pairs are trimmed evenly to that length. All trimmed sequences are input to DREME and AME, but a maximum of 600 (randomly chosen) sequences are input to MEME.
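The pre-processing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not MEME-ChIP's actual code: it assumes "trimmed evenly" means keeping the central 100 base-pairs of each longer sequence, and the names trim_to_center and preprocess, as well as the seed parameter, are hypothetical.

```python
import random


def trim_to_center(sequence: str, target_length: int = 100) -> str:
    """Keep the central target_length bases of a sequence that is too long.

    Assumption: 'trimmed evenly' means removing equal amounts from both ends.
    Sequences at or below the target length are returned unchanged.
    """
    if len(sequence) <= target_length:
        return sequence
    start = (len(sequence) - target_length) // 2
    return sequence[start:start + target_length]


def preprocess(sequences, target_length=100, max_meme_sequences=600, seed=0):
    """Return (trimmed sequences for DREME and AME, random subset for MEME)."""
    trimmed = [trim_to_center(s, target_length) for s in sequences]
    if len(trimmed) > max_meme_sequences:
        # A fixed seed (hypothetical parameter) makes the random choice repeatable.
        meme_input = random.Random(seed).sample(trimmed, max_meme_sequences)
    else:
        meme_input = list(trimmed)
    return trimmed, meme_input


if __name__ == "__main__":
    sequences = ["A" * 10 + "C" * 100 + "G" * 10] * 945
    all_trimmed, meme_subset = preprocess(sequences)
    print(len(all_trimmed), len(meme_subset), len(all_trimmed[0]))
```

With the 945-sequence Klf1 example, such a scheme would pass all 945 trimmed sequences to DREME and AME and a random 600 of them to MEME.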

The output report, which you can see once all the tools have completed, lists the commands you could run on the command line to reproduce the outputs, and provides pointers to the outputs of all the above tools.

In addition, the output report makes available the following files:

In the event that you submit more than 600 sequences, the following files are also provided: