MEME Suite Motif File Formats
Output Formats
MEME results are recorded in three file formats: plain text, HTML, and XML. The MEME XML format is completely specified by the Document Type Definition (DTD) found at the start of the MEME XML output. The MEME plain text and HTML formats contain much explanatory text and are thus self-documenting. The XML format was added for MEME 4.0. The plain text and HTML formats have been supported in all versions of MEME.
GLAM2 provides plain text and HTML output. The format is described in the Output format section of the GLAM2 Tutorial. GLAM2 also provides output in MEME minimal motif format.
Input Formats
MAST will accept the plain text, HTML, and XML forms of MEME output, and the MEME minimal motif format.
FIMO will accept the plain text, HTML, and XML forms of MEME output, and the MEME minimal motif format.
GOMO will accept the plain text, HTML, and XML forms of MEME output, and the MEME minimal motif format.
GLAM2SCAN will accept the plain text and HTML forms of GLAM2 output.
MEME Minimal Motif Format
Users may create motif files in a simplified format for use by the MEME Suite programs. Examples are given in the sample files below.
| | For All Programs Except MAST | For MAST |
|---|---|---|
| DNA | sample | sample |
| Protein | sample | sample |
The meaning of the format is as follows:
- The MEME version number line.
    MEME version 4.5
- The alphabet line. For DNA motif files the line
    ALPHABET= ACGT
  or, for protein motif files, the line
    ALPHABET= ACDEFGHIKLMNPQRSTVWY
  must be present.
- Strand information line. (DNA motif files only.) If both DNA strands are included in the motif:
    strands: + -
  or if only one strand is included:
    strands: +
- The background distribution lines. The background must start a new line with the string:
    Background letter frequencies (from
  This is followed, on the next line(s), by a list of characters and their associated frequencies, delimited by white space.
- The motifs.
There may be one or more motifs. Each motif starts with a "MOTIF" line, followed by a "log-odds matrix" and/or a "letter-probability matrix" section. MAST requires each motif to be represented as a "log-odds matrix". For all other programs that accept MEME minimal motif format, each motif must be represented as a "letter-probability matrix". You may include both formats for a given motif, but only one will be used, depending on which program you are running. If you include both formats for a motif, you should put the log-odds matrix format first. Each of the sections has a header line followed by one line of letter scores or frequencies for each position in the motif, as detailed below. It is recommended, though not required, that you include a URL line listing the webpage where more information can be found on the motif.
The motif format for MAST is:

    MOTIF motif_name
    log-odds matrix: alength= 4 w= 22 E= 0
    ...
    ... lines of log-odds scores; each line is list of scores for each letter ...
    ...
    URL website

The motif format for all other programs that accept MEME minimal motif format is:

    MOTIF motif_name
    letter-probability matrix: alength= 4 w= 22 nsites= 49 E= 0
    ...
    ... lines of probabilities; each line is list of probabilities of each letter ...
    ...
    URL website

The first line should be exactly as shown, except the word "motif_name" should be replaced with the name of the motif, which may be any word (without spaces).
Note: If you include both the log-odds and letter-probability formats for a given motif, include the "MOTIF name" line only once, just above the "log-odds" line.
The "log-odds matrix" header has the values "alength" for the size of the alphabet (4 for DNA, 20 for protein), "w" for the number of positions in the motif, and "E" for the E-value of the motif (in most cases this can be set to zero). The "letter-probability matrix" header has the values "alength", "w" and "E" as above, with the addition of "nsites", specifying the number of sites that were used in creating the motif.
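The header fields described above can be pulled apart with a short script. The following is a minimal sketch, not part of the MEME Suite: the function name `parse_matrix_header` and the returned dictionary layout are assumptions made for illustration, and it expects the fields to be written as `key= value` pairs as in the templates on this page.

```python
import re

def parse_matrix_header(line):
    """Split a matrix header line into its kind and its numeric fields.

    Hypothetical helper for illustration; not a MEME Suite function.
    """
    kind, _, rest = line.partition(":")
    kind = kind.strip()
    if kind not in ("log-odds matrix", "letter-probability matrix"):
        raise ValueError(f"not a matrix header: {line!r}")
    # Fields look like "alength= 4": a key, "=", optional spaces, a value.
    fields = dict(re.findall(r"(\w+)=\s*(\S+)", rest))
    return kind, {
        "alength": int(fields["alength"]),  # size of the alphabet
        "w": int(fields["w"]),              # number of positions in the motif
        # nsites appears only in the letter-probability header
        "nsites": int(fields["nsites"]) if "nsites" in fields else None,
        "E": float(fields["E"]),            # E-value of the motif
    }
```

Calling it on the letter-probability header shown earlier yields the kind `"letter-probability matrix"` together with `alength` 4, `w` 22, `nsites` 49 and `E` 0.0; on a log-odds header, `nsites` comes back as `None`.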
The log-odds scores or probabilities for the motif are listed "rotated". That is, each line represents a "column" or "position" within the motif. There should be "w" lines. Each line specifies "alength" log-odds scores or probabilities, one for each letter in the alphabet. Within a line, scores or probabilities are listed for each letter in the order of the motif alphabet.
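To make the "rotated" layout concrete, the sketch below reads the matrix lines into a list of rows, one row per motif position, and then extracts the values for a single letter across all positions. The helper names `read_matrix_rows` and `column_for_letter` are hypothetical and chosen only for this example.

```python
def read_matrix_rows(lines, alength, w):
    """Read w lines of alength numbers; row i holds position i of the motif."""
    rows = []
    for line in lines[:w]:
        values = [float(tok) for tok in line.split()]
        if len(values) != alength:
            raise ValueError(
                f"expected {alength} values, got {len(values)}: {line!r}"
            )
        rows.append(values)
    if len(rows) != w:
        raise ValueError(f"expected {w} rows, got {len(rows)}")
    return rows

def column_for_letter(rows, alphabet, letter):
    """Return one letter's value at every position, using alphabet order."""
    j = alphabet.index(letter)
    return [row[j] for row in rows]
```

For a DNA motif, `column_for_letter(rows, "ACGT", "G")` picks the third number from every line, because the values within a line follow the order of the motif alphabet.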
The last line is optional, but if included, the word "website" should be replaced with the web address of a page with more information on the motif.
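The lines that precede the motifs (version, alphabet, strands and background) can be collected in a similar way. This is a rough sketch under stated assumptions: the function name `read_preamble` and the dictionary it returns are hypothetical, the background line is recognized only by the prefix given above, and the frequencies are assumed to be written as alternating letter and number tokens.

```python
BACKGROUND_PREFIX = "Background letter frequencies (from"

def read_preamble(text):
    """Collect version, alphabet, strands and background from a motif file.

    Hypothetical helper for illustration; not a MEME Suite function.
    """
    info = {"version": None, "alphabet": None, "strands": None, "background": {}}
    in_background = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("MOTIF"):
            break  # the motifs begin here; the preamble is finished
        if line.startswith("MEME version"):
            info["version"] = line.split()[2]
            in_background = False
        elif line.startswith("ALPHABET="):
            info["alphabet"] = line.split("=", 1)[1].strip()
            in_background = False
        elif line.startswith("strands:"):
            info["strands"] = line.split(":", 1)[1].split()
            in_background = False
        elif line.startswith(BACKGROUND_PREFIX):
            in_background = True  # frequencies follow on the next line(s)
        elif in_background and line:
            tokens = line.split()
            # Assumed layout: letter, frequency, letter, frequency, ...
            for letter, freq in zip(tokens[0::2], tokens[1::2]):
                info["background"][letter] = float(freq)
    return info
```

For a protein motif file, which has no strand information line, `info["strands"]` simply stays `None`.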