Usage: tomtom [options] <query motifs> <target motif database>+
Description:
The Tomtom program searches one or more query
motifs against one or more databases of target motifs
(and their DNA reverse complements), and reports for each
query a list of target motifs, ranked by p-value.
The E-value and the q-value of each match
are also reported. The q-value is the minimal false
discovery rate at which the observed similarity would be
deemed significant. The output contains results for each
query, in the order that the queries appear in the input file.
For a given pair of motifs, the program considers all offsets,
while requiring a minimum number of overlapping positions. For a
given offset, each overlapping position is scored using one of
the column similarity functions listed under -dist, below.
Columns in the query motif that don't overlap the target motif
are assigned a score equal to the median score of the set of
random matches to that column.
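The offset search described above can be pictured with a short Python sketch. It enumerates every offset of a query against a target and keeps those with enough overlapping columns. The function name `candidate_offsets`, the offset convention (position of the query's first column relative to the target's first column), and the clamping of the minimum overlap to the shorter motif width are assumptions made for this illustration, not Tomtom's actual implementation.

```python
def candidate_offsets(query_width, target_width, min_overlap=1):
    """Yield (offset, overlap) pairs for sliding a query along a target.

    offset is the position of the query's first column relative to the
    target's first column (an assumed convention for this sketch).
    """
    # Assumption: the required overlap cannot exceed either motif width.
    required = min(min_overlap, query_width, target_width)
    for offset in range(-(query_width - 1), target_width):
        start = max(0, offset)                          # first overlapping target column
        end = min(target_width, offset + query_width)   # one past the last overlapping column
        overlap = end - start
        if overlap >= required:
            yield offset, overlap


offsets = list(candidate_offsets(query_width=3, target_width=5, min_overlap=2))
print(offsets)
```

With a minimum overlap of 1, every offset at which the two motifs share at least one column is kept; raising the minimum removes the offsets where the query hangs mostly off either end of the target.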
To compute the scores, Tomtom needs to know the
frequencies of the letters of the sequence alphabet in the
database being searched (the "background" letter
frequencies). By default, the background letter frequencies
included in the MEME input files are used. The scores of columns
that overlap for a given offset are summed. This summed score is
then converted to a p-value. The reported p-value
is the minimal p-value over all possible offsets.
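To make the summing step concrete, here is a minimal Python sketch that scores one offset by adding up per-column scores over the overlapping positions. It uses negative Euclidean distance as the column score, which is an assumption chosen for simplicity (the sign is flipped so that larger means more similar). The function names are hypothetical, and the sketch deliberately omits both the scores assigned to non-overlapping query columns and the conversion of the summed score to a p-value.

```python
import math


def column_score(query_col, target_col):
    """Negative Euclidean distance between two frequency columns.

    Assumption: the sign is flipped so that larger scores mean more
    similar columns; Tomtom's own scaling may differ.
    """
    return -math.sqrt(sum((q - t) ** 2 for q, t in zip(query_col, target_col)))


def summed_score(query, target, offset):
    """Sum column scores over the positions that overlap at this offset."""
    total = 0.0
    for i, query_col in enumerate(query):
        j = i + offset  # target column aligned with query column i
        if 0 <= j < len(target):
            total += column_score(query_col, target[j])
    return total


# Two toy DNA motifs; each column holds frequencies for A, C, G, T.
query = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
target = [[0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

# Try every offset with at least one overlapping column.
best_offset = max(range(-1, 3), key=lambda off: summed_score(query, target, off))
print(best_offset, summed_score(query, target, best_offset))
```

Raw sums taken over different numbers of overlapping columns are not directly comparable, which is one reason the summed score is converted to a p-value before offsets are compared.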
To compensate for multiple testing, each reported p-value is
converted to an E-value by multiplying it by twice the
number of target motifs. As a second type of multiple-testing
correction, q-values for each match are computed
from the set of p-values and reported.
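The two corrections can be sketched in a few lines of Python. The E-value function follows the rule stated above (p-value times twice the number of target motifs). The q-value function is a stand-in: it uses a Benjamini-Hochberg-style step-up adjustment, which is an assumption for illustration and not necessarily the estimator Tomtom uses. Both function names are hypothetical.

```python
def e_values(p_values, num_targets):
    """E-value = p-value * 2 * number of target motifs, as described above."""
    return [p * 2 * num_targets for p in p_values]


def bh_q_values(p_values):
    """Benjamini-Hochberg-style q-values (a stand-in, not Tomtom's exact estimator)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


p = [0.001, 0.04, 0.03, 0.5]
print(e_values(p, num_targets=100))
print(bh_q_values(p))
```

The factor of two in the E-value reflects that each target motif is also searched as its reverse complement.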
Input:
- <query motifs> - A file containing one or more motifs in MEME format. Each of these motifs will be
searched against the target databases. To search with only
a subset of these motifs, see the -m and -mi options.
- <target motif databases> - One or more files, each containing one or more motifs in MEME format.
Output:
Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. The tomtom.html file is created from the tomtom.xml file. An additional file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.
Only matches for which the significance is less than or equal to the threshold set by the -thresh switch (default of 0.5) will be shown. By default, significance is measured by the q-value of the match. The q-value is the estimated false discovery rate if the match is accepted as significant. See Storey JD, Tibshirani R, "Statistical significance for genome-wide studies", Proc. Natl. Acad. Sci. USA (2003) 100:9440–9445.
Options:
- -o <output dir> - Name of the output directory for all output files. If the output directory already exists, it will not be replaced and the program will exit without doing anything.
- -oc <output dir> - Name of the output directory for all output files. If the output directory already exists, it will be replaced ('clobbered').
- -bfile <background file> - Name of a file specifying the background frequencies. If this is omitted, the background frequencies will be derived from the first target database.
- -m <id> - The name of a motif in the query file that will be used. This option may be repeated multiple times. If neither this option nor the related -mi option is used, all motifs in the query file will be used.
- -mi <index> - The offset in the query file of a motif that will be used. This option may be repeated multiple times. If neither this option nor the related -m option is used, all motifs in the query file will be used.
- -incomplete-scores - Compute scores using only aligned columns.
- -thresh <value> - Only report matches with significance values less than or equal to the specified threshold (Default = 0.5). Unless the -evalue option is specified, this value must be smaller than or equal to 1.
- -evalue - Use the E-value of the match as the significance threshold (Default: use the q-value).
- -dist [allr|ed|kullback|pearson|sandelin] - The column similarity function used to score overlapping positions:
  - allr - Average log-likelihood ratio
  - ed - Euclidean distance
  - kullback - Kullback-Leibler divergence
  - pearson - Pearson correlation coefficient
  - sandelin - Sandelin-Wasserman function
- -internal - This parameter forces the shorter motif to be completely contained in the longer motif.
- -min-overlap <value> - Only report motif matches that overlap by this many positions or more. If a query motif is shorter than the value of -min-overlap, the width of that motif is used as the required minimum overlap for that query. The default value is 1.
- -query-pseudo <float> - This option adds the specified pseudocount to each count in each query matrix. The default value is 0.
- -target-pseudo <float> - This option adds the specified pseudocount to each count in each target matrix. The default value is 0.
- -text - This option causes Tomtom to print just a tab-delimited text file to standard output. The output begins with a header, indicated by leading "#" characters. This is followed by a single title line, and then the actual values. The columns are:
  - Query motif name
  - Target motif name
  - Optimal offset: the offset between the query and the target motif
  - p-value
  - E-value
  - q-value
  - Overlap: the number of positions of overlap between the two motifs
  - Query consensus sequence
  - Target consensus sequence
  - Orientation: the orientation of the target motif with respect to the query motif
- -no-ssc - This option causes the LOGOs in the LOGO alignments output by Tomtom not to be corrected for small sample sizes. By default, the height of the letters in the LOGOs is reduced when the number of samples on which a motif is based (nsites in the MEME motif) is small. The default setting can cause motifs based on very few sites to have "empty" LOGOs, so this switch can be used if your query or target motifs are based on few samples.
- -verbosity [1|2|3|4] - This option changes the level of detail of messages printed. At level 1 only critical errors are reported, whereas at level 4 everything is printed. The default is 2.
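The tab-delimited layout described under -text lends itself to simple post-processing. The following Python sketch reads such output into a list of dictionaries, assuming the ten columns appear in the order listed for that option. The field names, the function name `parse_tomtom_text`, and the sample input are all made up for this illustration, and the sample is not real Tomtom output. The parser skips lines starting with "#" and treats any line whose numeric columns do not parse as the title line.

```python
import csv
import io

# Column order as listed for the -text option; the field names are this sketch's own.
FIELDS = ["query", "target", "offset", "p_value", "e_value", "q_value",
          "overlap", "query_consensus", "target_consensus", "orientation"]


def parse_tomtom_text(handle):
    """Return one dict per match from tab-delimited -text output."""
    rows = []
    for cells in csv.reader(handle, delimiter="\t"):
        if not cells or cells[0].startswith("#") or len(cells) < len(FIELDS):
            continue  # header, blank, or malformed line
        row = dict(zip(FIELDS, cells))
        try:
            row["offset"] = int(row["offset"])
            row["overlap"] = int(row["overlap"])
            for key in ("p_value", "e_value", "q_value"):
                row[key] = float(row[key])
        except ValueError:
            continue  # assume a non-numeric line is the title line
        rows.append(row)
    return rows


# Made-up sample in the documented layout (not real Tomtom output).
sample = (
    "# hypothetical header\n"
    "Query\tTarget\tOffset\tp-value\tE-value\tq-value\tOverlap\t"
    "Query consensus\tTarget consensus\tOrientation\n"
    "motifA\tmotifB\t1\t0.0001\t0.02\t0.01\t6\tACGTAC\tTACGTACG\t+\n"
)
matches = parse_tomtom_text(io.StringIO(sample))
print(matches[0]["target"], matches[0]["q_value"])
```

In practice the handle could be a file holding the redirected standard output of a run with -text; filtering the resulting list by `q_value` or `e_value` then mirrors what -thresh does inside the program.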
Bugs: none known.
Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).