Tomtom

Usage: tomtom [options] <query motifs> <target motif database>+

Description:

The Tomtom program searches one or more query motifs against one or more databases of target motifs (and their DNA reverse complements), and reports for each query a list of target motifs, ranked by p-value. The E-value and the q-value of each match are also reported. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order in which the queries appear in the input file.

For a given pair of motifs, the program considers all offsets, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of the column similarity functions listed under -dist below. Columns in the query motif that don't overlap the target motif are assigned a score equal to the median score of the set of random matches to that column. In order to compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the "background" letter frequencies). By default, the background letter frequencies included in the MEME input files are used. The column scores for a given offset are summed. This summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. To compensate for multiple testing, each reported p-value is converted to an E-value by multiplying it by twice the number of target motifs. As a second type of multiple-testing correction, q-values for each match are computed from the set of p-values and reported.
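As a rough illustration of the offset scan and the E-value correction described above, the following Python sketch enumerates the offsets that satisfy a minimum overlap and applies the "twice the number of target motifs" multiplier. The function names and the offset convention (the index of the target column aligned with the first query column) are assumptions made for this example; this is not Tomtom's internal code, and the conversion of summed scores to p-values is not shown.

```python
def valid_offsets(query_width, target_width, min_overlap=1):
    """Return (offset, overlap) pairs with enough overlapping columns.

    Assumed convention: offset is the index of the target column aligned
    with query column 0; it is negative when the query overhangs the
    left edge of the target.
    """
    # A query narrower than min_overlap uses its own width as the requirement.
    required = min(min_overlap, query_width)
    pairs = []
    for offset in range(-(query_width - 1), target_width):
        start = max(0, offset)                         # first overlapping target column
        end = min(target_width, offset + query_width)  # one past the last one
        overlap = end - start
        if overlap >= required:
            pairs.append((offset, overlap))
    return pairs


def e_value(p_value, num_targets):
    """E-value: the p-value multiplied by twice the number of target motifs."""
    return p_value * 2 * num_targets
```

In this sketch, the reported p-value for a query-target pair would be the minimum of the p-values computed at each offset returned by valid_offsets.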

Input:

<query motifs> - A file containing one or more motifs in MEME format. Each of these motifs will be searched against the target databases. If you only wish to search with a subset of these motifs then look into the -m and -mi options.
<target motif databases> - One or more files containing one or more motifs in MEME format.

Output:

Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. The tomtom.html file is created from the tomtom.xml file. An additional file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.

Only matches whose significance is less than or equal to the threshold set by the -thresh switch (default 0.5) are shown. By default, significance is measured by the q-value of the match. The q-value is the estimated false discovery rate if the match is accepted as significant. See Storey JD, Tibshirani R, "Statistical significance for genome-wide studies", Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
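To make the q-value threshold more concrete, here is a minimal Python sketch that turns a list of p-values into q-values and applies the -thresh cutoff. It assumes the proportion of true null hypotheses is fixed at 1, which reduces the calculation to Benjamini-Hochberg adjusted p-values; the method of Storey and Tibshirani estimates that proportion from the data, and Tomtom's own computation may differ. The function names are hypothetical.

```python
def q_values(p_values):
    """Simplified q-values (assumes the null proportion pi0 equals 1)."""
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def passes_threshold(significance, thresh=0.5):
    """A match is shown when its significance is at most the threshold."""
    return significance <= thresh
```

With this simplification, q-values are never smaller than the corresponding p-values and preserve their ranking, so filtering on q-value is at least as strict as filtering on the raw p-value at the same cutoff.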

Options:

-o <output dir> - Name of the output directory for all output files. If the output directory already exists, it will not be replaced and the program will exit without doing anything.
-oc <output dir> - Name of the output directory for all output files. If the output directory already exists, it will be replaced ('clobbered').
-bfile <background file> - Name of a file specifying the background frequencies. If this is omitted then the background frequencies will be derived from the first target database.
-m <id> - The name of a motif in the query file that will be used. This option may be repeated multiple times. If neither this option nor the related -mi option is used, then all motifs in the query file will be used.
-mi <index> - The offset in the query file of a motif that will be used. This option may be repeated multiple times. If neither this option nor the related -m option is used, then all motifs in the query file will be used.
-incomplete-scores - Compute scores using only aligned columns.
-thresh <value> - Only report matches with significance values less than or equal to the specified threshold (Default = 0.5). Unless the -evalue option is specified, this value must be less than or equal to 1.
-evalue - Use the E-value of the match as the significance threshold (Default: use the q-value).
-dist [allr|ed|kullback|pearson|sandelin] - The column similarity function used to score overlapping positions. The choices are:
- allr: Average log-likelihood ratio
- ed: Euclidean distance
- kullback: Kullback-Leibler divergence
- pearson: Pearson correlation coefficient
- sandelin: Sandelin-Wasserman function
Detailed descriptions of these functions can be found in the published description of Tomtom.
-internal - This parameter forces the shorter motif to be completely contained in the longer motif.
-min-overlap <value> - Only report motif matches that overlap by this many positions or more. If a query motif is narrower than the value of -min-overlap, then the width of that motif is used as the required minimum overlap for that query. The default value is 1.
-query-pseudo <float> - This option adds the specified pseudocount to each count in each query matrix. The default value is 0.
-target-pseudo <float> - This option adds a pseudocount to each count in each target matrix. The default value is 0.
-text - This option causes Tomtom to print just a tab-delimited text file to standard output. The output begins with a header, indicated by leading "#" characters. This is followed by a single title line, and then the actual values. The columns are:
- Query motif name
- Target motif name
- Optimal offset: the offset between the query and the target motif
- p-value
- E-value
- q-value
- Overlap: the number of positions of overlap between the two motifs.
- Query consensus sequence.
- Target consensus sequence.
- Orientation: Orientation of target motif with respect to query motif.
-no-ssc - This option causes the LOGOs in the LOGO alignments output by Tomtom not to be corrected for small sample sizes. By default, the height of the letters in the LOGOs is reduced when the number of samples on which a motif is based (nsites in the MEME motif) is small. The default setting can cause motifs based on very few sites to have "empty" LOGOs, so this switch can be used if your query or target motifs are based on few samples.
-verbosity [1|2|3|4] - This option changes the level of detail of messages printed. At level 1 only critical errors are reported whereas at level 4 everything is printed. The default is 2.
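
The options above can be combined in a script. The following Python sketch assembles a command line from options documented here and parses the tab-delimited output produced by -text. The field names in FIELDS and both function names are hypothetical labels chosen for this example; only the column order comes from the list under -text. The parser skips "#" header lines and any line whose numeric fields do not parse, which is assumed to cover the title line.

```python
# Hypothetical labels for the ten columns listed under -text, in order.
FIELDS = ["query", "target", "offset", "p_value", "e_value", "q_value",
          "overlap", "query_consensus", "target_consensus", "orientation"]


def build_command(query_file, target_files, thresh=0.5, use_evalue=False,
                  min_overlap=1):
    """Assemble an argument list using only options documented above."""
    args = ["tomtom", "-text", "-thresh", str(thresh),
            "-min-overlap", str(min_overlap)]
    if use_evalue:
        args.append("-evalue")
    # Options come first, then the query file, then one or more databases.
    return args + [query_file] + list(target_files)


def parse_text_output(text):
    """Parse -text output into a list of dicts keyed by FIELDS."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or "#" header line
        values = line.split("\t")
        if len(values) != len(FIELDS):
            continue  # not a ten-column line
        row = dict(zip(FIELDS, values))
        try:
            row["offset"] = int(row["offset"])
            row["overlap"] = int(row["overlap"])
            for key in ("p_value", "e_value", "q_value"):
                row[key] = float(row[key])
        except ValueError:
            continue  # assumed to be the title line
        rows.append(row)
    return rows
```

Assuming tomtom is on the PATH, the list returned by build_command can be passed to subprocess.run with capture_output=True and text=True, and the resulting stdout string handed to parse_text_output.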

Bugs: none known.

Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).