
Usage: tomtom [options] <query motifs> <target motif database>+
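The usage line above can be read as "options first, then exactly one query file, then one or more target databases". A minimal sketch of assembling such a command line from a script follows. The helper name build_tomtom_command and the file names are hypothetical; the only switch used is -thresh, which is described under Output below. The sketch only builds the argument list and does not run Tomtom.

```python
from typing import List, Sequence


def build_tomtom_command(
    query: str,
    databases: Sequence[str],
    options: Sequence[str] = (),
) -> List[str]:
    """Assemble an argument list of the form:
    tomtom [options] <query motifs> <target motif database>+

    Hypothetical helper for illustration; it is not part of Tomtom.
    """
    if not databases:
        # The "+" in the usage line means at least one database is required.
        raise ValueError("at least one target motif database is required")
    return ["tomtom", *options, query, *databases]


# Hypothetical file names, for illustration only.
cmd = build_tomtom_command(
    "query.meme",
    ["db1.meme", "db2.meme"],
    options=["-thresh", "0.1"],
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run once the tomtom executable is on the PATH.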

Description:

The Tomtom program searches one or more query motifs against one or more databases of target motifs (and their DNA reverse complements), and reports for each query a list of target motifs, ranked by p-value. The E-value and the q-value of each match are also reported. The q-value is the minimal false discovery rate at which the observed similarity would be deemed significant. The output contains results for each query, in the order that the queries appear in the input file.

For a given pair of motifs, the program considers all offsets, while requiring a minimum number of overlapping positions. For a given offset, each overlapping position is scored using one of seven column similarity functions defined below. Columns in the query motif that don't overlap the target motif are assigned a score equal to the median score of the set of random matches to that column. To compute the scores, Tomtom needs to know the frequencies of the letters of the sequence alphabet in the database being searched (the "background" letter frequencies). By default, the background letter frequencies included in the MEME input files are used.

The scores of columns that overlap for a given offset are summed. This summed score is then converted to a p-value. The reported p-value is the minimal p-value over all possible offsets. To compensate for multiple testing, each reported p-value is converted to an E-value by multiplying it by twice the number of target motifs. As a second type of multiple-testing correction, q-values for each match are computed from the set of p-values and reported.
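The offset enumeration and the E-value conversion described above can be sketched as follows. This is an illustration under stated assumptions, not Tomtom's implementation: the function names are hypothetical, an offset is taken to mean the target column aligned with the first query column, and the required overlap is capped at the shorter motif's width so that short motifs can still be compared.

```python
from typing import List, Tuple


def candidate_offsets(
    query_width: int, target_width: int, min_overlap: int
) -> List[Tuple[int, int]]:
    """Return (offset, overlap) pairs that satisfy the minimum overlap.

    Query column i is aligned with target column i + offset.
    Assumption: the requirement is capped at the shorter motif's width.
    """
    need = min(min_overlap, query_width, target_width)
    kept = []
    for offset in range(-(query_width - 1), target_width):
        first = max(0, -offset)                        # first overlapping query column
        last = min(query_width, target_width - offset)  # one past the last one
        overlap = last - first
        if overlap >= need:
            kept.append((offset, overlap))
    return kept


def e_value(p_value: float, num_targets: int) -> float:
    """E-value as described in the text: p-value times twice the number of targets."""
    return p_value * 2 * num_targets


offsets = candidate_offsets(query_width=4, target_width=6, min_overlap=3)
for offset, overlap in offsets:
    print(f"offset {offset:+d}: {overlap} overlapping columns")

# Made-up p-values per offset; the reported p-value is the minimum over offsets.
p_by_offset = {-1: 0.02, 0: 0.001, 1: 0.3}
reported_p = min(p_by_offset.values())
print(f"E-value against 100 targets: {e_value(reported_p, 100):.3g}")
```

The factor of two in the E-value matches the text's "twice the number of target motifs".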

Input:

Output:

Tomtom writes its output to files in a directory named tomtom_out, which it creates if necessary. (You can also cause the output to be written to a different directory; see -o and -oc, below.) The main output file is named tomtom.html and can be viewed with a web browser. The tomtom.html file is created from the tomtom.xml file. An additional file, tomtom.txt, contains a simplified, text-only version of the output. (See -text, below, for the text output format.) For each query-target match, two additional files containing LOGO alignments are also written: an Encapsulated PostScript file (.eps) and a PNG file (.png). If the convert program is not available, no PNG files will be written.
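After a run, a script may want to confirm which of the files described above were actually produced, for example to detect missing PNG logos. The following is a minimal sketch that relies only on the file names and extensions given in the text; the function name summarize_output is hypothetical, and the naming pattern of the individual logo files is not assumed.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# File names taken from the description above.
MAIN_OUTPUTS = ("tomtom.html", "tomtom.xml", "tomtom.txt")


def summarize_output(
    out_dir: str = "tomtom_out",
) -> Tuple[Dict[str, bool], Dict[str, List[str]]]:
    """Report which main output files exist and list logo files by extension."""
    directory = Path(out_dir)
    present = {name: (directory / name).is_file() for name in MAIN_OUTPUTS}
    logos: Dict[str, List[str]] = {".eps": [], ".png": []}
    if directory.is_dir():
        for ext in logos:
            logos[ext] = sorted(p.name for p in directory.glob(f"*{ext}"))
    return present, logos


present, logos = summarize_output("tomtom_out")
for name, found in present.items():
    print(f"{name}: {'found' if found else 'missing'}")
if logos[".eps"] and not logos[".png"]:
    print("EPS logos present but no PNG logos; the convert program may be unavailable.")
```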

Only matches for which the significance is less than or equal to the threshold set by the -thresh switch (default of 0.5) will be shown. By default, significance is measured by the q-value of the match. The q-value is the estimated false discovery rate if the match is accepted as significant. See Storey JD, Tibshirani R, "Statistical significance for genome-wide studies", Proc. Natl Acad. Sci. USA (2003) 100:9440–9445.
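To make the thresholding concrete, the sketch below computes q-values from a list of p-values and keeps the matches at or below a threshold. It uses the Benjamini-Hochberg step-up estimate, which is the special case of the Storey-Tibshirani approach with the proportion of true nulls fixed at 1. That choice is an assumption made for illustration and may differ from what Tomtom computes; the function names and p-values are likewise hypothetical.

```python
from typing import List, Sequence


def bh_q_values(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg style q-values (assumed estimator, see lead-in)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def significant_matches(q_values: Sequence[float], thresh: float = 0.5) -> List[int]:
    """Indices of matches whose q-value is less than or equal to the threshold."""
    return [i for i, q in enumerate(q_values) if q <= thresh]


p_values = [0.01, 0.04, 0.03, 0.5]  # made-up p-values for four matches
q_values = bh_q_values(p_values)
print("shown at the default threshold:", significant_matches(q_values))
print("shown at a threshold of 0.05:", significant_matches(q_values, 0.05))
```

With the default threshold of 0.5 a match whose q-value is exactly 0.5 is still shown, because the comparison is "less than or equal to".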

Options:

Bugs: none known.

Authors: Shobhit Gupta (shobhitg@u.washington.edu), Timothy Bailey (tbailey@imb.uq.edu.au), Charles E. Grant (cegrant@gs.washington.edu) and William Noble (noble@gs.washington.edu).