MIAO -- not currently supported
Usage:
miao [options] <motifs> <database>
Description:
miao (motifs in any order)
Searches a sequence database for clusters of known motifs. As in beadstring, motifs are represented using a hidden Markov model, but miao uses the complete or star topologies, which allow the motifs to appear in any order.
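To make the two topologies concrete, here is a minimal Python sketch that lists the transitions each one implies between motif ends and motif starts. It is a simplified model, not miao's internal data structure: the state names (`end:A`, `start:B`, `spacer:A->B`, `hub`) are hypothetical labels, sequence begin/end states are omitted, "every other motif" is read literally as every motif except the source, and the star topology is modeled as one shared state ("hub") that all motifs pass through.

```python
from itertools import permutations
from typing import List, Tuple

Edge = Tuple[str, str]


def complete_topology(motifs: List[str]) -> List[Edge]:
    """End of each motif -> its own spacer -> start of every other motif."""
    edges: List[Edge] = []
    for src, dst in permutations(motifs, 2):  # ordered pairs with src != dst
        spacer = f"spacer:{src}->{dst}"        # hypothetical spacer state name
        edges.append((f"end:{src}", spacer))
        edges.append((spacer, f"start:{dst}"))
    return edges


def star_topology(motifs: List[str]) -> List[Edge]:
    """Every motif end feeds one shared hub state, which feeds every motif start."""
    edges: List[Edge] = [(f"end:{m}", "hub") for m in motifs]
    edges += [("hub", f"start:{m}") for m in motifs]
    return edges


if __name__ == "__main__":
    motifs = ["A", "B", "C"]
    print(len(complete_topology(motifs)))  # 12: 6 ordered pairs x 2 edges each
    print(len(star_topology(motifs)))      # 6: 3 edges into the hub, 3 out
```

In this sketch the complete topology grows quadratically with the number of motifs (one spacer per ordered pair), while the star topology grows linearly. In both, any motif can follow any other, which is what lets the motifs appear in any order.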
A full description of the algorithm is found in:
Input:
- <motifs> is a list of motifs, in MEME format.
- <database> is a collection of sequences in FASTA format.
Output:
An XML file using the CisML schema.
Options:
--allow-weak-motifs
- In p-value score mode, weak motifs are defined as those whose best possible hit has a p-value greater than the p-value threshold. Such motifs cannot contribute to a match in p-value score mode. By default, the program rejects any search containing weak motifs unless the --allow-weak-motifs switch is given. In that case, the search proceeds, but the weak motifs never appear in any matches. Note: this switch only applies to p-value score mode.
--bgfile <bfile>
- Read background frequencies from <bfile>. The file should be in MEME background file format. The default is to use frequencies embedded in the application from the non-redundant database. If the argument is the keyword motif-file, the frequencies are taken from the motif file.
--eg-cost <cost>
- Scale the expected cost of a random gap to be <cost> times the expected score of a random hit. By default, gap costs are essentially zero. The larger <cost> is, the more heavily gaps are penalized. This option can only be used in conjunction with --max-gap, and it may not be used in conjunction with --min-score.
--e-thresh <ev>
- Only print results with E-values less than <ev>.
--fim
- Gaps between motifs are not penalized. Spacer states between motifs are represented as free-insertion modules (FIM). A FIM is an insert state with 1.0 probability of self-transition and 1.0 probability of exit transition, so traversing such a state has zero transition cost. Specifying this option causes all spacers to be represented using FIMs.
--gap-extend <cost>
- This switch causes all spacer self-loop log-odds scores to be set to <cost>. In addition, it causes all other transitions out of a spacer to be set to zero. Together with the --gap-open switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (self-loop transition probabilities of gap states).
--gap-open <cost>
- This switch causes all transitions into a spacer state to be assigned a log-odds score equal to <cost>. Together with the --gap-extend switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (transition probabilities into and out of gap states).
--keep-unused
- By default, all inter-motif transitions that are not observed in the data are removed from the transition probability matrix. This option retains those transitions. It is only relevant if the model has a completely connected topology.
--max-gap <max-gap>
- The value of <max-gap> specifies the longest distance allowed between two hits in a match. Hits separated by more than <max-gap> are placed in different matches. The default value is 50. Note: large values of <max-gap> combined with large values of pthresh may prevent MCAST from computing E-values.
--max-seqs <max>
- Print results for no more than <max> sequences. By default, all matches are reported, up to the specified E-value threshold (see --e-thresh).
--min-score <minscore>
- This switch specifies the threshold for the repeated match algorithm used by miao. Matches must have a score of at least <minscore> to be detected. Matches containing internal regions with scores less than -<minscore> are split and reported as two separate matches.
--motif <id>
- Use only the motif identified by <id>. This option may be repeated.
--nspacer <value>
- By default, each spacer is modeled using a single insert state. The distribution of spacer lengths produced by a single insert state is exponential in form. A more reasonable distribution would be a bell-shaped curve such as a Gaussian. Modeling the length distribution explicitly is computationally expensive; however, a Gaussian distribution can be approximated by using multiple insert states to represent a single spacer region. The --nspacer option specifies the number of insert states used to represent each spacer.
--pam <distance>
- By default, target probabilities are derived from the distance-250 PAM matrix for proteins, and from a distance-1 transition/transversion matrix for DNA. With the --pam switch, you can specify a different integer distance from 1 to 500. (This can be overridden with the --score-file switch below.) The distance-1 transition/transversion joint probability matrix for DNA is given below:

      A     C     G     T
A   .990  .002  .006  .002
C   .002  .990  .002  .006
G   .006  .002  .990  .002
T   .002  .006  .002  .990

--progress <value>
- Print a progress message to standard error approximately every <value> seconds.
--score-file <score file>
- Read a score file (in BLAST format) and use it instead of the built-in PAM (for proteins) or transition/transversion (for DNA) score file. Several score files, including BLOSUM62, are provided in the directory doc. Other, user-provided score files may be specified as well, as long as they are in the proper format.
--spacer-pseudo <value>
- Specify the value of the pseudocount used in converting transition counts to spacer self-loop probabilities.
--synth
- Create synthetic sequences for estimating E-values. This is useful with small input databases where not enough match scores are found to estimate E-values. The --bgfile option must also be set when using this option.
--trans-pseudo <value>
- Specify the value of the pseudocount used in converting transition counts to transition probabilities.
--type [complete|star]
- This option specifies the topology of the model. The complete topology includes transitions from the end of each motif to the beginning of every other motif in the model (with a spacer model along each transition). This allows for motifs that are repeated, deleted or shuffled. In the star topology, the transitions from each motif lead to the intra-motif state. The default for miao is the complete topology.
--verbosity 1|2|3|4
- Set the verbosity of status reports to standard error. The default level is 2.
--zselo
- Causes spacer emission log-odds scores to be set to zero. This prevents regions of unusual base/residue composition from matching spacers well when the spacer emission frequencies differ from the background frequencies. It is particularly useful with DNA models.
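The weak-motif rule behind --allow-weak-motifs can be checked by hand for short motifs. The Python sketch below assumes an integer-scaled log-odds matrix `pssm` (one row per motif position, columns in A, C, G, T order) and a background distribution `bg`; both inputs and the threshold name `p_thresh` are hypothetical. It computes the p-value of the best possible hit by enumerating every word of the motif's width, which is only practical for narrow motifs and is not how miao computes p-values.

```python
from itertools import product
from math import prod
from typing import Sequence


def best_hit_pvalue(pssm: Sequence[Sequence[int]], bg: Sequence[float]) -> float:
    """P(a random word scores >= the best achievable score) under background bg."""
    best = sum(max(row) for row in pssm)
    pvalue = 0.0
    for word in product(range(len(bg)), repeat=len(pssm)):
        score = sum(pssm[i][b] for i, b in enumerate(word))
        if score >= best:  # integer scores, so this comparison is exact
            pvalue += prod(bg[b] for b in word)
    return pvalue


def is_weak(pssm: Sequence[Sequence[int]], bg: Sequence[float], p_thresh: float) -> bool:
    """A motif is weak if even its best possible hit exceeds the p-value threshold."""
    return best_hit_pvalue(pssm, bg) > p_thresh


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    column = [5, -3, -3, -3]                      # strongly prefers A
    print(is_weak([column] * 3, uniform, 1e-4))  # best hit p-value is 1/64
    print(is_weak([column] * 8, uniform, 1e-4))  # best hit p-value is 1/65536
```

The example shows why short motifs tend to be weak: with a uniform background, the best hit of a width-3 motif can never have a p-value below 1/64.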
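The --gap-open and --gap-extend switches together give each spacer an affine transition cost. The Python sketch below totals only the transition log-odds for a spacer of length L, assuming the spacer is a single insert state that emits one position on entry and one per self-loop (so L positions cost one open plus L - 1 extensions) and that the exit transition scores zero, as the --gap-extend description states. Emission scores are ignored, and the function name is hypothetical.

```python
def spacer_transition_score(length: int, gap_open: float, gap_extend: float) -> float:
    """Transition log-odds for crossing a single-state spacer of `length` positions.

    Assumes: entering the spacer scores gap_open, each self-loop scores
    gap_extend, and leaving the spacer scores zero.
    """
    if length < 1:
        raise ValueError("a spacer insert state emits at least one position")
    exit_score = 0.0
    return gap_open + (length - 1) * gap_extend + exit_score


if __name__ == "__main__":
    # Negative log-odds act as penalties, so longer gaps cost more.
    for L in (1, 5, 20):
        print(L, spacer_transition_score(L, gap_open=-3.0, gap_extend=-0.5))
    # With both costs at zero, the spacer's transitions cost nothing,
    # mirroring the zero self-loop and exit costs described for --fim.
    print(spacer_transition_score(20, 0.0, 0.0))
```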
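The --keep-unused and --trans-pseudo options both act on inter-motif transition counts. The Python sketch below shows one plausible way they could interact, assuming simple additive pseudocounts, (count + pseudo) divided by the sum of (count + pseudo) over the retained transitions. The exact formula miao uses is not given here, and the function and argument names are hypothetical.

```python
from typing import Dict


def transition_probs(counts: Dict[str, int], pseudo: float, keep_unused: bool) -> Dict[str, float]:
    """Turn observed transition counts out of one motif into probabilities.

    counts maps each candidate next motif to how often it followed the
    current motif in the data. Unless keep_unused is True, candidates with
    a zero count are dropped before normalizing, mirroring the default
    behaviour described for --keep-unused.
    """
    kept = {m: c for m, c in counts.items() if keep_unused or c > 0}
    total = sum(c + pseudo for c in kept.values())
    if total == 0:
        return {}
    return {m: (c + pseudo) / total for m, c in kept.items()}


if __name__ == "__main__":
    after_a = {"B": 3, "C": 0}  # A was followed by B three times, never by C
    print(transition_probs(after_a, pseudo=1.0, keep_unused=False))  # {'B': 1.0}
    print(transition_probs(after_a, pseudo=1.0, keep_unused=True))   # {'B': 0.8, 'C': 0.2}
```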
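The repeated-match behaviour described under --min-score can be illustrated on a plain list of per-position scores. The Python sketch below is a one-dimensional simplification modeled on the textbook repeated-match recursion (Durbin et al., Biological Sequence Analysis): each reported match is charged the threshold once, so a match is kept only if its score reaches the threshold, and an internal region scoring below minus the threshold is cheaper to leave out, which splits the match in two. The score list is a hypothetical stand-in for the scores miao computes over its HMM; this is not miao's implementation.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def repeated_matches(scores: List[float], threshold: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each match; 0-based indices, end exclusive."""
    n = len(scores)
    # Index i below means "after the first i positions".
    outside = [0.0] + [NEG_INF] * n  # best total with position i outside any match
    inside = [NEG_INF] * (n + 1)     # best total with position i in an open match
    closed_here = [False] * (n + 1)  # outside[i] came from closing the match at i - 1
    continued = [False] * (n + 1)    # inside[i] extended the match open at i - 1

    for i in range(1, n + 1):
        close = inside[i - 1] - threshold  # pay the threshold once per match
        if close >= outside[i - 1]:
            outside[i], closed_here[i] = close, True
        else:
            outside[i] = outside[i - 1]
        if inside[i - 1] > outside[i - 1]:
            inside[i], continued[i] = inside[i - 1] + scores[i - 1], True
        else:
            inside[i] = outside[i - 1] + scores[i - 1]

    # Trace back from the better final state to recover the match boundaries.
    matches: List[Tuple[int, int, float]] = []
    in_match = inside[n] - threshold >= outside[n]
    end = n
    for i in range(n, 0, -1):
        if in_match:
            if not continued[i]:  # the match starts at 0-based index i - 1
                start = i - 1
                matches.append((start, end, sum(scores[start:end])))
                in_match = False
        elif closed_here[i]:      # a match ends just before 0-based index i - 1
            in_match = True
            end = i - 1
    matches.reverse()
    return matches


if __name__ == "__main__":
    scores = [2, 2, -5, 3, 3]
    print(repeated_matches(scores, threshold=3))   # the -5 dip (< -3) splits the match
    print(repeated_matches(scores, threshold=10))  # nothing reaches the threshold
```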
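Why several insert states give a more bell-shaped spacer length, as the --nspacer description claims, can be seen from a short calculation. Assuming each of the n insert states in a chain emits at least one position and self-loops with the same (hypothetical) probability p, the time spent in one state is geometric and the total spacer length follows a negative binomial distribution, P(L) = C(L-1, n-1) * p^(L-n) * (1-p)^n for L >= n. The Python sketch below evaluates that distribution; it models only spacer length, not miao's scoring.

```python
from math import comb
from typing import List


def spacer_length_pmf(n_states: int, p_self: float, max_len: int) -> List[float]:
    """P(total spacer length = L) for L = 0..max_len.

    Assumes n_states insert states in series, each emitting at least one
    position and self-looping with probability p_self.
    """
    pmf = [0.0] * (max_len + 1)
    for L in range(n_states, max_len + 1):
        pmf[L] = comb(L - 1, n_states - 1) * p_self ** (L - n_states) * (1 - p_self) ** n_states
    return pmf


def mode(pmf: List[float]) -> int:
    """Most probable length (first one, if tied)."""
    return max(range(len(pmf)), key=pmf.__getitem__)


if __name__ == "__main__":
    # Both choices give the same mean length, n / (1 - p) = 50 / 3.
    one = spacer_length_pmf(1, 0.94, 400)
    five = spacer_length_pmf(5, 0.70, 400)
    print(mode(one))   # single state: the peak sits at the shortest length
    print(mode(five))  # five states: the peak moves toward the mean
```

The single-state distribution peaks at length 1 and decays geometrically, while the five-state chain peaks much closer to its mean, which is the bell-like shape the option aims for.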
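The distance-1 DNA matrix given under --pam can be extended to other distances. A common construction, assumed here because this page does not spell out how miao does it, treats the distance-n matrix as the n-th matrix power of the distance-1 matrix. Since each row of the matrix above sums to 1, the sketch reads row x as the probabilities of base x being replaced by each base. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]

# Distance-1 transition/transversion matrix (rows and columns in A, C, G, T order).
PAM1_DNA: Matrix = [
    [0.990, 0.002, 0.006, 0.002],
    [0.002, 0.990, 0.002, 0.006],
    [0.006, 0.002, 0.990, 0.002],
    [0.002, 0.006, 0.002, 0.990],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def pam_power(base: Matrix, distance: int) -> Matrix:
    """Raise `base` to the integer power `distance` by repeated squaring."""
    if distance < 1:
        raise ValueError("distance must be a positive integer")
    size = len(base)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    power = base
    while distance:
        if distance & 1:
            result = matmul(result, power)
        power = matmul(power, power)
        distance >>= 1
    return result


if __name__ == "__main__":
    for d in (1, 50, 500):
        m = pam_power(PAM1_DNA, d)
        # The probability of a base staying unchanged shrinks as distance grows.
        print(d, [round(m[i][i], 3) for i in range(4)])
```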
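The effect of --zselo is easiest to see with a composition-biased spacer. The Python sketch below assumes a hypothetical GC-rich spacer emission distribution and a uniform background, scores a stretch of sequence as spacer emissions using log2(spacer / background), and shows that once those log-odds are zeroed the stretch gains nothing from its composition. The names and frequencies are illustrative only.

```python
from math import log2
from typing import Dict

BACKGROUND: Dict[str, float] = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
SPACER: Dict[str, float] = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}  # hypothetical GC-rich spacer


def spacer_emission_score(seq: str, zselo: bool) -> float:
    """Sum of spacer emission log-odds (bits) over seq; every term is zero if zselo."""
    return sum(0.0 if zselo else log2(SPACER[b] / BACKGROUND[b]) for b in seq)


if __name__ == "__main__":
    region = "GCGCGGCCGC"
    print(spacer_emission_score(region, zselo=False))  # positive: GC-rich text favours the spacer
    print(spacer_emission_score(region, zselo=True))   # 0.0: composition no longer helps
```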