beadstring -- not currently supported
Usage:
beadstring [options] <motifs> <database>
Description:
Beadstring builds a linear hidden Markov model (HMM) from the motifs and
motif occurrences listed in the motif file, and uses that HMM to search a
sequence database for a particular ordered series of motifs. A description
of the algorithm is found in:
By default, the order and spacing of motifs in the model are determined
from the "Summary of Motifs" section of the MEME input file. Beadstring
searches the summary for the sequence that contains the maximal number of
distinct motif occurrences. If there is a tie, then beadstring selects the
sequence with the smallest combined p-value. Beadstring then eliminates
all but the most significant occurrence of each motif and uses the
resulting order and spacing of motif occurrences to initialize the HMM.
This procedure can be overridden by selecting the --motif,
--motif-e-thresh, --motif-p-thresh or --order options.
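The default selection procedure described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not beadstring's own code: the Occurrence record, the layout of the summary dictionary and the function names are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one motif occurrence listed in the summary.
Occurrence = namedtuple("Occurrence", "motif start pvalue")

def pick_sequence(summary):
    """summary maps sequence name -> (combined_pvalue, [Occurrence, ...])."""
    def key(item):
        _name, (combined_p, occs) = item
        distinct = len({o.motif for o in occs})
        # Most distinct motifs first; ties go to the smaller combined p-value.
        return (-distinct, combined_p)
    return min(summary.items(), key=key)[0]

def best_occurrences(occs):
    """Keep the most significant occurrence of each motif, in sequence order."""
    best = {}
    for o in occs:
        if o.motif not in best or o.pvalue < best[o.motif].pvalue:
            best[o.motif] = o
    return sorted(best.values(), key=lambda o: o.start)

# Made-up summary data, for illustration only.
summary = {
    "seqA": (1e-8, [Occurrence(1, 10, 1e-5), Occurrence(2, 40, 1e-4)]),
    "seqB": (1e-12, [Occurrence(2, 5, 1e-6), Occurrence(1, 30, 1e-7),
                     Occurrence(1, 60, 1e-3)]),
    "seqC": (1e-3, [Occurrence(1, 7, 1e-2)]),
}
chosen = pick_sequence(summary)
layout = best_occurrences(summary[chosen][1])
print(chosen, [(o.motif, o.start) for o in layout])
# seqB [(2, 5), (1, 30)]
```

Here seqA and seqB both contain two distinct motifs, so the smaller combined p-value decides in favour of seqB, and the weaker second occurrence of motif 1 is dropped before the order and spacing are read off.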
The command line option --p-score activates an alternative scoring mode,
called "p-value scoring." This scoring method is described in
Input:
- <motifs> is a list of motifs in MEME format.
- <database> is a collection of sequences in FASTA format.
Output:
Beadstring will create a directory, named beadstring_out by default.
Any existing output files in the directory will be overwritten.
The directory will contain:
- An XML file named beadstring.xml using the CisML schema.
- An XML file named model.xml using the MEME_HMM schema.
- An HTML file named beadstring.html
- A plain text file named beadstring.text
The default output directory can be overridden using the --o or --oc
options, which are described below.
Options:
Options related to input and output:
--bgfile <bfile>
- Read background frequencies from <bfile>. The file should be in MEME background file format. The default is to use frequencies embedded in the application from the non-redundant database. If the argument is the keyword motif-file, then the frequencies will be taken from the motif file.
--e-thresh <ev>
- Only print results with E-values less than <ev>. Default is 0.01.
--max-seqs <max>
- Print results for no more than <max> sequences. By default, all matches are reported, up to the specified E-value threshold (see --e-thresh).
--model-file <model file>
- Creation of the HMM will be skipped, and the HMM will be read from the file instead.
--no-search
- This option turns off the search phase of beadstring. The HMM will be stored if the --model option is specified.
--o <dir name>
- Specifies the output directory. If the directory already exists, the contents will not be overwritten.
--oc <dir name>
- Specifies the output directory. If the directory already exists, the contents will be overwritten.
--progress <value>
- Print to standard error a progress message approximately every <value> seconds.
--score-file <score file>
- Cause a score file (in BLAST format) to be read and used instead of the built-in PAM (for proteins) or transition/transversion (for DNA) score file. Several score files are provided (including BLOSUM62) in the directory doc. Other, user-provided score files may be specified as well, as long as they are in the proper format.
--verbosity 1|2|3|4
- Set the verbosity of status reports to standard error. The default level is 2.
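As a rough illustration of what a --bgfile argument supplies, the sketch below reads single-letter frequencies from a background file. It assumes a layout of "#" comment lines plus one "symbol frequency" pair per line, and it ignores any higher-order entries; that layout, the read_background name and the sample data are assumptions for the example, so check them against the files you actually use.

```python
def read_background(lines, alphabet="ACGT"):
    """Return {letter: frequency} from the single-letter entries in lines."""
    freqs = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines, comments and anything that is not a pair.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        symbol, value = parts[0].upper(), parts[1]
        if len(symbol) == 1 and symbol in alphabet:
            freqs[symbol] = float(value)
    missing = sorted(set(alphabet) - set(freqs))
    if missing:
        raise ValueError("no frequency for: " + ", ".join(missing))
    total = sum(freqs.values())
    # Renormalise so the frequencies sum to one despite rounding in the file.
    return {letter: f / total for letter, f in freqs.items()}

# Made-up file content, for illustration only.
example = """# order 0
A 0.3
C 0.2
G 0.2
T 0.3
# order 1
AA 0.1
""".splitlines()

bg = read_background(example)
print(bg)
```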
Options related to selecting motifs for the model:
--motif <id>
- Use only the motif identified by <id>. This option may be repeated.
--motif-e-thresh <ev>
- Only motifs with E-values less than <ev> will be used to build the HMM.
--motif-p-thresh <pv>
- Only motif occurrences with p-values less than <pv> will be used to build the HMM.
--order <string>
- The given string specifies the order and spacing of the motifs within the model, and has the format "l=n=l=n=...=l=n=l", where "l" is the length of a region between motifs, and "n" is a motif index. Thus, for example, the string "34=3=17=2=5" specifies a two-motif linear model, with motifs 3 and 2 separated by 17 letters and flanked by 34 letters and 5 letters on the left and right. If the motif file contains motif occurrences on both strands, then the motif IDs in the order string should be preceded by "+" or "-" indicating the strandedness of the motif.
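To make the --order format concrete, the following sketch splits an order string into its spacer lengths and motif entries. The parse_order function is hypothetical and written only from the description above; how beadstring itself validates the string is not documented here.

```python
import re

def parse_order(order):
    """Split an l=n=l=...=n=l string into spacer lengths and motif entries."""
    fields = order.split("=")
    # Spacers and motifs alternate, starting and ending with a spacer,
    # so a valid string has an odd number of fields and at least three.
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError("expected l=n=l=...=n=l, got %r" % order)
    spacers = [int(f) for f in fields[0::2]]
    motifs = []
    for f in fields[1::2]:
        m = re.fullmatch(r"([+-]?)(\w+)", f)
        if m is None:
            raise ValueError("bad motif field %r" % f)
        # The strand is None when no "+" or "-" prefix is given.
        motifs.append((m.group(1) or None, m.group(2)))
    return spacers, motifs

print(parse_order("34=3=17=2=5"))
# ([34, 17, 5], [(None, '3'), (None, '2')])
print(parse_order("10=+1=20=-2=0"))
# ([10, 20, 0], [('+', '1'), ('-', '2')])
```

Motif IDs are kept as strings so that the optional strand prefix can be separated from the ID without assuming the ID is numeric.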
Options related to building the model:
--fim
- Gaps between motifs are not penalized. Spacer states between motifs are represented as free-insertion modules (FIM). A FIM is an insert state with 1.0 probability of self-transition and 1.0 probability of exit transition. Thus, traversing such a state has zero transition cost. Specifying this option causes all spacers to be represented using FIMs.
--gap-extend <cost>
- This switch causes all spacer self-loop log-odds scores to be set to <cost>. In addition, it causes all other transitions out of a spacer to be set to zero. Together with the --gap-open switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (self-loop transition probabilities of gap states).
--gap-open <cost>
- This switch causes all transitions into a spacer state to be assigned a log-odds score equal to <cost>. Together with the --gap-extend switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (transition probabilities into and out of gap states).
--motif-pseudo <float>
- A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency (default=0.1). Default value is 0.0.
--nspacer <value>
- By default each spacer is modeled using a single insert state. The distribution of spacer lengths produced by a single insert state is exponential in form. A more reasonable distribution would be a bell-shaped curve such as a Gaussian. Modeling the length distribution explicitly is computationally expensive; however, a Gaussian distribution can be approximated using multiple insert states to represent a single spacer region. The --nspacer option specifies the number of insert states used to represent each spacer.
--spacer-pseudo <value>
- Specify the value of the pseudocount used in converting transition counts to spacer self-loop probabilities. Default value is 0.0.
--trans-pseudo <value>
- Specify the value of the pseudocount used in converting transition counts to transition probabilities. Default value is 0.1.
--zselo
- Causes spacer emission log-odds scores to be set to zero. This prevents regions of unusual base/residue composition from matching spacers well when the spacer emission frequencies differ from the background frequencies. It is particularly useful with DNA models.
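The effect described for --nspacer can be seen by computing the spacer length distribution directly. The sketch below assumes the insert states are chained in series, that each emits at least one letter, and that they share one self-loop probability chosen to give the requested mean length. These are modelling assumptions made for the example; the exact topology beadstring builds may differ.

```python
def spacer_length_dist(mean_len, n_states, max_len):
    """P(spacer length = k) for k = 0..max_len, as a list indexed by k."""
    # Each state has mean length 1 / (1 - q); n states give mean_len in total.
    q = 1.0 - n_states / mean_len
    # One insert state alone: geometric lengths, at least one letter.
    one = [0.0] + [(1.0 - q) * q ** (k - 1) for k in range(1, max_len + 1)]
    dist = [1.0] + [0.0] * max_len  # before any state: length 0
    for _ in range(n_states):
        nxt = [0.0] * (max_len + 1)
        # Convolve the running distribution with one more insert state.
        for i, mass in enumerate(dist):
            if mass:
                for j in range(1, max_len + 1 - i):
                    nxt[i + j] += mass * one[j]
        dist = nxt
    return dist

def mode(dist):
    """Most probable length."""
    return max(range(len(dist)), key=dist.__getitem__)

single = spacer_length_dist(20, 1, 200)
multi = spacer_length_dist(20, 3, 200)
print(mode(single), mode(multi))
# 1 14
```

With one state the most probable spacer is the shortest possible one, even though the mean is 20. With three states the distribution peaks near the mean, which is the bell-shaped behaviour the option is meant to approximate.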
Options related to scoring:
--allow-weak-motifs
- In p-value score mode, weak motifs are defined as ones where the best possible hit has a p-value greater than the p-value threshold. Such motifs cannot contribute to a match in p-value score mode. By default, the program rejects any search results containing weak motifs, unless the --allow-weak-motifs switch is given. In that case, the search will proceed, but the weak motifs will never appear in any matches. Note: This switch only applies to p-value score mode.
--global
- Scores are computed for the match between the entire sequence and the model (the default is to use the maximal local score).
--pam <distance>
- By default, target probabilities are derived from the distance-250 PAM matrix for proteins, and from a <distance>-1 transition/transversion matrix for DNA. With the --pam switch, you can specify a different integer distance from 1 to 500. (This can be overridden with the --score-file switch.) The <distance>-1 transition/transversion joint probability matrix for DNA is given below:
       A     C     G     T
  A  .990  .002  .006  .002
  C  .002  .990  .002  .006
  G  .006  .002  .990  .002
  T  .002  .006  .002  .990
--paths single|all
- This option determines how the program computes raw scores. With the single option, the program computes the Viterbi score, which is the log-odds score associated with the single most likely match between the sequence and the model. The all option yields the total log-odds score, which is the sum of the log-odds of all sequence-to-model matches. The default is Viterbi scoring.
--p-score <float>
- The --p-score switch activates p-value score mode with the given threshold. (The default score mode is called "log-odds score mode".) In p-value score mode, motif match scores are converted to their p-values. They are then converted to bit scores as follows:
    S = -log2(p/T)
  where S is the bit score of the hit, p is the p-value of the log-odds score, and T is the p-value threshold. In this way, only hits more significant than the p-value threshold get positive scores. The p-value threshold, T, must be in the range 0<T<=1.
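The conversion used in p-value score mode is easy to check numerically. The helper below is a minimal sketch of the formula S = -log2(p/T) exactly as stated above; the function name and the sample p-values are illustrative only.

```python
import math

def p_value_bit_score(p, threshold):
    """Bit score S = -log2(p / T) for hit p-value p and threshold T."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold T must satisfy 0 < T <= 1")
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must satisfy 0 < p <= 1")
    # log2(T / p) equals -log2(p / T) and avoids printing a negative zero.
    return math.log2(threshold / p)

for p in (1e-6, 1e-4, 1e-2):
    print(p, round(p_value_bit_score(p, 1e-4), 2))
# 1e-06 6.64
# 0.0001 0.0
# 0.01 -6.64
```

A hit exactly at the threshold scores zero, hits with smaller p-values score positively, and less significant hits score negatively, so they cannot improve a match.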