beadstring -- not currently supported

Usage:

beadstring [options] <motifs> <database>

Description:

Beadstring builds a linear hidden Markov model (HMM) from the motifs and motif occurrences listed in the motif file, and uses that HMM to search a sequence database for a particular ordered series of motifs. A description of the algorithm is found in:

Grundy, Bailey, Elkan and Baker. "Meta-MEME: Motif-based Hidden Markov Models of Protein Families". Computer Applications in the Biosciences. 13(4):397-406, 1997.

By default, the order and spacing of motifs in the model is determined from the "Summary of Motifs" section of the MEME input file. Beadstring searches the summary for the sequence that contains the maximal number of distinct motif occurrences. If there is a tie, then beadstring selects the sequence with the smallest combined p-value. Beadstring then eliminates all but the most significant occurrence of each motif and uses the resulting order and spacing of motif occurrences to initialize the HMM. This procedure can be overridden by selecting the --motif, --motif-e-thresh, --motif-p-thresh or --order options.
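
The default selection procedure described above can be sketched in Python as follows. This is only an illustration of the stated steps, not beadstring's actual code: the Occurrence record, its field names and the pick_order function are hypothetical, and the sketch assumes that "most significant" means "smallest p-value" and that the combined p-value of each sequence is already known.

```python
from collections import namedtuple

# Hypothetical record for one motif occurrence in the MEME summary.
Occurrence = namedtuple("Occurrence", "motif start pvalue")

def pick_order(summary):
    """summary maps sequence name -> (combined_pvalue, [Occurrence, ...])."""
    def rank(item):
        _, (combined_p, occs) = item
        n_distinct = len({o.motif for o in occs})
        # More distinct motifs first; ties broken by smaller combined p-value.
        return (-n_distinct, combined_p)

    name, (_, occs) = min(summary.items(), key=rank)

    # Keep only the most significant (smallest p-value) occurrence per motif.
    best = {}
    for o in occs:
        if o.motif not in best or o.pvalue < best[o.motif].pvalue:
            best[o.motif] = o

    # Motif order follows position along the chosen sequence.
    return name, sorted(best.values(), key=lambda o: o.start)

# Made-up example data: both sequences contain two distinct motifs.
summary = {
    "seqA": (1e-8, [Occurrence(1, 10, 1e-5), Occurrence(2, 40, 1e-4)]),
    "seqB": (1e-12, [Occurrence(2, 5, 1e-6), Occurrence(1, 30, 1e-7),
                     Occurrence(2, 60, 1e-3)]),
}
name, order = pick_order(summary)
print(name, [o.motif for o in order])
```

In this made-up example the two sequences tie on the number of distinct motifs, so the one with the smaller combined p-value is chosen, and the weaker second occurrence of motif 2 is dropped before the order is read off.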

The command line option --p-score activates an alternative scoring mode, called "p-value scoring." This scoring method is described in:

Bailey and Noble. "Searching for statistically significant regulatory modules." Bioinformatics 19(Suppl 2):ii16-ii25, 2003.

Input:

<motifs> is a list of motifs in MEME format.
<database> is a collection of sequences in FASTA format.

Output:

Beadstring will create a directory, named beadstring_out by default. Any existing output files in the directory will be overwritten. The directory will contain:

An XML file named beadstring.xml using the CisML schema.
An XML file named model.xml using the MEME_HMM schema.
An HTML file named beadstring.html
A plain text file named beadstring.text

The default output directory can be overridden using the --o or --oc options, which are described below.

Options:

Options related to input and output:

--bgfile <bfile> - Read background frequencies from <bfile>. The file should be in MEME background file format. The default is to use frequencies embedded in the application from the non-redundant database. If the argument is the keyword motif-file, then the frequencies will be taken from the motif file.
--e-thresh <ev> - Only print results with E-values less than <ev>. Default is 0.01.
--max-seqs <max> - Print results for no more than <max> sequences. By default, all matches are reported, up to the specified E-value threshold (see --e-thresh).
--model-file <model file> - Creation of the HMM will be skipped, and the HMM will be read from the file instead.
--no-search - This option turns off the search phase of beadstring. The HMM will be stored if the --model option is specified.
--o <dir name> - Specifies the output directory. If the directory already exists, the contents will not be overwritten.
--oc <dir name> - Specifies the output directory. If the directory already exists, the contents will be overwritten.
--progress <value> - Print to standard error a progress message approximately every <value> seconds.
--score-file <score file> - Read the given score file (in BLAST format) and use it instead of the built-in PAM (for proteins) or transition/transversion (for DNA) score file. Several score files (including BLOSUM62) are provided in the directory doc. Other, user-provided score files may be specified as well, as long as they are in the proper format.
--verbosity 1|2|3|4 - Set the verbosity of status reports to standard error. The default level is 2.

Options related to selecting motifs for the model:

--motif <id> - Use only the motif identified by <id>. This option may be repeated.
--motif-e-thresh <ev> - Only motifs with E-values less than <ev> will be used to build the HMM.
--motif-p-thresh <pv> - Only motif occurrences with p-values less than <pv> will be used to build the HMM.
--order <string> - The given string specifies the order and spacing of the motifs within the model, and has the format "l=n=l=n=...=l=n=l", where "l" is the length of a region between motifs, and "n" is a motif index. Thus, for example, the string "34=3=17=2=5" specifies a two-motif linear model, with motifs 3 and 2 separated by 17 letters and flanked by 34 letters and 5 letters on the left and right. If the motif file contains motif occurrences on both strands, then the motif IDs in the order string should be preceded by "+" or "-" indicating the strandedness of the motif.
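
The --order string format can be illustrated with a short Python sketch. The parse_order function is hypothetical and is not part of beadstring; it only shows how the alternating "l=n=l=...=n=l" fields split into spacer lengths and motif IDs. Motif IDs are kept as strings so that an optional "+" or "-" strand prefix is preserved.

```python
def parse_order(order):
    """Split an --order string into spacer lengths and motif IDs."""
    fields = order.split("=")
    # A valid string alternates spacer, motif, ..., spacer: an odd field count.
    if len(fields) < 3 or len(fields) % 2 == 0:
        raise ValueError("expected l=n=l=...=n=l, got %r" % order)
    spacers = [int(f) for f in fields[0::2]]  # even positions: spacer lengths
    motifs = fields[1::2]                     # odd positions: motif IDs
    return spacers, motifs

spacers, motifs = parse_order("34=3=17=2=5")
print(spacers, motifs)
```

For the example string from the text, the spacer lengths are 34, 17 and 5, and the motifs are 3 and 2, in that order.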

Options related to building the model:

--fim - Gaps between motifs are not penalized. Spacer states between motifs are represented as free-insertion modules (FIM). A FIM is an insert state with 1.0 probability of self-transition and 1.0 probability of exit transition. Thus, traversing such a state has zero transition cost. Specifying this option causes all spacers to be represented using FIMs.
--gap-extend <cost> - This switch causes all spacer self-loop log-odds scores to be set to <cost>. In addition, it causes all other transitions out of a spacer to be set to zero. Together with the --gap-open switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (self-loop transition probabilities of gap states).
--gap-open <cost> - This switch causes all transitions into a spacer state to be assigned a log-odds score equal to <cost>. Together with the --gap-extend switch, this allows you to specify an affine gap penalty function, overriding the gap penalty implicit in the model (transition probabilities into and out of gap states).
--motif-pseudo <float> - A pseudocount to be added to each count in the motif matrix, after first multiplying by the corresponding background frequency (default=0.1). Default value is 0.0.
--nspacer <value> - By default each spacer is modeled using a single insert state. The distribution of spacer lengths produced by a single insert state is exponential in form. A more reasonable distribution would be a bell-shaped curve such as a Gaussian. Modeling the length distribution explicitly is computationally expensive; however, a Gaussian distribution can be approximated using multiple insert states to represent a single spacer region. The --nspacer option specifies the number of insert states used to represent each spacer.
--spacer-pseudo <value> - Specify the value of the pseudocount used in converting transition counts to spacer self-loop probabilities. Default value is 0.0.
--trans-pseudo <value> - Specify the value of the pseudocount used in converting transition counts to transition probabilities. Default value is 0.1.
--zselo - Set spacer emission log-odds scores to zero. This prevents regions of unusual base/residue composition from matching spacers well when the spacer emission frequencies differ from the background frequencies. It is particularly useful with DNA models.
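
The effect of --nspacer on the spacer length distribution can be illustrated numerically. The sketch below is not beadstring code: it assumes the insert states are chained in series and share a single self-loop probability chosen to give a fixed mean length, which may differ from how beadstring parameterizes its spacers. Under that assumption, one state gives a geometric distribution that peaks at length 1, while several states give a negative binomial distribution with an interior peak, which is closer to a bell shape.

```python
from math import comb

def spacer_length_pmf(k, n_states, mean_length):
    """P(spacer length == k) for n_states insert states chained in series."""
    # Assumed parameterization: every state shares one self-loop probability,
    # chosen so that the expected total length equals mean_length.
    stay = 1.0 - n_states / mean_length
    if k < n_states:
        return 0.0  # each state emits at least one letter
    # Negative binomial: k - n_states self-loops spread over n_states states.
    return (comb(k - 1, n_states - 1)
            * stay ** (k - n_states) * (1.0 - stay) ** n_states)

def mode(n_states, mean_length, max_len=200):
    """Most probable spacer length, searched over 1..max_len."""
    return max(range(1, max_len + 1),
               key=lambda k: spacer_length_pmf(k, n_states, mean_length))

for n in (1, 4):
    print(n, "insert state(s): most probable length =", mode(n, 18))
```

With a mean length of 18, the single-state spacer is most likely to have length 1, whereas the four-state spacer has its peak well away from 1, near the mean.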

Options related to scoring:

--allow-weak-motifs - In p-value score mode, weak motifs are defined as ones where the best possible hit has a p-value greater than the p-value threshold. Such motifs cannot contribute to a match in p-value score mode. By default, the program rejects any search results containing weak motifs, unless the --allow-weak-motifs switch is given. In that case, the search will proceed, but the weak motifs will never appear in any matches. Note: This switch only applies to p-value score mode.
--global - Scores are computed for the match between the entire sequence and the model (the default is to use the maximal local score).
--pam <distance> - By default, target probabilities are derived from the distance-250 PAM matrix for proteins, and from a <distance>-1 transition/transversion matrix for DNA. With the --pam switch, you can specify a different integer distance from 1 to 500. (This can be overridden with the --score-file switch described above.) The <distance>-1 transition/transversion joint probability matrix for DNA is given below:
```
           A    C    G    T    
      A  .990 .002 .006 .002
      C  .002 .990 .002 .006
      G  .006 .002 .990 .002
      T  .002 .006 .002 .990
    
```
--paths single|all - This option determines how the program computes raw scores. With the single option, the program computes the Viterbi score, which is the log-odds score associated with the single most likely match between the sequence and the model. The all option yields the total log-odds score, which is the sum of the log-odds of all sequence-to-model matches. The default is Viterbi scoring.
--p-score <float> - The --p-score switch activates p-value score mode with the given threshold. (The default score mode is called "log-odds score mode".) In p-value score mode, motif match scores are converted to their p-values. They are then converted to bit scores as follows:
S = -log₂(p/T)
where S is the bit score of the hit, p is the p-value of the log-odds score, and T is the p-value threshold. In this way, only hits more significant than the p-value threshold get positive scores. The p-value threshold, T, must be in the range 0<T<=1.
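
The conversion from p-value to bit score can be sketched in Python. The function name p_value_bit_score and the example values are illustrative only and are not taken from beadstring; the formula and the range check on T follow the description above, while the range check on p is an added assumption.

```python
from math import log2

def p_value_bit_score(p, threshold):
    """Bit score S = -log2(p / T) for a motif hit with p-value p."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold T must satisfy 0 < T <= 1")
    if not 0.0 < p <= 1.0:  # assumed sanity check, not stated in the text
        raise ValueError("p-value must satisfy 0 < p <= 1")
    # log2(T / p) is the same quantity as -log2(p / T).
    return log2(threshold / p)

T = 1e-3
for p in (1e-5, 1e-3, 1e-2):
    print(p, round(p_value_bit_score(p, T), 2))
```

A hit with p below T scores positive, a hit with p equal to T scores zero, and a hit with p above T scores negative, which is why only hits more significant than the threshold contribute positively to a match.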