mhmms -- not currently supported

Usage: mhmms [options] <HMM file> <FASTA file>

Description:

mhmms searches a sequence database using a Meta-MEME motif-based hidden Markov model (HMM) of the kind produced by mhmm. Each sequence in the database is assigned an E-value, and the IDs and scores of sequences whose E-values fall below a given threshold are printed in sorted order.

The E-value of a sequence is the number of sequences that you would expect, by chance, to match the model as well as or better than this sequence in a random database of the same size as the given database. Scores are assigned using a local search algorithm; in other words, the algorithm finds the subsequence that matches a subset of model states with the highest log-odds score.
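Assuming every random sequence's score follows the same null distribution, the definition above amounts to E = N · P(S ≥ s), where N is the number of sequences in the database and s is the sequence's score. The sketch below estimates P(S ≥ s) empirically from scores of sequences assumed to come from a null model. `empirical_evalue` is a hypothetical helper, and this is not necessarily how mhmms computes its p-values.

```python
from bisect import bisect_left

def empirical_evalue(score, null_scores, db_size):
    """Hypothetical sketch: E = db_size * P(null score >= score).

    null_scores are assumed to be scores of random sequences drawn from the
    same background model as the database; P is estimated empirically.
    """
    ranked = sorted(null_scores)
    # count how many null scores are at least as good as `score`
    n_at_least = len(ranked) - bisect_left(ranked, score)
    p_value = n_at_least / len(ranked)
    return db_size * p_value
```

For example, if three of ten null scores reach 8, then p = 0.3 and a sequence scoring 8 in a 100-sequence database has E = 30.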

The emission probabilities in the model are converted to log-odds scores before performing the local search. This is done by combining pseudocount probabilities derived from a score matrix (see the '--pam' and '--score-file' options below) with the emission frequencies. You can control the relative weight placed on the emission probabilities versus the pseudocount probabilities (see '--pseudo-weight' below). The adjusted emission probabilities are then converted to odds by dividing by background probabilities (see '--bg-file' below). Finally, they are converted to log-odds scores by taking their logarithm.
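The three steps above (mixing with pseudocount probabilities, dividing by the background, taking the logarithm) can be sketched for a single motif column as follows. This is a minimal illustration, not the mhmms source: `emission_log_odds` is a hypothetical name, the mixing uses the alpha/beta weighting described under LOG-ODDS SCORES, and the natural logarithm is assumed.

```python
import math

def emission_log_odds(emission, pseudo, background, alpha=20.0, beta=10.0):
    """Hypothetical sketch: log-odds scores for one motif column.

    emission, pseudo, background: dicts mapping letter -> probability.
    alpha, beta: weights on emission vs. pseudocount probabilities.
    The natural log is an assumption; the base used by mhmms is not stated.
    """
    scores = {}
    for letter, bg in background.items():
        # 1. mix emission probabilities with pseudocount probabilities
        adjusted = (alpha * emission[letter] + beta * pseudo[letter]) / (alpha + beta)
        # 2. convert to odds against the background; 3. take the logarithm
        scores[letter] = math.log(adjusted / bg)
    return scores
```

Because the pseudocount probabilities are mixed in first, a letter never seen in the column still receives a finite (negative) score rather than log(0).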

Before searching, transition probabilities are converted to log-odds scores by taking their logarithms. You can override this and set the gap scores explicitly with the '--gap-open' and '--gap-extend' switches, below; doing so applies a single affine gap cost function to all spacers in the model.
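The two ways of scoring spacers can be contrasted with a short sketch. Both function names are hypothetical, and the affine convention shown (the first gap position costs the open penalty, each further position the extend penalty, and the score is the negated cost) is a common one assumed here, not confirmed for mhmms.

```python
import math

def transition_score(prob):
    """Default behavior: the log of a transition probability.
    The natural log is an assumption; the base is not stated."""
    return math.log(prob)

def affine_gap_score(length, gap_open, gap_extend):
    """Hypothetical affine score for a spacer of `length` letters.

    Assumed convention: cost = gap_open + (length - 1) * gap_extend for
    length >= 1, zero for an empty spacer; the score is the negated cost.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return 0.0
    return -(gap_open + (length - 1) * gap_extend)
```

Under this convention the same two parameters score every spacer, whereas the default scores each spacer from its own transition probabilities.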

Input:

Output:

The mhmms output has up to three sections containing your search results:

  1. Database search results
  2. Alignments
  3. Motif diagrams

The last two sections will not be present unless the -fancy option was specified.

The results in all three sections are sorted by increasing E-value if possible, or by decreasing alignment score if E-values could not be computed.
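The ordering rule above can be sketched as follows. `Hit` and `sort_hits` are hypothetical names, and reading "if possible" as "every hit has an E-value" is an assumption, as is breaking E-value ties by score.

```python
from typing import List, NamedTuple, Optional

class Hit(NamedTuple):
    seq_id: str
    evalue: Optional[float]  # None if no E-value could be computed
    score: float

def sort_hits(hits: List[Hit]) -> List[Hit]:
    """Sort by increasing E-value when every hit has one,
    otherwise by decreasing alignment score."""
    if all(h.evalue is not None for h in hits):
        # ties in E-value are broken by the higher score (our choice)
        return sorted(hits, key=lambda h: (h.evalue, -h.score))
    return sorted(hits, key=lambda h: -h.score)
```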

DATABASE SEARCH RESULTS

The "Database Search Results" section consists of lines of the following form:

<ID> <E-value> <Score> <Start> <End> <Length> <Description>

These fields contain, for each alignment,

ALIGNMENTS

Each alignment lists the sequence identifier, alignment E-value and log-odds score along the left. On the right, it shows the alignment of the model with the sequence in groups of three segments.

MOTIF DIAGRAMS

The motif diagrams section shows the alignments in schematic format. For each alignment, the left two columns show the sequence identifier and the alignment E-value, and the right-hand column shows the positions and spacings of the motifs in the alignment. Hits are labeled with numbers corresponding to the order in which the motifs were given in the query. A plus or minus sign preceding a motif indicates that the motif occurs on the given strand (+) or on the reverse complement strand (-) of the DNA sequence in the database.

LOG-ODDS SCORES

The log-odds scores for each motif column are created using prior information on the letters appearing in alignment columns. The prior information consists of the target frequencies [Karlin, S. and Altschul, S.F., PNAS USA, 87, 2264-2268] implicit in a scoring matrix. Meta-MEME can read a user-specified scoring matrix (in the same format as used by the BLAST family of programs) from a file, or it can generate a PAM matrix. By default, PAM 250 is used for proteins and PAM 1 is used for DNA. For DNA, the "PAM 1" frequency matrix is

              .990 .002 .006 .002
              .002 .990 .002 .006
              .006 .002 .990 .002
              .002 .006 .002 .990

Meta-MEME calculates the target frequencies $q_{ij} = p_i p_j e^{\lambda s_{ij}}$ from the scoring matrix $s_{ij}$ and the background letter frequencies $p_i$ by finding the value of $\lambda$ that makes the $q_{ij}$ sum to one. These target frequencies are then used to create pseudo-frequencies to be added to the emission frequencies of the column, following the approach of [Henikoff, S. and Henikoff, J.G., JMB, 243, 574-578]. The pseudo-frequency for the $i$th letter is computed as $g_i = \sum_{j} f_j q_{ij} / p_j$, where the sum runs over all letters $j$ in the alphabet.
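The two steps above (solving for the scaling constant, then forming target frequencies and pseudo-frequencies) can be sketched in Python. The helper names are hypothetical; the constant (called `lam` in the code) is found by bisection, which is our choice rather than necessarily Meta-MEME's; and the test matrix is a toy +1/-1 DNA scoring matrix, not a PAM matrix. A positive solution exists only when the expected score under the background is negative and at least one score is positive, so the code checks both.

```python
import math

def solve_lambda(scores, bg, tol=1e-12):
    """Find lam > 0 with sum_ij p_i p_j exp(lam * s_ij) == 1, by bisection."""
    n = len(bg)

    def excess(lam):
        return sum(bg[i] * bg[j] * math.exp(lam * scores[i][j])
                   for i in range(n) for j in range(n)) - 1.0

    expected = sum(bg[i] * bg[j] * scores[i][j] for i in range(n) for j in range(n))
    if expected >= 0 or max(max(row) for row in scores) <= 0:
        raise ValueError("need a negative expected score and some positive score")
    lo, hi = 0.0, 1.0
    while excess(hi) <= 0:  # widen the bracket until the sum exceeds one
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def target_frequencies(scores, bg, lam):
    """q_ij = p_i p_j exp(lam * s_ij)."""
    n = len(bg)
    return [[bg[i] * bg[j] * math.exp(lam * scores[i][j]) for j in range(n)]
            for i in range(n)]

def pseudo_frequencies(freqs, target, bg):
    """g_i = sum_j f_j q_ij / p_j, for column frequencies f."""
    n = len(bg)
    return [sum(freqs[j] * target[i][j] / bg[j] for j in range(n)) for i in range(n)]
```

For the toy matrix with a uniform background, the equation reduces to 0.25·e^x + 0.75·e^(-x) = 1, whose positive root is x = ln 3 ≈ 1.0986; a column containing only A then gets pseudo-frequencies 0.75 for A and 1/12 for each other letter.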

The pseudo-frequencies $g_i$ are then combined with the emission frequencies $f_i$ to give the frequency estimates

$$Q_i = \frac{\alpha f_i + \beta g_i}{\alpha + \beta}.$$

In general, alpha should be proportional to the amount of independent information in the emission frequencies. We have set it to the constant 20. The parameter beta is arbitrary and controls the relative importance of prior information. We set it to the constant 10.
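Substituting the constants above into the combination formula shows how the two sources are weighted. The letter values $f_A = 1$ and $g_A = 0.75$ in the second line are hypothetical, chosen only to illustrate the arithmetic.

```latex
Q_i = \frac{20 f_i + 10 g_i}{20 + 10} = \frac{2}{3} f_i + \frac{1}{3} g_i
\qquad
f_A = 1,\; g_A = 0.75 \;\Rightarrow\; Q_A = \frac{2}{3} + \frac{0.75}{3} = \frac{11}{12} \approx 0.917
```

With these constants, the emission frequencies carry two-thirds of the weight and the prior information one-third.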

Our method is essentially that used in PSI-BLAST [Altschul, S.F. et al., NAR, 25(17), 3389-3402] without

  1. sequence weighting, and
  2. scaling for amount of independent information (alpha).

Doing 1) and 2) correctly would require starting from the alignment itself rather than from emission frequencies.

Options:

Advanced Options:

The following option is automatically invoked when you specify --p-thresh. You can also set it explicitly when you do not want p-value scores but still want to prevent partial matches to motifs.

The following options can be used in both p-value and log-odds score modes to control how the emission probabilities in the HMM are converted into log-odds scores.

Bugs: None known.

Author: William Stafford Noble.