Background model format
The format for n-order Markov background models is as follows.
The file must contain one line for each combination of 1, 2, ..., n+1 letters in the alphabet. The DNA alphabet is ACGT and the protein alphabet is ACDEFGHIKLMNPQRSTVWY.
Each line must contain the letter combination followed by the letter combination's frequency (probability). All other lines in the file are ignored, including comment lines starting with '#'.
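The parsing rule above can be sketched in Python as follows. This is a minimal illustration, not the parser used by any particular tool: the function name parse_background_model is hypothetical, and lowercasing the tuples is an assumption made here because the examples use lowercase letters while the alphabets are given in uppercase.

```python
def parse_background_model(text):
    """Return a dict mapping each letter tuple to its frequency.

    Hypothetical helper for illustration only.
    """
    freqs = {}
    for line in text.splitlines():
        fields = line.split()
        # Ignore blank lines, comment lines, and anything that is not
        # exactly "<tuple> <frequency>".
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        try:
            value = float(fields[1])
        except ValueError:
            continue  # second field is not a number: ignore the line
        # Assumption: tuples are treated case-insensitively.
        freqs[fields[0].lower()] = value
    return freqs


example = """\
# tuple frequency_non_coding
a 0.324
c 0.176
g 0.176
t 0.324
"""

model = parse_background_model(example)
```

Here model maps each of the four DNA letters to its frequency, and the comment line is skipped.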
For example, a 0-order Markov model <file> might contain:
    # tuple frequency_non_coding
    a 0.324
    c 0.176
    g 0.176
    t 0.324
A 1-order Markov model <file> might contain:
    # tuple frequency_non_coding
    a 0.324
    c 0.176
    g 0.176
    t 0.324
    # tuple frequency_non_coding
    aa 0.119
    ac 0.052
    ag 0.056
    at 0.097
    ca 0.058
    cc 0.033
    cg 0.028
    ct 0.056
    ga 0.056
    gc 0.035
    gg 0.033
    gt 0.052
    ta 0.091
    tc 0.056
    tg 0.058
    tt 0.119
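Judging from the two examples, an order-n file lists every tuple of length 1 up to n+1: 4 lines for a 0-order DNA model and 4 + 16 = 20 lines for a 1-order DNA model. The sketch below enumerates the tuples one would expect to find, which can help check a file for missing entries. The function name expected_tuples is hypothetical, and the ordering it produces is simply the one used in the examples, not a documented requirement.

```python
from itertools import product


def expected_tuples(alphabet, order):
    """List every letter tuple of length 1 .. order+1 over the alphabet.

    Hypothetical helper for illustration only.
    """
    tuples = []
    for length in range(1, order + 2):
        # product() yields all combinations of the given length,
        # in the order of the letters in `alphabet`.
        tuples.extend("".join(p) for p in product(alphabet, repeat=length))
    return tuples


dna_order0 = expected_tuples("acgt", 0)
dna_order1 = expected_tuples("acgt", 1)
```

Comparing this list against the tuples actually present in a file gives a quick completeness check before the file is used.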
NOTE: You can create a background model file from any FASTA sequence file using the fasta-get-markov command.