fasta-subsample
Usage: fasta-subsample <fasta> <n> [options]
Description:
Create a random subset of the sequences in a FASTA formatted file. The random seed is fixed by default, so the same subset is output on every run unless a different seed is set explicitly with -seed.
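The effect of a fixed seed can be illustrated with a minimal Python sketch. This is not the program's implementation: the function name pick_indices is hypothetical, and Python's random module is used only as a stand-in for whatever generator fasta-subsample uses internally.

```python
import random

def pick_indices(total, n, seed=1):
    """Choose n distinct record indices out of `total`, reproducibly.

    Hypothetical helper for illustration; the default seed of 1 mirrors
    the documented default of -seed.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    return sorted(rng.sample(range(total), n))

first = pick_indices(100, 10)
second = pick_indices(100, 10)
print(first == second)  # the same seed yields the same subset

different_seed = pick_indices(100, 10, seed=2)
print(different_seed)   # a different seed generally yields a different subset
```

Drawing from a private generator seeded per call is what makes repeated runs comparable: the selection depends only on the input size, the count and the seed.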
Input:
Takes a FASTA file <fasta> and the number of sequences <n> to select at random.
Options:
- -seed <random seed>
- seed for the random number generator used to select the sequences; default: 1
- -rest <file>
- name of a file to receive the sequences that were not selected for the output; default: none
- -off <offset>
- the 1-based offset within each sequence at which printing starts; default: 1 (no offset)
- -len <len>
- the maximum length that printed sequences are constrained to; default: print entire sequence
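How -off and -len might combine can be sketched in Python as follows. This assumes the offset is 1-based (consistent with the default of 1 meaning "no offset") and that the length limit is applied after the offset; the function name trim and its parameters are hypothetical and not part of the program.

```python
def trim(sequence, off=1, length=None):
    """Return the part of `sequence` that would be printed.

    Assumptions (not confirmed by the program's documentation):
    `off` is a 1-based start position and `length` caps the number
    of residues printed from that position onward.
    """
    if off < 1:
        raise ValueError("off must be at least 1")
    start = off - 1  # convert the 1-based offset to a 0-based index
    if length is None:
        return sequence[start:]  # no limit: print to the end
    return sequence[start:start + length]

print(trim("ACGTACGT"))                   # defaults: whole sequence
print(trim("ACGTACGT", off=3))            # skip the first two residues
print(trim("ACGTACGT", off=3, length=4))  # then keep at most four
```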
Output:
Writes a FASTA formatted file to standard output containing the specified subsample of the original file. If -rest <file> is specified, then any leftover sequences are written to <file>, which is useful for cross-validation.
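The split into a subsample and a remainder can be pictured with the following Python sketch, which partitions a small in-memory FASTA example into two disjoint sets. It is an illustration under stated assumptions, not the program's code: the helper names (parse_fasta, split_records, to_fasta) are hypothetical, and keeping the input order within each set is an assumption.

```python
import random

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)

def split_records(records, n, seed=1):
    """Return (selected, rest): n randomly chosen records and all others."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), n))
    selected = [rec for i, rec in enumerate(records) if i in chosen]
    rest = [rec for i, rec in enumerate(records) if i not in chosen]
    return selected, rest

def to_fasta(records):
    """Format (header, sequence) pairs as FASTA text, one line per sequence."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)

example = ">a\nACGT\n>b\nGGCC\nTTAA\n>c\nAAAA\n>d\nCCCC\n"
records = list(parse_fasta(example))
selected, rest = split_records(records, 2)
print(to_fasta(selected), end="")  # plays the role of standard output
print(to_fasta(rest), end="")      # plays the role of the -rest file
```

Because the two sets are disjoint and together cover the input, one can serve as a held-out set and the other as a training set.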