fasta-subsample

Usage: fasta-subsample <fasta> <n> [options]

Description:

Create a random subset of the sequences in a FASTA formatted file. The random seed is fixed by default, so every run of the program outputs the same subset unless a different seed is set explicitly with -seed.
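
The effect of a fixed seed can be illustrated with a minimal Python sketch. This is not the program's own code: the function name subsample_indices and the use of Python's random module are assumptions made for illustration, and the indices it picks will not match those chosen by fasta-subsample.

```python
import random

def subsample_indices(total, n, seed=1):
    """Pick n distinct indices from range(total), reproducibly for a given seed."""
    # A private generator keeps the result independent of any global random state.
    rng = random.Random(seed)
    # sample() draws without replacement, so no index is picked twice.
    return sorted(rng.sample(range(total), n))

# Two calls with the same seed select the same indices.
first = subsample_indices(10, 3)
second = subsample_indices(10, 3)
print(first == second)
```

Passing a different seed value generally gives a different subset, which mirrors the role of -seed.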

Input:

Takes a FASTA file <fasta> and the number of sequences to select at random, <n>.

Options:

-seed <random seed>
the seed for the random number generator used to select the sequences; default: 1
-rest <file>
name of a file to receive the sequences that were not selected for the output; default: none
-off <offset>
the position within each sequence at which printing starts; default: 1 (no offset)
-len <len>
the maximum length of each printed sequence; default: print entire sequence
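
One plausible reading of how -off and -len combine is sketched below in Python. The helper trim_sequence is hypothetical, and the sketch assumes that the offset is 1-based (consistent with the default of 1 meaning no offset) and that the length limit is counted from the offset position. The program itself may treat edge cases differently.

```python
def trim_sequence(seq, off=1, length=None):
    """Return the part of seq starting at 1-based position off, at most length long."""
    start = off - 1  # convert the 1-based offset to a 0-based index
    # With no length limit, slice to the end of the sequence.
    end = None if length is None else start + length
    return seq[start:end]

print(trim_sequence("ACGTACGT"))                   # defaults: whole sequence
print(trim_sequence("ACGTACGT", off=3))            # skip the first two residues
print(trim_sequence("ACGTACGT", off=3, length=4))  # then keep at most four
```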

Output:

Writes a FASTA formatted file to standard output containing the specified subsample of the original file. If -rest <file> is specified, the sequences that were not selected are written to <file>, which is useful for cross-validation.
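
The split into a subsample and a remainder can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: read_fasta and split_records are hypothetical helpers, the parser handles only simple FASTA text, and the records selected for a given seed will differ from those chosen by fasta-subsample.

```python
import random

def read_fasta(text):
    """Parse simple FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def split_records(records, n, seed=1):
    """Split records into (selected, rest), keeping the input order in both lists."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(records)), n))
    selected = [r for i, r in enumerate(records) if i in chosen]
    rest = [r for i, r in enumerate(records) if i not in chosen]
    return selected, rest

example = ">s1\nACGT\n>s2\nGGCC\nTTAA\n>s3\nAAAA\n>s4\nCCCC\n"
selected, rest = split_records(read_fasta(example), 2)
print(len(selected), len(rest))
```

Every input record lands in exactly one of the two lists, so the selected sequences and the remainder can serve as disjoint test and training sets in a cross-validation setup.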