DNA or protein sequences are accepted, but all sequences in a dataset must be of the same type.
  • Protein sequences should use the standard IUPAC alphabet: ACDEFGHIKLMNPQRSTVWY.
    They may also contain the ambiguous letters "BUXZ", which are converted to "X" and treated as "unknown".
  • DNA sequences should use the standard DNA alphabet: ACGT.
    They may also contain the ambiguous letters "BDHKMNRSUVWY", which are converted to "X" and treated as "unknown".
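The conversion rules in the list above can be sketched as a small Python function. This is a minimal illustration, not the tool's own implementation: the name `normalize` is hypothetical, and raising an error for any other character is an assumption, since the text does not say how gaps, lowercase letters, or other symbols are handled.

```python
# Alphabets as listed in the text above.
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")
PROTEIN_AMBIGUOUS = set("BUXZ")
DNA_ALPHABET = set("ACGT")
DNA_AMBIGUOUS = set("BDHKMNRSUVWY")


def normalize(seq: str, seq_type: str) -> str:
    """Hypothetical helper: map ambiguous letters to 'X' ("unknown").

    Assumption: any character outside the standard and ambiguous
    alphabets is rejected with a ValueError.
    """
    if seq_type == "protein":
        valid, ambiguous = PROTEIN_ALPHABET, PROTEIN_AMBIGUOUS
    elif seq_type == "dna":
        valid, ambiguous = DNA_ALPHABET, DNA_AMBIGUOUS
    else:
        raise ValueError(f"unknown sequence type: {seq_type!r}")

    out = []
    for ch in seq:
        if ch in valid:
            out.append(ch)          # standard letter: keep as-is
        elif ch in ambiguous:
            out.append("X")         # ambiguous letter: treat as unknown
        else:
            raise ValueError(f"invalid character {ch!r} in {seq_type} sequence")
    return "".join(out)
```

For example, `normalize("ACGTN", "dna")` yields `"ACGTX"`, and `normalize("MKBZ", "protein")` yields `"MKXX"`.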
Note: If no sequence in your dataset contains any of the letters "EFILPQXZ", your sequences are assumed to be DNA. You can force them to be interpreted as protein sequences by adding an "X" to the end (or beginning) of one of the sequences in your dataset.
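The type-detection rule in the note can be sketched the same way. The function name `guess_type` is hypothetical; it only encodes the rule as stated and is not taken from the tool itself.

```python
# Letters that, per the note above, mark a dataset as protein.
PROTEIN_ONLY_LETTERS = set("EFILPQXZ")


def guess_type(sequences: list[str]) -> str:
    """Hypothetical helper: return 'protein' if any sequence contains
    one of E, F, I, L, P, Q, X, Z; otherwise assume 'dna'."""
    for seq in sequences:
        if PROTEIN_ONLY_LETTERS.intersection(seq):
            return "protein"
    return "dna"
```

A short protein made up only of letters that also occur in the DNA alphabet or its ambiguity codes, such as `"MKHWARNDC"`, is classified as DNA by this rule. Appending `"X"` (giving `"MKHWARNDCX"`) flips the result to protein, which is the workaround the note describes.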