GenBank/RefSeq Data Processing Step

The data processing step extracts data from the downloaded GenBank files into a format that is ready for import into the database.

Algorithm

Directory structure

GenBank index file

A GenBank index file is a tab-separated file in the format:
    acc version moddate organism
The file is named either mrna.gbidx or est.*.gbidx and is associated with the *.ra or *.fa files of the same name. The columns are: