GenBank/RefSeq Download Step

The download step retrieves files from the NCBI FTP area and stores the results in the download/ directory.

Algorithm

Note that this process isn't mirroring; it doesn't overwrite existing files. This minimizes the danger of leaving the data in an indeterminate state. Only the required subset of files is downloaded.
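
The non-overwriting behavior can be illustrated with a short Python sketch. This is not the actual implementation; the function names, the fetch callback, the file-name patterns, and the write-to-temporary-name-then-rename step are all assumptions made for illustration.

```python
import fnmatch
import os


def select_missing(remote_names, local_dir, patterns):
    """Return remote names that match a required pattern and are not on disk."""
    wanted = [name for name in remote_names
              if any(fnmatch.fnmatchcase(name, pat) for pat in patterns)]
    return [name for name in wanted
            if not os.path.exists(os.path.join(local_dir, name))]


def download_missing(remote_names, local_dir, patterns, fetch):
    """Fetch only the missing, required files; never touch existing ones.

    fetch(name, dest_path) is a hypothetical caller-supplied function that
    writes the contents of the remote file to dest_path.
    """
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    for name in select_missing(remote_names, local_dir, patterns):
        final_path = os.path.join(local_dir, name)
        tmp_path = final_path + ".tmp"
        # Assumption: write under a temporary name first, so an interrupted
        # transfer never leaves a partial file under the final name.
        fetch(name, tmp_path)
        os.rename(tmp_path, final_path)
        fetched.append(name)
    return fetched
```

Because files already present are skipped rather than replaced, rerunning the step after a failure only fetches what is still missing.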

Directory structure

The directory structure for GenBank and RefSeq is a subset of the directories at the NCBI FTP site. Release version numbers are added to the database directory names so that multiple versions can be kept. Now that RefSeq has versioned releases, it is handled in the same manner as GenBank, but on an independent release cycle.
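
As a rough illustration of versioned directory names, the Python sketch below builds a per-release path under download/. The "database.release" naming format, the function name, and the release numbers used in the example are assumptions, not the documented layout.

```python
import os


def release_dir(download_root, database, release):
    """Build a per-release directory path such as download/genbank.160.0.

    The "<database>.<release>" format is assumed for illustration only.
    """
    if database not in ("genbank", "refseq"):
        raise ValueError("unknown database: %r" % (database,))
    return os.path.join(download_root, "%s.%s" % (database, release))


# GenBank and RefSeq release numbers advance independently, so each
# database gets its own versioned directory (numbers here are made up).
genbank_path = release_dir("download", "genbank", "160.0")
refseq_path = release_dir("download", "refseq", "23")
```

Keeping the release number in the directory name means a new release lands in a new directory, leaving earlier releases untouched.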