GenBank configuration file

The file $gbRoot/etc/genbank.conf contains options that control the alignment and loading of GenBank and RefSeq data. The file is in the same format as .hg.conf:

blank lines or lines where the first non-blank character is a "#" are ignored.
records are lines of:
name=value
By convention, names consist of dot-separated words used to define a hierarchy. Yes, this is ugly and verbose.
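
The record format described above can be sketched in a few lines of Python. This is only an illustration, not the parser used by the GenBank tools: the function name parse_conf_lines, the stripping of whitespace around names and values, and the sample values are all assumptions.

```python
def parse_conf_lines(lines):
    """Parse name=value records, skipping blank lines and '#' comment lines."""
    records = {}
    for line_num, raw in enumerate(lines, start=1):
        stripped = raw.strip()
        # Blank lines, and lines whose first non-blank character is '#', are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"line {line_num}: expected name=value, got {raw!r}")
        # Assumption: surrounding whitespace is not significant.
        records[name.strip()] = value.strip()
    return records


# Hypothetical file contents, for illustration only.
sample = """
# global settings
cluster.rootDir=/example/cluster/genbank
gbdb.genbank=/gbdb/genbank
""".splitlines()

conf = parse_conf_lines(sample)
print(conf["gbdb.genbank"])
```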

Variables

Variables are available in the configuration file. A variable is defined using the syntax:

var.varname=value

Variables may be referenced in any value using the syntax:

${varname}

A variable must be defined before it is referenced in the file, and references are expanded immediately. A variable definition may reference another variable.

The program gbConf can be used to print the configuration file with variables expanded for debugging purposes.
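
The define-before-use and immediate-expansion rules can be illustrated with a short Python sketch. It is not the gbConf implementation. The function name expand_conf, the choice to leave var.* definitions out of the returned records, the error raised for an undefined variable, and the sample names and paths are all assumptions made for the example.

```python
import re

# Matches ${varname} references inside a value.
VAR_REF = re.compile(r"\$\{([^}]+)\}")


def expand_conf(lines):
    """Return name -> value with ${varname} references expanded in file order."""
    variables = {}
    records = {}
    for line_num, raw in enumerate(lines, start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue
        name, sep, value = stripped.partition("=")
        if not sep:
            raise ValueError(f"line {line_num}: expected name=value")
        name = name.strip()

        def substitute(match, line_num=line_num):
            # Only variables defined on earlier lines are visible here.
            var_name = match.group(1)
            if var_name not in variables:
                raise KeyError(f"line {line_num}: variable {var_name!r} used before definition")
            return variables[var_name]

        # Expand immediately, so a definition may reference earlier variables.
        value = VAR_REF.sub(substitute, value.strip())
        if name.startswith("var."):
            variables[name[len("var."):]] = value
        else:
            records[name] = value
    return records


# Hypothetical definitions, for illustration only.
sample = """
var.root=/example/data
var.genomes=${root}/genomes
exampleDb.serverGenome=${genomes}/exampleDb.2bit
""".splitlines()

expanded = expand_conf(sample)
print(expanded["exampleDb.serverGenome"])
```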

Global configuration parameters

cluster.rootDir - root directory on cluster filesystems.
cluster.paraHub - Host where parasol hub is running.
grepIndex.genbank.default - Create grepIndex files for databases in the specified directory. This value should correspond to the value grepIndex.genbank in the browser hg.conf file. The default in this name leaves open the possibility of per-server overrides of this value; however, this is not currently implemented.
gbdb.genbank - Location of genbank gbdb directory. Defaults to /gbdb/genbank.
build.server - Build server host name. If set, build scripts will generate an error if not run on this server.
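
The build.server check could look roughly like the following Python sketch. The function name check_build_server and the exact string comparison against the local host name are assumptions; the actual build scripts may compare host names differently.

```python
import platform


def check_build_server(conf, current_host=None):
    """Raise an error if build.server is set and this is a different host."""
    build_server = conf.get("build.server")
    if build_server is None:
        # Not set: the check is skipped entirely.
        return
    if current_host is None:
        current_host = platform.node()
    # Assumption: host names are compared as exact strings.
    if current_host != build_server:
        raise RuntimeError(
            f"must be run on build server {build_server!r}, not {current_host!r}"
        )


# Hypothetical host names, for illustration only.
check_build_server({}, current_host="anyhost")
check_build_server({"build.server": "buildhost"}, current_host="buildhost")
```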

Per-database configuration parameters

Per-database configuration parameters start with the database name, represented here by $db. Default values are set with a database name of default.
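
One way to read this is that a lookup tries the database-specific name first and then the default name. The Python sketch below illustrates that reading. The function name get_db_param, the lookup order, and the sample database name and values are assumptions, not taken from the GenBank tools.

```python
def get_db_param(conf, db, name, fallback=None):
    """Look up $db.name, falling back to default.name, then to fallback."""
    for key in (f"{db}.{name}", f"default.{name}"):
        if key in conf:
            return conf[key]
    return fallback


# Hypothetical settings, for illustration only.
conf = {
    "default.align.window": "1000000",
    "default.align.overlap": "1000",
    "exampleDb.align.window": "500000",
}

window = get_db_param(conf, "exampleDb", "align.window")    # database-specific value
overlap = get_db_param(conf, "exampleDb", "align.overlap")  # taken from default
print(window, overlap)
```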

$db.align.window - Size of genome alignment windows, in bases. The genome is partitioned into segments of no more than this size for alignment.
$db.align.overlap - Number of bases of overlap in the alignment windows.
$db.align.maxGap - Gaps no larger than this value are contained within a window rather than starting a new window. This allows gaps within introns.
$db.align.maxJobTarget - Approximate limit on the number of target window bases in a job. This trades off parallelism against the cost of creating lots of small jobs and their associated files.
$db.align.unplacedChroms - White-space separated list of pseudo-chromosomes that contain unplaced sequences. Alignments will not be allowed to span gaps on these chromosomes. Requires a lift file to specify the gaps. The names can also be patterns similar to file name glob patterns that are matched against chromosome names. The entire pattern must be matched. The meta characters are * and ?. For example, *_random.
$db.align.minUnplacedSize - Skip unplaced sequences smaller than this size.
$db.align.querySplitSize - Max size of query FASTA files to create, in megabytes.
$db.serverGenome - glob pattern or file for genome sequences on server. This can be NIBs or a TwoBit file.
$db.clusterGenome - glob pattern or file for genome sequences on cluster. This can be NIBs or a TwoBit file. The resulting list of files must match what is on the server.
$db.ooc - path to blat ooc file on cluster, or no if no blat ooc is to be used.
$db.maxIntron - blat -maxIntron value.
$db.lift - path to lift file for genome, or no if none. The lift file is used to find gaps when partitioning the sequence for alignment. It does not have to contain all chromosomes. Sequences named gap are treated as gaps, as are gaps in contig placement. A lift file generated by gapToLift is ideal.
$db.hapRegions - path to a PSL that contains alignments of haplotype pseudo-chromosomes to the reference chromosomes, or no if none. This is used to map alignments between haplotype regions for the near-best in genome alignment mappings. This is not accessed from cluster jobs.
$db.$srcDb.$cdnaType.$orgCat.pslCDnaFilter - Arguments for pslCDnaFilter for various types of alignments, or no to skip pslCDnaFilter. Special handling is done for the -polyASizes option. This option should be supplied without an argument. The location of the generated file will be added when pslCDnaFilter is called.
The following values are recognized in the name:
- $srcDb - genbank, refseq
- $cdnaType - mrna, est
- $orgCat - native, xeno
$db.mgcTables.$host - indicates which MGC tables to create when loading database on $host. Values are no, all, or full. $host is the value returned by uname -n.
$db.mgcTables.default - indicates which MGC tables to create when there is no host-specific setting.
$db.upstreamGeneTbl - If specified, use this table to create upstream FASTA files. It is an error if the table doesn't exist. If not specified, don't create upstream FASTAs.
$db.upstreamMaf - If specified, create upstream MAFs using these MAF tables and organism lists, and the genes in $db.upstreamGeneTbl. The value should be white-space separated pairs of table name and fully qualified path to a file with an ordered list of organisms to include in the upstream MAF.
$db.ccds.ncbiBuild - includes NCBI build number (e.g. 36.3) for CCDS table building. Specifying this enables auto-update of ccdsGene and related tables.
$db.mgc - Should MGC tables be loaded? Values are yes or no.
$db.orfeome - Should ORFeome tables be loaded? Values are yes or no.
$db.perChromTables - Set to no if per-chromosome alignment tables should not be created.
$db.$srcDb.$cdnaType.$orgCat.load - Should cDNAs be aligned and loaded into the database for the specified category. Value is yes or no.
$db.$srcDb.$cdnaType.$orgCat.align - Should cDNAs be aligned for the specified category. Value is yes or no. This is used if it's desirable to align a particular category but not create a track of it.
$db.$srcDb.$cdnaType.$orgCat.loadDesc - Should the description table be loaded? Value is yes or no. Descriptions are normally not loaded for ESTs as they are large and not very useful.
$db.downloadDir - directory relative to goldenPath/ where download files are stored.
$db.align.prefilter - Prefilter alignments in cluster jobs. If set to yes or not specified, prefiltering is done. If set to no, no prefiltering is done, which is often useful for debugging.
$db.$srcDb.mrna.blatTargetDb - Should BLAT targetDb two-bit files be created for database for the specified srcDb. Value is yes or no.
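
As an illustration of the pattern matching described for $db.align.unplacedChroms above, the following Python sketch treats only * and ? as meta characters and requires the entire chromosome name to match. The function names glob_to_regex and is_unplaced and the sample patterns are assumptions. This is not the matching code used by the GenBank tools.

```python
import re


def glob_to_regex(pattern):
    """Translate a pattern using only '*' and '?' into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))


def is_unplaced(chrom, unplaced_chroms_value):
    """True if chrom matches any white-space separated name or pattern."""
    return any(
        glob_to_regex(pattern).fullmatch(chrom)  # the entire name must match
        for pattern in unplaced_chroms_value.split()
    )


# Hypothetical setting, for illustration only.
value = "chrUn *_random"
for chrom in ("chr1", "chr1_random", "chrUn", "chr1_random_alt"):
    print(chrom, is_unplaced(chrom, value))
```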