$gbRoot/etc/genbank.conf contains options that control the alignment and
loading of GenBank and RefSeq data. The file is in the same format as .hg.conf:
name=value
var.varname=value
Variables may be referenced in any value using the syntax:
${varname}
A variable must be defined before it is referenced in the file; references are expanded immediately.
A variable definition may reference another variable.
The program gbConf can be used to print the configuration file with
variables expanded, for debugging purposes.
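The expansion rules above can be illustrated with a minimal Python sketch. This is not the actual genbank.conf parser: it assumes that var.varname=value defines a variable named varname, that blank lines and lines starting with # are skipped, and that an undefined reference is an error. The function name parse_conf and the sample paths are hypothetical.

```python
import re

# Matches ${varname} references inside a value.
_VAR_REF = re.compile(r"\$\{([^}]+)\}")


def parse_conf(text):
    """Parse name=value lines, expanding ${varname} as each line is read.

    Returns (options, variables). Assumption: var.NAME=value defines a
    variable called NAME; everything else is an ordinary option.
    """
    options = {}
    variables = {}

    def expand(value, lineno):
        def substitute(match):
            name = match.group(1)
            if name not in variables:
                raise ValueError(
                    f"line {lineno}: variable {name} referenced before definition")
            return variables[name]
        return _VAR_REF.sub(substitute, value)

    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected name=value")
        name = name.strip()
        # Expand immediately, so only earlier definitions are visible.
        value = expand(value.strip(), lineno)
        if name.startswith("var."):
            variables[name[len("var."):]] = value
        else:
            options[name] = value
    return options, variables


# Hypothetical configuration text, for illustration only.
sample = """
var.root=/data
var.gb=${root}/genbank
gbdb.genbank=${gb}/gbdb
"""
options, variables = parse_conf(sample)
print(options["gbdb.genbank"])
```

Because values are expanded as each line is read, a variable definition can build on earlier variables, but a reference to a variable defined later in the file fails.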
cluster.rootDir - Root directory on cluster filesystems.
cluster.paraHub - Host where the parasol hub is running.
grepIndex.genbank.default - Create grepIndex files for databases in the
specified directory. This value should correspond to the value of
grepIndex.genbank in the browser hg.conf file. The default in this name
leaves open the possibility of per-server overrides of this value;
however, this is not currently implemented.
gbdb.genbank - Location of the genbank gbdb directory. Defaults to /gbdb/genbank.
build.server - Build server host name. If set, build scripts will generate
an error if not run on this server.
The following options are specified per database, with names prefixed by
the database name $db. Default values are set with a database name of default.
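One way to read the default rule is as a two-step lookup: first the database-specific name, then the same option under the database name default. The sketch below assumes the configuration has already been parsed into a dictionary; get_db_option, the database name xx1, and the numeric values are all hypothetical and are not real defaults.

```python
def get_db_option(conf, db, key, fallback=None):
    """Return the value of `db.key`, falling back to `default.key`.

    `conf` is a dict of already-parsed name -> value pairs. The lookup
    order is an assumption based on the description of the default
    database name, not taken from the real implementation.
    """
    for name in (f"{db}.{key}", f"default.{key}"):
        if name in conf:
            return conf[name]
    return fallback


# Hypothetical settings, for illustration only.
conf = {
    "default.align.window": "1000",
    "default.align.overlap": "10",
    "xx1.align.window": "500",
}
print(get_db_option(conf, "xx1", "align.window"))   # database-specific value
print(get_db_option(conf, "xx1", "align.overlap"))  # value from default
```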
$db.align.window - Size of genome alignment windows, in bases. The genome
is partitioned into segments of no more than this size for alignment.
$db.align.overlap - Number of bases of overlap in the alignment windows.
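To make the interaction between the window size and the overlap concrete, here is a simplified Python sketch of overlapping windows over a single sequence. It is one plausible reading of the two options, not the actual partitioning code: it ignores gaps, lift files, and job sizing entirely, and the function name partition_windows is hypothetical.

```python
def partition_windows(seq_len, window, overlap):
    """Split [0, seq_len) into windows of at most `window` bases.

    Each window after the first starts `overlap` bases before the end
    of the previous one. Returns a list of (start, end) pairs using
    zero-based, half-open coordinates.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    windows = []
    start = 0
    while start < seq_len:
        end = min(start + window, seq_len)
        windows.append((start, end))
        if end == seq_len:
            break
        start = end - overlap  # step back so adjacent windows overlap
    return windows


for start, end in partition_windows(10, 4, 1):
    print(start, end)
```

Requiring the overlap to be smaller than the window guarantees that each window starts after the previous one, so the loop always terminates.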
$db.align.maxGap - Gaps no larger than this value are contained within a
window rather than starting a new window. This allows gaps within introns.
$db.align.maxJobTarget - Approximate limit on the number of target window
bases in a job. This trades off parallelism against the cost of creating
lots of small jobs and their associated files.
$db.align.unplacedChroms - White-space separated list of pseudo-chromosomes
that contain unplaced sequences. Alignments will not be allowed to span
gaps on these chromosomes. Requires a lift file to specify the gaps. The
names can also be patterns, similar to file name glob patterns, that are
matched against chromosome names. The entire pattern must be matched. The
meta characters are * and ?. For example, *_random.
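The pattern matching described for $db.align.unplacedChroms can be sketched as follows. This is an illustration, not the real implementation: only * and ? are treated as meta characters, every other character is matched literally, and the whole chromosome name must match. The helper names and the sample chromosome names are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a pattern using only * and ? into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def is_unplaced(chrom, patterns_value):
    """True if `chrom` fully matches any white-space separated pattern."""
    return any(glob_to_regex(p).fullmatch(chrom)
               for p in patterns_value.split())


# Hypothetical option value and chromosome names.
value = "*_random chrUn"
for chrom in ("chr1_random", "chrUn", "chr1"):
    print(chrom, is_unplaced(chrom, value))
```

Python's fnmatch module would also work for simple cases, but it additionally interprets [ ] character classes, which the description above does not mention, so the sketch builds the regular expression by hand.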
$db.align.minUnplacedSize - Skip unplaced sequences smaller than this size.
$db.align.querySplitSize - Maximum size of query FASTA files to create, in megabytes.
$db.serverGenome - Glob pattern or file for genome sequences on the server.
This can be NIBs or a TwoBit file.
$db.clusterGenome - Glob pattern or file for genome sequences on the
cluster. This can be NIBs or a TwoBit file. The resulting list of files
must match what is on the server.
$db.ooc - Path to the blat ooc file on the cluster, or no if no blat ooc is to be used.
$db.maxIntron - blat -maxIntron value.
$db.lift - Path to the lift file for the genome, or no if none. The lift
file is used to find gaps when partitioning the sequence for alignment. It
does not have to contain all chromosomes. Sequences named gap are treated
as gaps, as are gaps in contig placement. A lift file generated by
gapToLift is ideal.
$db.hapRegions - Path to a PSL that contains alignments of haplotype
pseudo-chromosomes to the reference chromosomes, or no if none. This is
used to map alignments between haplotype regions for the near-best in
genome alignment mappings. This is not accessed from cluster jobs.
$db.$srcDb.$cdnaType.$orgCat.pslCDnaFilter - Arguments for pslCDnaFilter
for various types of alignments, or no to skip pslCDnaFilter. Special
handling is done for the -polyASizes option. This option should be
supplied without an argument. The location of the generated file will be
added when pslCDnaFilter is called.
The following values are recognized in the name:
$srcDb - genbank, refseq
$cdnaType - mrna, est
$orgCat - native, xeno
$db.mgcTables.$host - Indicates which MGC tables to create when loading the
database on $host. Values are no, all, or full. $host is the value
returned by uname -n.
$db.mgcTables.default - Indicates which MGC tables to create when there is
no host-specific setting.
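The relationship between the host-specific and default mgcTables settings can be sketched as a lookup keyed on the node name. This is an illustrative reading, not the loader's actual code: the function name mgc_tables_setting, the database name xx1, and the host names are hypothetical, and behavior when neither setting exists is not described above, so the sketch simply returns None in that case. os.uname() is available on Unix-like systems only.

```python
import os

VALID_MGC_VALUES = {"no", "all", "full"}


def mgc_tables_setting(conf, db, host=None):
    """Return the mgcTables value for `db` on `host`.

    Tries `db.mgcTables.<host>` first, then `db.mgcTables.default`.
    `host` defaults to the node name, the same value `uname -n` prints.
    """
    if host is None:
        host = os.uname().nodename
    for name in (f"{db}.mgcTables.{host}", f"{db}.mgcTables.default"):
        if name in conf:
            value = conf[name]
            if value not in VALID_MGC_VALUES:
                raise ValueError(f"{name}: expected no, all, or full, got {value}")
            return value
    return None  # neither setting present; handling is unspecified here


# Hypothetical settings, for illustration only.
conf = {
    "xx1.mgcTables.default": "full",
    "xx1.mgcTables.buildhost": "all",
}
print(mgc_tables_setting(conf, "xx1", host="buildhost"))
print(mgc_tables_setting(conf, "xx1", host="otherhost"))
```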
$db.upstreamGeneTbl - If specified, use this table to create upstream FASTA
files. It is an error if the table doesn't exist. If not specified, don't
create upstream FASTAs.
$db.upstreamMaf - If specified, create upstream MAFs using these MAF tables
and organism lists, and the genes in $db.upstreamGeneTbl. The value should
be white-space separated pairs of a table name and the fully qualified path
to a file with the ordered list of organisms to include in the upstream MAF.
$db.ccds.ncbiBuild - NCBI build number (e.g. 36.3) for CCDS table building.
Specifying this enables auto-update of ccdsGene and related tables.
$db.mgc - Should MGC tables be loaded? Values are yes or no.
$db.orfeome - Should ORFeome tables be loaded? Values are yes or no.
$db.perChromTables - Set to no if per-chromosome alignment tables should not be created.
$db.$srcDb.$cdnaType.$orgCat.load - Should cDNAs be aligned and loaded into
the database for the specified category? Value is yes or no.
$db.$srcDb.$cdnaType.$orgCat.align - Should cDNAs be aligned for the
specified category? Value is yes or no. This is used if it's desirable to
align a particular category but not create a track of it.
$db.$srcDb.$cdnaType.$orgCat.loadDesc - Should the description table be
loaded? Value is yes or no. Descriptions are normally not loaded for ESTs,
as they are large and not very useful.
$db.downloadDir - Directory relative to goldenPath/ where download files are stored.
$db.align.prefilter - Prefilter alignments in cluster jobs. If set to yes
or not specified, prefiltering is done. If set to no, no prefiltering is
done. Setting it to no is often useful for debugging.
$db.$srcDb.mrna.blatTargetDb - Should BLAT targetDb two-bit files be created
for the database for the specified srcDb? Value is yes or no.