/hive/data/outside/genbank/.
-keep is specified.
gbBlat no longer needs to be modified to specify the ooc file. It is now specified in the genbank.conf file.
genbank.conf file.
ssh is configured to not require a passphrase.
$db refers to the database being aligned. Substitute the actual database name (e.g. hg15).
kent/src/hg/makeDb/genbank/
hg of hg13) and the organism names used in GenBank needs to be defined. This is done by editing
genbank/src/lib/gbGenome.c and rebuilding the programs. It may be necessary to define multiple
organism name mappings. A list of organisms in GenBank/RefSeq, along with the count of cDNAs, is in:
/hive/data/outside/genbank/data/organism.lst
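The exact GenBank spelling of the organism matters when adding a mapping to gbGenome.c. As a minimal sketch, a case-insensitive grep over organism.lst can surface candidate names. find_organism is a hypothetical helper, and the only assumption about organism.lst is that it is plain text with one organism per line; its column layout is not assumed, so whole matching lines are printed.

```shell
# Hypothetical helper: print lines of the GenBank organism list that match a
# pattern (case-insensitive), to find the exact organism name GenBank uses.
# Assumes only that the list is plain text with one organism per line.
find_organism() {
    local pattern=$1
    local list=${2:-/hive/data/outside/genbank/data/organism.lst}
    # -i ignores case; -- stops option parsing if the pattern starts with '-'.
    grep -i -- "$pattern" "$list"
}
```

For example, find_organism gallus lists every line of organism.lst that mentions Gallus.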
cd to the top of the genbank source (kent/src/hg/makeDb/genbank/).
make to test if the source builds.
make install-server to update /hive/data/outside/genbank/.
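The build-and-install steps above can be sketched as one function that stops at the first failing step, so a broken build is never installed. gb_build_install and the DRY_RUN switch are hypothetical conveniences, not part of the genbank tree.

```shell
# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell (parentheses) so the cd does not change the
# caller's working directory. Each step runs only if the previous one succeeded.
gb_build_install() (
    src=${1:?usage: gb_build_install GENBANK_SRC_DIR}
    run cd "$src" &&
        run make &&
        run make install-server
)
```

DRY_RUN=1 gb_build_install kent/src/hg/makeDb/genbank prints the three commands; without DRY_RUN it runs them.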
markd to update the round-robin code.
kent/src/hg/makeDb/genbank/etc/genbank.conf to configure this database. You must set:
$db.serverGenome
$db.clusterGenome
$db.lift
$db.align.maxGapChrs
and a lift file.
Commit your changes and then go to the top of the genbank source tree and update the installed genbank etc files with:
make etc-update
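Before committing, a quick check that genbank.conf defines each required setting for the new database can save a failed run. check_gb_conf is a hypothetical helper; it assumes each setting sits on its own line in the form $db.key = value starting at column 1, which you should confirm against the existing entries in genbank.conf. It does not check the lift file itself.

```shell
# Hypothetical check: report required per-database settings missing from
# genbank.conf. Returns 1 if anything is missing, 0 otherwise.
check_gb_conf() {
    local db=$1 conf=$2 key pat missing=0
    for key in serverGenome clusterGenome lift align.maxGapChrs; do
        # Escape dots so "hg16.lift" is matched literally, not as a regex.
        pat=$(printf '%s' "$db.$key" | sed 's/\./\\./g')
        if ! grep -q "^${pat}[[:space:]]*=" "$conf"; then
            echo "missing: $db.$key"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_gb_conf hg16 etc/genbank.conf prints one line per missing setting.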
ssh fileServer
fileServer is the NFS server with /hive/data/outside/genbank/.
cd /hive/data/outside/genbank
$gbRoot.
nice bin/gbAlignStep -initial $db &
This will run the entire alignment process. The -initial option sets defaults for several
parameters appropriate to an initial alignment, and prevents this alignment from blocking
the automatic daily alignments.
Warning: gbAlignStep and the other GenBank programs do not currently accept options after
the positional arguments (i.e. the databases).
All output is saved in the log file.
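To make the ordering rule concrete, here is a hypothetical wrapper that refuses any option appearing after the first database name and otherwise prints the gbAlignStep command line. gb_align_cmd is not part of the genbank tools; it only echoes and runs nothing.

```shell
# Hypothetical wrapper: enforce "options before databases" and print the
# resulting gbAlignStep command line instead of running it.
gb_align_cmd() {
    local arg seen_db=0
    for arg in "$@"; do
        case $arg in
            -*)
                if [ "$seen_db" -eq 1 ]; then
                    echo "error: option '$arg' follows a database name" >&2
                    return 1
                fi
                ;;
            *)
                seen_db=1
                ;;
        esac
    done
    echo "nice bin/gbAlignStep $*"
}
```

gb_align_cmd -initial hg16 prints the command, while gb_align_cmd hg16 -initial is rejected.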
If your organism has xeno ESTs enabled, it's a good idea to start out by aligning and loading just the mRNAs, as this will go much faster. Two options control what is aligned:
-srcDb=name - Restrict the source database to either genbank or refseq.
-type=name - Restrict the type of sequence processed to either mrna or est.
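As a sketch, assuming -srcDb and -type combine with -initial as described, an mRNA-only first pass could be built as below. initial_align_cmd is hypothetical; the accepted values (genbank or refseq, mrna or est) are the ones listed above, and an empty argument means no restriction.

```shell
# Hypothetical helper: build an initial gbAlignStep command restricted by
# source database and sequence type. Empty arguments mean "no restriction".
initial_align_cmd() {
    local db=$1 src=${2:-} seqType=${3:-} opts="-initial"
    case $src in
        "") ;;
        genbank|refseq) opts="$opts -srcDb=$src" ;;
        *) echo "bad srcDb: $src" >&2; return 1 ;;
    esac
    case $seqType in
        "") ;;
        mrna|est) opts="$opts -type=$seqType" ;;
        *) echo "bad type: $seqType" >&2; return 1 ;;
    esac
    # Options first, database last, because gbAlignStep rejects options after it.
    echo "nice bin/gbAlignStep $opts $db"
}
```

initial_align_cmd hg16 "" mrna prints nice bin/gbAlignStep -initial -type=mrna hg16, which should restrict the run to mRNAs from both sources.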
If anything fails, a subset of the tasks done by the gbAlignStep script can be rerun after
correcting the problem. This is done using the -continue=subtask option, where subtask is one of:
copy - continue with copying to the iserver; this skips extracting the sequences to align.
run - continue with the parasol blat run.
finish - finish the alignments, doing lifting and filtering.
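Restarting from a subtask can be sketched as below. continue_align_cmd is hypothetical and only prints the command; it validates the subtask against the three names listed above. Any extra arguments (for example -initial, if the failed run used it) are placed before -continue, on the assumption that a restart should repeat the original options; check this against your own run.

```shell
# Hypothetical helper: print a gbAlignStep restart command for one subtask.
# Usage: continue_align_cmd SUBTASK DB [OPTION...]
continue_align_cmd() {
    [ $# -ge 2 ] || { echo "usage: continue_align_cmd SUBTASK DB [OPTION...]" >&2; return 1; }
    local subtask=$1 db=$2
    shift 2
    case $subtask in
        copy|run|finish) ;;
        *) echo "unknown subtask: $subtask (expected copy, run or finish)" >&2
           return 1 ;;
    esac
    # Extra options go before -continue; the database stays last.
    echo "nice bin/gbAlignStep${*:+ $*} -continue=$subtask $db"
}
```

continue_align_cmd finish hg16 -initial prints nice bin/gbAlignStep -initial -continue=finish hg16.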
gbAlignStep with -continue=finish. If parasol loses track of the jobs, the parasol recover
command can be used to generate a new jobs file containing the jobs that have not completed.
nice bin/gbDbLoadStep -drop -initialLoad $db
The -drop option drops any existing GenBank or RefSeq tables before loading.
-initialLoad
option
when loading the ESTs.
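The alignment and load steps can be chained so that the load runs only if the alignment succeeded. This is a sketch: align_and_load and DRY_RUN are hypothetical, and it assumes you are in $gbRoot so that bin/ resolves.

```shell
# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Initial alignment followed by the initial load; && skips the load if the
# alignment step exits with an error.
align_and_load() {
    local db=$1
    run nice bin/gbAlignStep -initial "$db" &&
        run nice bin/gbDbLoadStep -drop -initialLoad "$db"
}
```

align_and_load hg16 & runs both steps in the background.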
etc/align.dbs.
etc/hgwdev.dbs.
make update-etc
$gbRoot/etc/align.dbs.
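Adding the database to etc/align.dbs and etc/hgwdev.dbs can be done idempotently. add_db_to_list is a hypothetical helper that assumes each .dbs file holds one database name per line and ends with a newline.

```shell
# Hypothetical helper: append a database name to a .dbs list unless it is
# already present. Creates the file if it does not exist.
add_db_to_list() {
    local db=$1 list=$2
    # -x matches whole lines only; -F treats the name as a fixed string.
    if ! grep -qxF -- "$db" "$list" 2>/dev/null; then
        echo "$db" >> "$list"
    fi
}
```

For example, add_db_to_list hg16 etc/align.dbs and add_db_to_list hg16 etc/hgwdev.dbs, then run make update-etc.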
data/aligned/genbank.139.0/hg16/
data/aligned/refseq.139.0/hg16/
data/aligned/refseq.139.0/hg16/*/mrna.native.*
data/aligned/refseq.139.0/hg16/*/est.*.xeno.*
-srcDb and -type.
The -srcDb and -type options restrict the subset. The organism category (native or xeno)
isn't specified. Reloading of ESTs isn't supported; use -drop and -initialLoad instead.
nice bin/gbDbLoadStep -reload -srcDb=genbank -type=mrna $db
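Because EST reloads are not supported, a hypothetical wrapper can refuse -type=est up front and point to the -drop/-initialLoad path instead. reload_cmd only prints the command, and it requires both -srcDb and -type as in the example above; whether gbDbLoadStep -reload itself requires both is not stated here.

```shell
# Hypothetical wrapper: print a gbDbLoadStep reload command, rejecting
# EST reloads, which the text above says are not supported.
reload_cmd() {
    local db=$1 src=$2 seqType=$3
    case $src in
        genbank|refseq) ;;
        *) echo "bad srcDb: $src" >&2; return 1 ;;
    esac
    case $seqType in
        mrna) ;;
        est) echo "reloading ESTs is not supported; use -drop and -initialLoad" >&2
             return 1 ;;
        *) echo "bad type: $seqType" >&2; return 1 ;;
    esac
    echo "nice bin/gbDbLoadStep -reload -srcDb=$src -type=$seqType $db"
}
```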