2012-02-23: import and UCSC GENCODE group process of GENCODE V11 (markd)
    # Due to UCSC Genome Browser using the NC_001807 mitochondrial genome sequence
    # (chrM) and GENCODE annotating the NC_012920 mitochondrial sequence, the
    # GENCODE mitochondrial sequences are not loaded

    # download files
    mkdir -p /hive/groups/encode/dcc/data/gencodeV11/release
    cd /hive/groups/encode/dcc/data/gencodeV11/release
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.2wayconspseudos.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.annotation.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.long_noncoding_RNAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.pc_transcripts.fa.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.pc_translations.fa.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.polyAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.tRNAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode11_GRCh37.tgz

    # silly sanity check:
    for f in * ; do zcat $f >/dev/null ; done

    # untar main distribution
    tar -zxf gencode11_GRCh37.tgz

    # obtain transcription support level analysis from UCSC GENCODE group (markd/rachel)
    mkdir -p data
    cp /cluster/home/markd/compbio/ccds/branches/transSupV11.1/modules/gencodeTransSupport/exprs/classDev/runs/2012-02-26/results/gencode.v11.transcriptionSupportLevel.{tab,tsv} data/

    # create Makefile from previous one
    cd /hive/groups/encode/dcc/data/gencodeV11/
    cp ../gencodeV10/Makefile .
    # edit as needed

    # build depends on code in the CCDS subversion tree:
    #   svn+ssh://hgwdev.cse.ucsc.edu/projects/compbio/svnroot/hausslerlab/ccds/trunk
    # and markd's python library (it will be moved to the hausslerlab
    # repository soon)

    # may need to update
    #   ccds2/modules/gencode/src/lib/gencode/data/gencodeGenes.py
    # to add new biotypes, use this command to verify and update as needed
    make checkAttrs

    (time nice make -j 10 mkTables) >&build.out&
    # took ~45 minutes, log below

    ## copy and update trackDb files from previous release
    # (in kent/src/hg/makeDb/trackDb)
    cp human/hg19/wgEncodeGencodeV10.ra human/hg19/wgEncodeGencodeV11.ra
    cp human/hg19/wgEncodeGencodeV10.html human/hg19/wgEncodeGencodeV11.html

    ### IMPORTANT: make sure that hgTracks/simpleTracks.c registers
    ### track handler for this version of gencode:
    registerTrackHandlerOnFamily("wgEncodeGencodeV11", gencodeGeneMethods);

------------------------------------------------------------------------------
2012-06-05: An error was discovered where both gene names and transcript names
(not ids) had a version number appended, e.g. RP11-513O17.2.1 and
RP11-513O17.2-006. In total 19958 gene names and 27058 transcript names have
been corrected.
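To see which names a correction like this touched, one option is to list the
distinct values of a GTF attribute in each download and compare the lists. The
sketch below is not part of the original build; gtfAttrValues is a hypothetical
helper, and it assumes the usual GTF layout (tab-separated, attributes in
column 9 as `key "value";` pairs).

```shell
# Hypothetical helper (not part of the GENCODE build): print the unique values
# of one GTF attribute, e.g. gene_name or transcript_name.
# Reads GTF on stdin; assumes attributes are in column 9 as: key "value";
gtfAttrValues() {
    awk -F'\t' -v attr="$1" '
        /^#/ { next }                          # skip header/comment lines
        {
            n = split($9, parts, ";")          # one element per key "value" pair
            for (i = 1; i <= n; i++) {
                s = parts[i]
                sub(/^ +/, "", s)              # drop leading blanks
                if (index(s, attr " ") == 1) { # pair starts with the wanted key
                    v = substr(s, length(attr) + 2)
                    gsub(/"/, "", v)           # strip the surrounding quotes
                    print v
                }
            }
        }' | sort -u
}
```

With both copies of the annotation file on disk, the output of
`zcat gencode.v11.annotation.gtf.gz | gtfAttrValues gene_name` from each copy
can be saved and compared with `comm` to list names present in only one of them.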
Repeat the above build in a new directory:
    mkdir -p /hive/groups/encode/dcc/data/gencodeV11b/release
    cd /hive/groups/encode/dcc/data/gencodeV11b/release
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.2wayconspseudos.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.annotation.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.long_noncoding_RNAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.pc_transcripts.fa.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.pc_translations.fa.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.polyAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode.v11.tRNAs.gtf.gz
    wget -nv ftp://ftp.sanger.ac.uk/pub/gencode/release_11/gencode11_GRCh37.tgz

    # silly sanity check:
    for f in * ; do zcat $f >/dev/null ; done

    # untar main distribution
    tar -zxf gencode11_GRCh37.tgz

    # copy over transcription support level analysis, not affected by the change
    cd ..
    mkdir -p data
    cp /cluster/home/markd/compbio/ccds/branches/transSupV11.1/modules/gencodeTransSupport/exprs/classDev/runs/2012-02-26/results/gencode.v11.transcriptionSupportLevel.{tab,tsv} data/

    # keep Makefile from previous V11
    cp ../gencodeV11/Makefile .

    (time nice make -j 10 mkTables) >&build.out&
    # took ~45 minutes

    # validate what changed
    (cd ../gencodeV11; md5sum tables/*)> v11.tables.md5
    diff -r ../gencodeV11/tables tables >tbl.diff
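
Because v11.tables.md5 records the old checksums under relative tables/ paths,
it can also be checked directly against the rebuilt tables to get a short list
of the files that differ. The sketch below was not part of the original run;
listChangedTables is a hypothetical helper, and it assumes GNU md5sum (for
`-c --quiet`) and that it is run from the new build directory.

```shell
# Hypothetical helper (not part of the original build): given a checksum list
# written by md5sum, print the files in the current directory tree whose
# contents no longer match (or that are missing).
# Assumes GNU coreutils md5sum; LC_ALL=C keeps the "FAILED" marker untranslated.
listChangedTables() {
    { LC_ALL=C md5sum -c --quiet "$1" 2>/dev/null || true; } |
        sed -n 's/: FAILED.*$//p'
}
```

Run as `listChangedTables v11.tables.md5` in the gencodeV11b directory; each
line of output is a table file to look up in tbl.diff.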