This directory contains the Blat application for stand-alone use.

Please note that the Blat source and executables are freely available for
academic, nonprofit and personal use. Commercial licensing information is
available on the Kent Informatics website (http://www.kentinformatics.com/).

For help installing and running Blat please see:
  - Documentation: http://genome.ucsc.edu/goldenPath/help/blatSpec.html
  - FAQs: http://genome.ucsc.edu/FAQ/FAQblat.html
Name       Last modified       Size
FOOTER     06-Apr-2011 15:41   10K
blat       05-Apr-2011 17:00   432K
gfClient   05-Apr-2011 17:00   417K
gfServer   05-Apr-2011 17:00   291K
================================================================
========   blat   ====================================
================================================================
blat - Standalone BLAT v. 34x10 fast sequence search command line tool
usage:
   blat database query [-ooc=11.ooc] output.psl
where:
   database and query are each either a .fa, .nib or .2bit file,
   or a list of these files, one file name per line.
   -ooc=11.ooc tells the program to load over-occurring 11-mers from
               an external file.  This will increase the speed
               by a factor of 40 in many cases, but is not required.
   output.psl is where to put the output.
   Subranges of .nib and .2bit files may be specified using the syntax:
      /path/file.nib:seqid:start-end
   or
      /path/file.2bit:seqid:start-end
   or
      /path/file.nib:start-end
   With the last form, which omits seqid, a sequence id of
   file:start-end will be used.
options:
   -t=type     Database type.  Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna.
   -q=type     Query type.  Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
               The default is dna.
   -prot       Synonymous with -t=prot -q=prot
   -ooc=N.ooc  Use overused tile file N.ooc.  N should correspond to
               the tileSize.
   -tileSize=N Sets the size of match that triggers an alignment.
               Usually between 8 and 12.
               Default is 11 for DNA and 5 for protein.
   -stepSize=N Spacing between tiles.  Default is tileSize.
   -oneOff=N   If set to 1, this allows one mismatch in a tile and still
               triggers an alignment.  Default is 0.
   -minMatch=N Sets the number of tile matches.  Usually set from 2 to 4.
               Default is 2 for nucleotide, 1 for protein.
   -minScore=N Sets minimum score.  This is the matches minus the
               mismatches minus some sort of gap penalty.  Default is 30.
   -minIdentity=N Sets minimum sequence identity (in percent).
               Default is 90 for nucleotide searches, 25 for protein
               or translated protein searches.
   -maxGap=N   Sets the size of maximum gap between tiles in a clump.
               Usually set from 0 to 3.  Default is 2.
               Only relevant for minMatch > 1.
   -noHead     Suppress .psl header (so it's just a tab-separated file).
   -makeOoc=N.ooc Make overused tile file.  Target needs to be a
               complete genome.
   -repMatch=N Sets the number of repetitions of a tile allowed before
               it is marked as overused.  Typically this is 256 for
               tileSize 12, 1024 for tileSize 11, 4096 for tileSize 10.
               Default is 1024.  Typically only comes into play with
               makeOoc.  Also affected by stepSize: when stepSize is
               halved, repMatch is doubled to compensate.
   -mask=type  Mask out repeats.  Alignments won't be started in masked
               region but may extend through it in nucleotide searches.
               Masked areas are ignored entirely in protein or
               translated searches.  Types are:
                 lower - mask out lower cased sequence
                 upper - mask out upper cased sequence
                 out   - mask according to database.out RepeatMasker
                         .out file
                 file.out - mask database according to RepeatMasker
                         file.out
   -qMask=type Mask out repeats in query sequence.  Similar to -mask
               above but for query rather than target sequence.
   -repeats=type Type is same as mask types above.  Repeat bases will
               not be masked in any way, but matches in repeat areas
               will be reported separately from matches in other areas
               in the psl output.
   -minRepDivergence=NN Minimum percent divergence of repeats to allow
               them to be unmasked.  Default is 15.  Only relevant for
               masking using RepeatMasker .out files.
   -dots=N     Output dot every N sequences to show program's progress.
   -trimT      Trim leading poly-T.
   -noTrimA    Don't trim trailing poly-A.
   -trimHardA  Remove poly-A tail from qSize as well as alignments in
               psl output.
   -fastMap    Run for fast DNA/DNA remapping - not allowing introns,
               requiring high %ID.  Query sizes must not exceed 5000.
   -out=type   Controls output file format.  Type is one of:
                 psl - Default.
                       Tab separated format, no sequence
                 pslx - Tab separated format with sequence
                 axt - blastz-associated axt format
                 maf - multiz-associated maf format
                 sim4 - similar to sim4 format
                 wublast - similar to wublast format
                 blast - similar to NCBI blast format
                 blast8 - NCBI blast tabular format
                 blast9 - NCBI blast tabular format with comments
   -fine       For high quality mRNAs look harder for small initial and
               terminal exons.  Not recommended for ESTs.
   -maxIntron=N Sets maximum intron size.  Default is 750000.
   -extendThroughN Allows extension of alignment through large blocks
               of N's.

================================================================
========   gfClient   ====================================
================================================================
gfClient v. 34x10 - A client for the genomic finding program that
produces a .psl file
usage:
   gfClient host port seqDir in.fa out.psl
where
   host is the name of the machine running the gfServer
   port is the same as you started the gfServer with
   seqDir is the path of the .nib or .2bit files relative to the
       current dir (note these are needed by the client as well as
       the server)
   in.fa is a fasta format file.  May contain multiple records
   out.psl is where to put the output
options:
   -t=type     Database type.  Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna.
   -q=type     Query type.  Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
   -prot       Synonymous with -t=prot -q=prot
   -dots=N     Output a dot every N query sequences.
   -nohead     Suppresses the five-line psl header.
   -minScore=N Sets minimum score.  This is twice the matches minus the
               mismatches minus some sort of gap penalty.  Default is 30.
   -minIdentity=N Sets minimum sequence identity (in percent).
               Default is 90 for nucleotide searches, 25 for protein
               or translated protein searches.
   -out=type   Controls output file format.  Type is one of:
                 psl - Default.  Tab separated format without actual
                       sequence
                 pslx - Tab separated format with sequence
                 axt - blastz-associated axt format
                 maf - multiz-associated maf format
                 sim4 - similar to sim4 format
                 wublast - similar to wublast format
                 blast - similar to NCBI blast format
                 blast8 - NCBI blast tabular format
                 blast9 - NCBI blast tabular format with comments
   -maxIntron=N Sets maximum intron size.  Default is 750000.

================================================================
========   gfServer   ====================================
================================================================
gfServer v 34x10 - Make a server to quickly find where DNA occurs in
genome.
To set up a server:
   gfServer start host port file(s)
   Where the files are in .nib or .2bit format
To remove a server:
   gfServer stop host port
To query a server with DNA sequence:
   gfServer query host port probe.fa
To query a server with protein sequence:
   gfServer protQuery host port probe.fa
To query a server with translated DNA sequence:
   gfServer transQuery host port probe.fa
To query a server with PCR primers:
   gfServer pcr host port fPrimer rPrimer maxDistance
To process one probe fa file against a .nib format genome (not starting
server):
   gfServer direct probe.fa file(s).nib
To test pcr without starting server:
   gfServer pcrDirect fPrimer rPrimer file(s).nib
To figure out usage level:
   gfServer status host port
To get input file list:
   gfServer files host port
Options:
   -tileSize=N Size of n-mers to index.  Default is 11 for nucleotides,
               4 for proteins (or translated nucleotides).
   -stepSize=N Spacing between tiles.  Default is tileSize.
   -minMatch=N Number of n-mer matches that trigger detailed alignment.
               Default is 2 for nucleotides, 3 for proteins.
   -maxGap=N   Number of insertions or deletions allowed between n-mers.
               Default is 2 for nucleotides, 0 for proteins.
   -trans      Translate database to protein in 6 frames.
               Note: it is best to run this on RepeatMasked data in
               this case.
   -log=logFile Keep a log file that records server requests.
   -seqLog     Include sequences in log file (not logged with -syslog).
   -ipLog      Include user's IP in log file (not logged with -syslog).
   -syslog     Log to syslog.
   -logFacility=facility Log to the specified syslog facility.
               Default is local0.
   -mask       Use masking from nib file.
   -repMatch=N Number of occurrences of a tile (nmer) that trigger
               repeat masking the tile.  Default is 1024.
   -maxDnaHits=N Maximum number of hits for a DNA query that are sent
               from the server.  Default is 100.
   -maxTransHits=N Maximum number of hits for a translated query that
               are sent from the server.  Default is 200.
   -maxNtSize=N Maximum size of untranslated DNA query sequence.
               Default is 40000.
   -maxAsSize=N Maximum size of protein or translated DNA queries.
               Default is 8000.
   -canStop    If set then a quit message will actually take down the
               server.
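
Example (illustrative sketch, not part of the program output above):
the gfServer and gfClient usage lines can be assembled into argument
lists before being handed to a process launcher. The Python below only
builds the lists and does not run anything. The helper names
(format_options, gfserver_start, gfclient_query), the host "localhost",
the port 17779 and the file names are all hypothetical, and placing the
options ahead of the file arguments is an assumption rather than
something the usage text states.

```python
from typing import Dict, List, Optional, Sequence


def format_options(options: Optional[Dict[str, object]]) -> List[str]:
    """Render {"stepSize": 5, "canStop": True} as ["-stepSize=5", "-canStop"].

    Hypothetical helper: True means a bare switch, False/None means omit.
    """
    rendered = []
    for name, value in (options or {}).items():
        if value is True:
            rendered.append(f"-{name}")            # bare switch, e.g. -canStop
        elif value is not False and value is not None:
            rendered.append(f"-{name}={value}")    # valued option, e.g. -stepSize=5
    return rendered


def gfserver_start(host: str, port: int, files: Sequence[str],
                   options: Optional[Dict[str, object]] = None) -> List[str]:
    """Build: gfServer start host port [options] file(s)."""
    return ["gfServer", "start", host, str(port),
            *format_options(options), *files]


def gfclient_query(host: str, port: int, seq_dir: str, in_fa: str,
                   out_psl: str,
                   options: Optional[Dict[str, object]] = None) -> List[str]:
    """Build: gfClient [options] host port seqDir in.fa out.psl."""
    return ["gfClient", *format_options(options),
            host, str(port), seq_dir, in_fa, out_psl]


if __name__ == "__main__":
    # Hypothetical host, port and file names, for illustration only.
    server = gfserver_start("localhost", 17779, ["genome.2bit"],
                            {"stepSize": 5, "canStop": True})
    client = gfclient_query("localhost", 17779, ".", "in.fa", "out.psl",
                            {"minIdentity": 95, "nohead": True})
    print(" ".join(server))
    print(" ".join(client))
```

The returned lists can be passed to subprocess.run() once the binaries
are on the PATH. Building a list instead of a shell string avoids
quoting problems with file paths.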