pslCDnaFilter [options] inPsl outPsl

Filter cDNA alignments in psl format.  Filtering criteria are
comparative, selecting near best in genome alignments for each
given cDNA, and non-comparative, based only on the quality of an
individual alignment.

WARNING: comparative filters require that the input is sorted by
query name.  The command: 'sort -k 10,10' will do the trick.

Each alignment is assigned a score that is based on identity and
weighted towards longer alignments and those with introns.  This
can do either global or local best-in-genome selection.  Local
near best in genome keeps fragments of an mRNA that align in
discontinuous locations from other fragments.  It is useful for
unfinished genomes.  Global near best in genome keeps alignments
based on overall score.

Options:
   -algoHelp - print message describing the filtering algorithm.

   -localNearBest=-1.0 - local near best in genome filtering,
    keeping alignments within this fraction of the top score for
    each aligned portion of the mRNA.  A value of zero keeps only
    the best for each fragment.  A value of -1.0 disables (default).

   -globalNearBest=-1.0 - global near best in genome filtering,
    keeping alignments within this fraction of the top score.  A
    value of zero keeps only the best alignment.  A value of -1.0
    disables (default).

   -ignoreNs - don't include Ns (repeat masked) while calculating
    the score and coverage.  That is, treat them as unaligned rather
    than mismatches.  Ns are still counted as mismatches when
    calculating the identity.

   -ignoreIntrons - don't favor apparent introns when scoring.

   -minId=0.0 - only keep alignments with at least this fraction
    identity.

   -minCover=0.0 - minimum fraction of query that must be aligned.
    If -polyASizes is specified and the query is in the file, the
    poly-A is not included in the coverage calculation.

   -decayMinCover - the minimum coverage is calculated per alignment
    from the query size using the formula:
       minCoverage = 1.0 - qSize / 250.0
    and minCoverage is bounded between 0.25 and 0.9.
   -minSpan=0.0 - keep only alignments whose target length is at
    least this fraction of the longest alignment passing the other
    filters.  This can be useful for removing possible retroposed
    genes.

   -minQSize=0 - drop queries shorter than this size.

   -minAlnSize=0 - minimum number of aligned bases.  This includes
    repeats, but excludes poly-A/poly-T bases if available.

   -minNonRepSize=0 - minimum number of matching bases that are not
    repeats.  This does not include mismatches.  Must use -repeats
    on BLAT if doing unmasked alignments.

   -maxRepMatch=1.0 - maximum fraction of matching bases that are
    repeats.  Must use -repeats on BLAT if doing unmasked alignments.

   -maxAligns=-1 - maximum number of alignments for a given query.
    If exceeded, then alignments are sorted by score and only this
    number will be saved.  A value of -1 disables (default).

   -polyASizes=file - tab-separated file with information about
    poly-A tails and poly-T heads.  Format is as output by
    faPolyASizes:
       id seqSize tailPolyASize headPolyTSize

   -usePolyTHead - if a poly-T head was detected and is longer than
    the poly-A tail, it is used when calculating coverage instead of
    the poly-A tail.

   -bestOverlap - filter overlapping alignments, keeping the best of
    alignments that are similar.  This is designed to be used with
    overlapping, windowed alignments, where one alignment might be
    truncated.  Does not discard ones with weird overlap unless
    -filterWeirdOverlapped is specified.

   -hapRegions=psl - PSL format alignments of each haplotype
    pseudo-chromosome to the corresponding reference chromosome
    region.  This is used to map alignments between regions.

   -dropped=psl - save psls that were dropped to this file.

   -weirdOverlapped=psl - output weirdly overlapping PSLs to this
    file.

   -filterWeirdOverlapped - filter weirdly overlapped alignments,
    keeping the single highest scoring one, or an arbitrary one if
    several share the same high score.
   -alignStats=file - output the per-alignment statistics to this
    file.

   -uniqueMapped - keep only cDNAs that are uniquely aligned after
    all other filters have been applied.

   -noValidate - don't run pslCheck validation.

   -verbose=1 - verbosity level:
       0: quiet
       1: output stats
       2: list problem alignments (weird or invalid)
       3: list dropped alignments and reason for dropping
       4: list kept psl and info
       5: info about all PSLs

   -hapRefMapped=psl - output PSLs of haplotype to reference
    chromosome cDNA alignment mappings (for debugging purposes).

   -hapRefCDnaAlns=psl - output PSLs of haplotype cDNA to reference
    cDNA alignments (for debugging purposes).

   -hapLociAlns=outfile - output grouping of final alignments
    created by the haplotype mapping process.  Each row will start
    with an integer haplotype group id number followed by a PSL
    record.  All rows with the same id are alignments of a given
    cDNA that were determined to be haplotypes of the same locus.
    Alignments that are not part of a haplotype locus are not
    included.

   -alnIdQNameMode - add internally assigned alignment numbers to
    cDNA names on output.  Useful for debugging, as they are
    included in the verbose tracing as [#1], etc.  Will make a mess
    of normal production usage.

   -blackList=file.txt - adds a list of accession ranges to a black
    list.  Any accession on this list is dropped.  The black list
    file has two columns, where the first column is the beginning of
    the range and the second column is the end of the range,
    inclusive.

The default options don't do any filtering.  If no filtering
criteria are specified, all PSLs will be passed through, except
those that are internally inconsistent.

THE INPUT MUST BE SORTED BY QUERY for the comparative filters.
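
As an illustration of the -decayMinCover formula described in the options, the following is a minimal Python sketch, not the tool's own code. The function name decay_min_cover and the sample query sizes are made up for this example; only the formula and the 0.25 and 0.9 bounds come from the option text. It assumes "bounded" means the computed value is clamped to that range.

```python
def decay_min_cover(q_size: int) -> float:
    """Minimum coverage for a query of q_size bases.

    Hypothetical helper: follows the -decayMinCover text,
    minCoverage = 1.0 - qSize / 250.0, bounded to [0.25, 0.9].
    """
    raw = 1.0 - q_size / 250.0
    # Assumption: "bounded" means clamping to the stated limits.
    return max(0.25, min(0.9, raw))


if __name__ == "__main__":
    # Sample query sizes chosen only for illustration.
    for q_size in (20, 100, 150, 500):
        print(q_size, round(decay_min_cover(q_size), 3))
```

Under this reading, very short queries need up to 90% coverage, while any query of roughly 188 bases or more needs only the 25% floor.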