Filtering algorithm: The filters are applied in the following order to each set of alignments for a given cDNA sequence. o PSLs with accessions on an optional black list are dropped o PSLs that are not internally consistent, such has having overlapping blocks, are dropped. Use pslCheck program to get the details. o Drop if repMatch/totalMatch is greater than maxRepMatch. o Drop queries less than the min size. o Filter by minimum identity to the target sequence. o Filter by minimum coverage, if polyA sizes are supplied, the poly-A tail is not included in coverage calculation. If decayMinCover is specified, the minimum coverage is calculated per alignment from the query size using the formula: minCoverage = 1.0 - qSize / 250.0 and minCoverage is bounded between 0.25 and 0.9. o Keep only best overlapping alignment if request This is the one with the highest score. o Filter by minSpan. o Filter sets of weirdly overlapping alignments, keeping only the best scoring from a set of overlaps. o Filter by either global or local near best in genome. See notes below for handling of haplotype pseudo-chromsomes. o Filter by maxAligns o Look for weird overlaps that might be left after above filters. Either dropping or keep the weird overlaps as requested. Weird alignments are sets of alignments for a given query that overlap, however the same query bases align to different target bases. These will be included in the output unless dropped by another filter criteria. These are often caused by tandem repeats. A small fraction of differently aligned bases are allowed to handle different block boundaries. Filtering of haplotype regions: Haplotypes stored as pseudo-chromosomes require special handling during comparative filtering to prevent alignments from being dropped from one or more versions a haplotype region. The following algorithm is applied to filtering all of the alignments of a given cDNA. Association of hapolotype alignments to reference alignments is not done until other filtering has take place, which limits problem cases. o Alignments of hapolotype pseudo-chromosomes to hapolotype regions of reference chromosomes are supplied in PSL format (hapMappingAlns) o Foreach cDNA alignment to a haplotype pseudo-chromosomes (hapAln): o map hapAln to reference chromsome using hapMappingAlns, which can create multiple mappedHapAlns. o foreach refAln: o foreach mappedHapAlns: o transMap the mappedHapAln with refAln to make cDNA to cDNA alignment (hapRefCdnaAln) o Determine the number of bases in hapRefCdnaAln align that are either mapped to the same mRNA base. Bases of refAln that are outside of the haplotype region on the chromosome are are not counted. This is divided by the total number of mRNA bases that could be counted (excludes outside of the hap region), to form a score. o link refAln and hapAln if any of the same bases aligned o Alignments that are not linked are treated as independent alignments by the comparative filtering. o When a near-best comparative filter is applied, each refAln and it's linked set of hapAlns are treated as if it where a single alignment. The best score for the linked set is used. If any of the refAlns that a hapAln is linked to is kept, it is kept, otherwise the hap align is dropped.