Number of close oligos for 25-mers.

#MaxDiff  closeCount
0         1
1         76
2         5476
3         378076

Phase 1 - using normal hash function on 25-mers, testing on chromosome 22 only.

#oligos  memory  time
5        203M    28.9
10       295M    32.2
20       479M    44.0
30       663M    56.9

18 Meg per oligo
~ 1.2 s/oligo + 23 s
Nonlinear

Phase 2 - using custom hash on chr22 only.

#oligos  memory  time  hashHitRatio  timekkr1u01
5        190M    25.4  0.056
10       204M    29.7                20.6
20       234M    33.2
30       263M    36.9
40       292M    41.3
80       410M    56.3  0.710
100      468M    63.2  0.892

3 Meg per oligo + 175M
~ 0.35 s/oligo + 24 s

on chr21 + chr22

#oligos  time
10       56.1
100      101.5

on chr5 (about 180Meg)

#oligos  time
10       117.0
100      195.0

Extrapolating to genome-wide, it looks like if you just did masked, it might take 20 s/oligo.
Chromosome 22 has 15 million oligos:
    300 million seconds
    300,000 cluster seconds
    100 cluster hours

Looks like it could go 70x faster by looking for only up to 2 mismatches. This would take it down
to 1.5 cluster hours for chromosome 22, and 150 for the genome.

on chr5 with maxDiff=2

#oligos  memory  time   maxDiff
100      302M    120.5  1
100      306M    121.8  2
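
The closeCount figures and the chromosome 22 extrapolation can be checked with a few lines of arithmetic. This is a sketch, not the program that produced the notes: the function name `close_count` is hypothetical, and the formula (ordered choices of k positions times 3 substitutions each, summed over k) is an observation that happens to reproduce the listed counts, not something the notes state. The cluster size of 1000 nodes is likewise an assumption, inferred from the step from 300 million seconds to 300,000 cluster seconds.

```python
from math import perm


def close_count(oligo_size: int, max_diff: int, alphabet: int = 4) -> int:
    """Sum over k = 0..max_diff of (ordered picks of k positions) * 3^k.

    Assumption: positions are counted in order, so variants with two or
    more substitutions are counted more than once. This matches the
    table in the notes; the number of distinct sequences is smaller.
    """
    subs = alphabet - 1  # substitutions available at each position
    return sum(perm(oligo_size, k) * subs ** k for k in range(max_diff + 1))


# Reproduce the closeCount table for 25-mers.
counts = {d: close_count(25, d) for d in range(4)}
print(counts)  # {0: 1, 1: 76, 2: 5476, 3: 378076}

# Ratio of work at maxDiff=3 versus maxDiff=2, about 69.
# This may be where the "70x faster" estimate comes from.
speedup = counts[3] / counts[2]

# Extrapolation for chromosome 22, using the figures in the notes.
oligos_chr22 = 15_000_000
seconds_per_oligo = 20
total_seconds = oligos_chr22 * seconds_per_oligo   # 300 million seconds
cluster_nodes = 1000                               # assumed, see lead-in
cluster_seconds = total_seconds / cluster_nodes    # 300,000 cluster seconds
cluster_hours = cluster_seconds / 3600             # about 83; notes round to 100

print(round(speedup, 1), total_seconds, round(cluster_hours, 1))
```

Dividing the rounded 100 cluster hours by the roughly 69x ratio gives about 1.4 cluster hours, consistent with the 1.5 quoted for chromosome 22.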