Stratified Random Picks

To make the stratified picks, the genome is divided into the top 20%, middle 30%, and bottom 50% along two axes: gene density and nontranscribed conservation. Three random picks are then taken from each stratum, plus a fourth pick in the strata that are underrepresented in the manual picks. One additional backup pick is made in each stratum in case there is an unforeseen technical problem with a region. The backup pick is shown in parentheses below.
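
The procedure described above can be sketched roughly as follows. This is an illustration only, not the program actually used to make the picks: the function names, the tuple layout of `regions`, the seed, and the rule that a stratum counts as underrepresented when it has at most one manual pick are all assumptions introduced for the example.

```python
import random

def band(rank_fraction):
    """Return 0 for the bottom 50%, 1 for the middle 30%, 2 for the top 20%."""
    if rank_fraction < 0.5:
        return 0
    if rank_fraction < 0.8:
        return 1
    return 2

def rank_fractions(values):
    """Rank fraction in [0, 1) for each value (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    fractions = [0.0] * len(values)
    for position, index in enumerate(order):
        fractions[index] = position / len(values)
    return fractions

def stratified_picks(regions, manual_counts, seed=0,
                     base_picks=3, max_manual_for_extra=1):
    """Pick regions at random within each (consNonTx band, gene band) stratum.

    regions: list of (name, cons_non_tx_pct, gene_pct) tuples (hypothetical layout).
    manual_counts: dict mapping (cons_band, gene_band) to the number of manual picks.
    A stratum with at most max_manual_for_extra manual picks gets one extra pick
    (assumed threshold). One backup is drawn per stratum when enough regions exist.
    """
    rng = random.Random(seed)
    cons_ranks = rank_fractions([r[1] for r in regions])
    gene_ranks = rank_fractions([r[2] for r in regions])

    # Group region names by stratum.
    strata = {}
    for region, cons_rank, gene_rank in zip(regions, cons_ranks, gene_ranks):
        key = (band(cons_rank), band(gene_rank))
        strata.setdefault(key, []).append(region[0])

    result = {}
    for key, members in strata.items():
        wanted = base_picks
        if manual_counts.get(key, 0) <= max_manual_for_extra:
            wanted += 1  # fourth pick for an underrepresented stratum
        # Draw the picks plus one backup without replacement.
        chosen = rng.sample(members, min(len(members), wanted + 1))
        result[key] = {
            "picks": chosen[:wanted],
            "backup": chosen[wanted] if len(chosen) > wanted else None,
        }
    return result
```

Drawing the backup in the same sample as the regular picks guarantees that it never duplicates one of them.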

consNonTx 0% - 50%, gene 0% - 50% (1 manual)

chr13:28500001-29000000 consNonTx 2.8%, gene 0.5%
chr2:51700001-52200000 consNonTx 3.8%, gene 0.0%
chr4:119000001-119500000 consNonTx 3.9%, gene 0.0%
chr10:54300001-54800000 consNonTx 2.8%, gene 1.2%
(chr5:15900001-16400000 consNonTx 5.1%, gene 1.7%)

consNonTx 0% - 50%, gene 50% - 80% (4 manual)

chr2:115500001-116000000 consNonTx 6.2%, gene 2.3%
chr18:61100001-61600000 consNonTx 3.4%, gene 3.4%
chr12:40500001-41000000 consNonTx 1.7%, gene 3.1%
(chr2:196700001-197200000 consNonTx 5.4%, gene 3.3%)

consNonTx 0% - 50%, gene 80% - 100% (11 manual)

chr2:232500001-233000000 consNonTx 1.3%, gene 4.6%
chr13:111900001-112400000 consNonTx 1.1%, gene 5.5%
chr21:36900001-37400000 consNonTx 2.3%, gene 5.2%
(chr4:47800001-48300000 consNonTx 1.9%, gene 4.4%)

consNonTx 50% - 80%, gene 0% - 50% (2 manual)

chr16:25300001-25800000 consNonTx 9.7%, gene 0.5%
chr5:141800001-142300000 consNonTx 6.7%, gene 1.7%
chr18:25400001-25900000 consNonTx 7.4%, gene 0.9%
(chr4:124800001-125300000 consNonTx 6.3%, gene 0.9%)

consNonTx 50% - 80%, gene 50% - 80% (4 manual)

chr5:56000001-56500000 consNonTx 7.9%, gene 2.2%
chr6:131800001-132300000 consNonTx 6.9%, gene 2.1%
chr6:73700001-74200000 consNonTx 6.4%, gene 3.6%
(chr4:53700001-54200000 consNonTx 9.0%, gene 2.1%)

consNonTx 50% - 80%, gene 80% - 100% (3 manual)

chr1:149000001-149500000 consNonTx 10.2%, gene 8.4%
chr9:122800001-123300000 consNonTx 8.3%, gene 5.9%
chr15:39100001-39600000 consNonTx 9.7%, gene 10.6%
(chr17:33400001-33900000 consNonTx 7.7%, gene 6.1%)

consNonTx 80% - 100%, gene 0% - 50% (3 manual)

chr14:51200001-51700000 consNonTx 14.9%, gene 0.1%
chr11:133100001-133600000 consNonTx 13.5%, gene 0.3%
chr16:52600001-53100000 consNonTx 15.4%, gene 0.0%
(chrX:41900001-42400000 consNonTx 13.4%, gene 0.7%)

consNonTx 80% - 100%, gene 50% - 80% (1 manual)

chr8:117800001-118300000 consNonTx 11.4%, gene 3.2%
chr14:96900001-97400000 consNonTx 15.9%, gene 2.9%
chr7:74700001-75200000 consNonTx 11.8%, gene 2.1%
chrX:117500001-118000000 consNonTx 10.7%, gene 2.0%
(chr6:108100001-108600000 consNonTx 18.6%, gene 2.3%)

consNonTx 80% - 100%, gene 80% - 100% (1 manual)

chr2:218300001-218800000 consNonTx 13.3%, gene 9.1%
chr11:66700001-67200000 consNonTx 13.4%, gene 9.0%
chr20:33600001-34100000 consNonTx 11.5%, gene 9.2%
chr6:41300001-41800000 consNonTx 15.2%, gene 4.8%
(chr9:124300001-124800000 consNonTx 11.4%, gene 5.4%)

Stratification of Manual Picks

Here are the noncoding conservation and gene density of the non-overlapping 500 kb regions in the manual picks. The boundaries between strata are:

           low 50%    middle 30%  high 20%
           ---------------------------------
gene       0.0-1.9%   1.9-4.2%    4.2-100%
consNotTx  0.0-6.3%   6.3-10.6%   10.6-100%
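
As a small sketch, the boundaries in this table can be used to assign a region to its stratum. The function name `stratum` is hypothetical, and the treatment of a value that falls exactly on a boundary (it is placed in the higher band here) is an assumption, since the table does not say which side such a value belongs to.

```python
from bisect import bisect_right

# Cut points taken from the boundary table (percent).
GENE_CUTS = (1.9, 4.2)
CONS_CUTS = (6.3, 10.6)
BAND_LABELS = ("0% - 50%", "50% - 80%", "80% - 100%")

def stratum(gene_pct, cons_pct):
    """Return the stratum label for a region's gene and consNotTx percentages.

    Assumption: a value exactly equal to a cut point goes to the higher band.
    """
    gene_band = bisect_right(GENE_CUTS, gene_pct)
    cons_band = bisect_right(CONS_CUTS, cons_pct)
    return f"consNonTx {BAND_LABELS[cons_band]}, gene {BAND_LABELS[gene_band]}"

# Example: the HOXA_cluster region (gene 5.1%, consNotTx 22.0%).
print(stratum(5.1, 22.0))
```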


CFTR
    chr7:114288355-114788354 gene  1.6% consNotTx  6.0%
    chr7:114788355-115288354 gene  2.8% consNotTx  9.9%
    chr7:115288355-115788354 gene  3.3% consNotTx  8.5%
    chr7:115788355-116165780 gene  2.5% consNotTx  4.4%
Interleukin_Cluster
    chr5:130778557-131278556 gene  5.2% consNotTx  5.1%
    chr5:131278557-131778556 gene  9.5% consNotTx  7.0%
Apo_Cluster
    chr11:118810001-119310000 gene  3.1% consNotTx  9.1%
Chr22
    chr22:28500001-29000000 gene  5.1% consNotTx  3.6%
    chr22:29000001-29500000 gene  3.7% consNotTx  2.6%
    chr22:29500001-30000000 gene  3.9% consNotTx  6.8%
    chr22:30000001-30200000 gene  0.2% consNotTx  8.8%
Chr21
    chr21:30323762-30823761 gene  5.3% consNotTx  6.5%
    chr21:30823762-31323761 gene  2.8% consNotTx 13.5%
    chr21:31323762-31823761 gene  7.8% consNotTx  4.9%
    chr21:31823762-32019746 gene  3.2% consNotTx  3.7%
ChrX
    chrX:147250001-147750000 gene 14.0% consNotTx  6.6%
    chrX:147750001-148250000 gene  8.3% consNotTx  4.3%
    chrX:148250001-148500000 gene  6.3% consNotTx  4.8%
Chr19
    chr19:55200001-55700000 gene 11.3% consNotTx  1.6%
    chr19:55700001-56200000 gene  8.2% consNotTx  1.3%
Alpha_Globin
    chr16:79138-579137 gene 10.5% consNotTx  2.8%
Beta_Globin
    chr11:5550000-6049999 gene  5.4% consNotTx  1.9%
    chr11:6050000-6549999 gene  6.2% consNotTx  0.0%
HOXA_cluster
    chr7:26600001-27100000 gene  5.1% consNotTx 22.0%
IGF2/H19
    chr11:300001-800000 gene  7.2% consNotTx  5.5%
    chr11:800001-900000 gene  1.8% consNotTx  5.4%
FOXP2
    chr7:112410991-112910990 gene  1.1% consNotTx 23.3%
    chr7:112910991-113410990 gene  1.1% consNotTx 16.7%

Semi-Manual Picks

Here's the stratification of the other zoo-seq regions. I recommend picking 7q21.13 and 7q31.33 to round things out.

7q21.13
    chr7:88319137-88819136 gene  3.0% consNotTx  4.4%
    chr7:88819137-89319136 gene  0.3% consNotTx  8.2%
    chr7:89319137-89433560 gene  7.2% consNotTx 14.6%
7q21.3
    chr7:91589227-92089226 gene  2.9% consNotTx  3.4%
    chr7:92089227-92559635 gene  0.5% consNotTx  8.1%
7q21.3
    chr7:93650712-94150711 gene  1.8% consNotTx  9.4%
    chr7:94150712-94650711 gene  1.8% consNotTx 15.2%
    chr7:94650712-94868826 gene  1.5% consNotTx 16.2%
7q31.33
    chr7:124556444-125056443 gene  0.5% consNotTx  7.9%
    chr7:125056444-125556443 gene  1.1% consNotTx 11.2%
    chr7:125556444-125719632 gene  4.3% consNotTx 10.9%
7q32.1
    chr7:126427707-126927706 gene  5.8% consNotTx  6.8%
    chr7:126927707-127330661 gene  8.3% consNotTx  3.8%