Description

This track displays the conservation between the mouse (Feb. 2003) and human (April 2003) genomes for 50 bp windows in the mouse genome that have at least 15 bp aligned to human. Unlike previous versions of this track, it is based on the netted alignment, which discards some of the less well-supported blastz alignment pieces. The score for a window reflects the probability that the observed level of conservation in that 50 bp region would occur by chance under neutral evolution. It is given on a logarithmic scale and is therefore called the "L-score". An L-score of 1 means there is a 1/10 probability that the observed conservation level would occur by chance, an L-score of 2 means a 1/100 probability, an L-score of 3 means a 1/1000 probability, and so on. The L-scores are displayed as "mountain ranges". Clicking on a mountain range opens a details page from which you can access the base-level alignments, both for the whole region and for the individual 50 bp windows.
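
The relationship between an L-score and its chance probability can be illustrated with a short Python sketch. The function names here are hypothetical and are not part of the track software; the only thing taken from the description above is the rule that an L-score of L corresponds to a probability of 10^-L.

```python
import math


def l_score_to_probability(l_score: float) -> float:
    """Return the chance probability implied by an L-score: P = 10 ** (-L)."""
    return 10.0 ** (-l_score)


def probability_to_l_score(probability: float) -> float:
    """Return the L-score for a chance probability: L = -log10(P)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log10(probability)
```

For example, `l_score_to_probability(3)` gives a value of approximately 0.001, matching the 1/1000 probability quoted above.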

Methods

Genome-wide alignments between mouse and human were produced by blastz and filtered for pseudogenes and artifacts with Jim Kent's "netting". A set of 50 bp windows in the mouse genome was determined by scanning the sequence, sliding 5 bases at a time. Only those windows with at least 15 aligned bases were kept. For each window, a conservation score defined by

S = sqrt(n / (m(1 - m))) (p - m)

was calculated, where n is the number of aligning bases in the window, p is the percent identity between mouse and human for these aligning bases, and m is the average percent identity for aligned neutrally evolving bases in a larger region surrounding the 50 bp window being scored (p and m are both expressed as fractions between 0 and 1). Neutral bases were taken from ancestral repeat sequences, which are relics of transposons that were inserted before the mouse-human split. To transform S into an L-score, the empirical cumulative distribution function CDF(S) = P(x < S) was computed from the scores of all windows genome-wide, and the L-score is defined as

L = -log_10(1 - CDF(S)).
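
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software used to build the track: the per-base boolean lists `aligned` and `identical`, the function names, and the choice to pass the neutral identity `m` in as a ready-made number are all hypothetical. In the actual method, m is estimated from ancestral repeats in a larger surrounding region, which is not reproduced here.

```python
import math
from bisect import bisect_left

WINDOW = 50        # window size in mouse bases
STEP = 5           # slide distance between consecutive windows
MIN_ALIGNED = 15   # minimum aligned bases for a window to be kept


def windows(aligned, identical):
    """Yield (start, n, p) for every kept window.

    aligned[i] is True if mouse base i is aligned to a human base;
    identical[i] is True if it is aligned and the two bases match.
    (Hypothetical input format, chosen for illustration.)
    """
    for start in range(0, len(aligned) - WINDOW + 1, STEP):
        n = sum(aligned[start:start + WINDOW])
        if n < MIN_ALIGNED:
            continue  # too few aligned bases, window is discarded
        matches = sum(identical[start:start + WINDOW])
        yield start, n, matches / n


def s_score(n, p, m):
    """S = sqrt(n / (m(1 - m))) (p - m), with p and m as fractions."""
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)


def l_scores(scores):
    """Return L = -log10(1 - CDF(S)) for each score in `scores`.

    CDF(S) = P(x < S) is the empirical distribution of the scores
    themselves, so 1 - CDF(S) is the fraction of scores >= S. That
    fraction is never zero because S is always in its own sample.
    """
    ordered = sorted(scores)
    total = len(ordered)
    result = []
    for s in scores:
        below = bisect_left(ordered, s)   # number of scores strictly < s
        tail = 1.0 - below / total        # fraction of scores >= s
        result.append(-math.log10(tail))
    return result
```

Because the strict inequality x < S is used for the CDF, the highest-scoring window among N windows receives an L-score of about log10(N) rather than an infinite value.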


The L-score provides a frequentist confidence assessment. A Bayesian calculation of the probability that a window is under selection can also be made using a mixture decomposition of the empirical density of the scores for all windows genome-wide into a neutral and a selected component. Details are given in a manuscript in preparation. The results are summarized in the table below.

L-score    Frequentist probability       Bayesian probability
           of this L-score or greater    that a window with this
           given neutral evolution       L-score is under selection
------------------------------------------------------------------
   1       0.1                            0.32
   2       0.01                           0.75
   3       0.001                          0.94
   4       0.0001                         0.97
   5       0.00001                        0.98
   6       0.000001                       0.99
   7       0.0000001                     >0.99
   8       0.00000001                    >0.99
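
The Bayesian column rests on a two-component mixture of neutral and selected windows. The sketch below shows only the generic Bayes' rule step for such a mixture. The component densities, the mixing weight, and all function names are hypothetical placeholders: the fitted decomposition used for the table is described in the manuscript, not here, and the normal density is included purely so the example can be run.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (placeholder component shape)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_selected(s, weight_selected, neutral_pdf, selected_pdf):
    """P(selected | S = s) for a two-component mixture, by Bayes' rule.

    weight_selected is the assumed prior fraction of windows under
    selection; neutral_pdf and selected_pdf are callables returning
    the density of each component at s.
    """
    sel = weight_selected * selected_pdf(s)
    neu = (1.0 - weight_selected) * neutral_pdf(s)
    return sel / (sel + neu)
```

With any pair of component densities, the posterior rises toward 1 as the selected component comes to dominate the neutral one at a given score, which is the qualitative pattern in the table's right-hand column.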

Using the Filter

The track filter can be used to configure some of the display characteristics of the track.

When you have finished configuring the filter, click the Submit button.

Credits

Thanks to Jim Kent for creating the blastz alignments and post-processing them to create the netted alignment. Ryan Weber computed the window S-scores, computed the CDF of these scores, and created the remaining track display functions. Mark Diekhans and Krishna Roskin created software used in this process. Mouse sequence data are provided by the Mouse Genome Sequencing Consortium.