Description

This track displays the conservation between the mouse and human genomes for 50 bp windows in the mouse genome that have at least 15 bp aligned to human. The score for a window reflects the probability that conservation at least as high as that observed in the 50 bp region would occur by chance under neutral evolution. Because this probability is given on a logarithmic scale, the score is called the "L-score": an L-score of 1 means there is a 1/10 probability that the observed conservation level would occur by chance, an L-score of 2 means a 1/100 probability, an L-score of 3 means a 1/1000 probability, and so on. The L-scores are displayed as "mountain ranges". Clicking on a mountain range opens a detail page from which you can access the base-level alignments, both for the whole region and for the individual 50 bp windows.
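Because the L-score is the negative base-10 logarithm of a chance probability, converting between the two takes one line in each direction. The Python sketch below only restates that definition; the function names are hypothetical and are not part of any browser API.

```python
import math


def l_score_from_probability(p: float) -> float:
    """L-score for a chance probability p in (0, 1]: L = -log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log10(p)


def probability_from_l_score(l_score: float) -> float:
    """Chance probability implied by an L-score: p = 10**(-L)."""
    return 10.0 ** (-l_score)


# An L-score of 1, 2, 3 corresponds to roughly 1/10, 1/100, 1/1000.
for l in (1, 2, 3):
    print(l, probability_from_l_score(l))
```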

Methods

Genome-wide alignments between mouse and human were produced by blastz. The set of 50 bp windows in the mouse genome was determined by scanning the sequence and sliding the window 5 bases at a time; only windows with at least 15 aligned bases were kept. For each window, a conservation score defined by

$$S = \sqrt{\frac{n}{m(1-m)}}\,(p - m)$$

was calculated, where n is the number of aligned bases in the window, p is the fraction of those aligned bases that are identical between mouse and human (percent identity expressed as a proportion between 0 and 1), and m is the average fraction identity of aligned, neutrally evolving bases in a larger region surrounding the 50 bp window being scored. Neutral bases were taken from ancestral repeats, which are relics of transposons inserted before the human-mouse split. To transform S into an L-score, the empirical cumulative distribution function CDF(S) = P(x < S) is computed from the scores of all windows genome-wide, and the L-score is defined as

$$L = -\log_{10}\bigl(1 - \mathrm{CDF}(S)\bigr).$$
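The windowing and scoring steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline that produced the track: the alignment is assumed to be given as two per-base boolean arrays over the mouse sequence (aligned to human, and identical to human), the neutral identity m is passed in as a single number rather than estimated from ancestral repeats around each window, and all function and variable names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple

WINDOW = 50       # window length in mouse bases
STEP = 5          # slide distance between consecutive windows
MIN_ALIGNED = 15  # windows with fewer aligned bases are discarded


def scan_windows(aligned: Sequence[bool],
                 identical: Sequence[bool]) -> List[Tuple[int, int, int]]:
    """Return (start, n_aligned, n_identical) for every window that is kept.

    aligned[i] is True if mouse base i aligns to a human base (assumed input);
    identical[i] is True if that aligned pair is an identical base.
    """
    windows = []
    for start in range(0, len(aligned) - WINDOW + 1, STEP):
        end = start + WINDOW
        n = sum(aligned[start:end])
        if n >= MIN_ALIGNED:
            k = sum(1 for a, b in zip(aligned[start:end], identical[start:end])
                    if a and b)
            windows.append((start, n, k))
    return windows


def s_score(n: int, p: float, m: float) -> float:
    """S = sqrt(n / (m(1 - m))) * (p - m), with p and m as fractions in (0, 1)."""
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)


def l_scores(scores: Sequence[float]) -> List[float]:
    """L = -log10(1 - CDF(S)) with the empirical CDF(S) = P(x < S) over all scores."""
    ordered = sorted(scores)
    total = len(ordered)
    result = []
    for s in scores:
        below = bisect.bisect_left(ordered, s)  # number of scores strictly less than s
        # -log10(1 - below/total) rewritten as log10(total / (total - below));
        # total - below >= 1 because s itself is never strictly below s.
        result.append(math.log10(total / (total - below)))
    return result


# Toy example: 60 fully aligned mouse bases, the first 45 identical to human.
aligned = [True] * 60
identical = [True] * 45 + [False] * 15
M_NEUTRAL = 0.67  # hypothetical neutral identity, not a value from the track
wins = scan_windows(aligned, identical)
scores = [s_score(n, k / n, M_NEUTRAL) for _, n, k in wins]
print(wins)
print(l_scores(scores))
```

With only three toy windows the L-scores are tiny; on real data the empirical CDF is taken over all windows genome-wide, which is what allows L-scores of 6 or more.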


The L-score is a frequentist measure: it gives the probability of observing a score this high or higher under neutral evolution. A Bayesian estimate of the probability that a window is under selection can also be made by decomposing the empirical density of the scores for all windows genome-wide into a neutral and a selected component (a mixture decomposition). Details are given in a manuscript in preparation. The results are summarized in the table below.

L-score   Frequentist probability of this    Bayesian probability that a window
          L-score or greater, given          with this L-score is under
          neutral evolution                  selection
------------------------------------------------------------------------------
   1      0.1                                0.32
   2      0.01                               0.75
   3      0.001                              0.94
   4      0.0001                             0.97
   5      0.00001                            0.98
   6      0.000001                           0.99
   7      0.0000001                          >0.99
   8      0.00000001                         >0.99
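The Bayesian column comes from a mixture decomposition whose details are not given here. As a generic illustration only: if the score density is modeled as a neutral component f0 and a selected component f1, with mixing weight pi for the selected part, Bayes' rule gives P(selected | S) = pi f1(S) / ((1 - pi) f0(S) + pi f1(S)). The Python sketch below assumes Gaussian components and uses hypothetical parameter values that are not fitted to the mouse data, so it does not reproduce the table; it only shows how such a posterior is computed.

```python
import math
from typing import Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_selected(s: float, pi_sel: float,
                       neutral: Tuple[float, float] = (0.0, 1.0),
                       selected: Tuple[float, float] = (2.0, 1.5)) -> float:
    """P(selected | S = s) for a two-component mixture.

    The Gaussian shapes and the default (mean, sd) pairs are hypothetical
    assumptions for illustration, not the components used for this track.
    """
    f0 = normal_pdf(s, *neutral)   # neutral component density
    f1 = normal_pdf(s, *selected)  # selected component density
    num = pi_sel * f1
    return num / ((1.0 - pi_sel) * f0 + num)


# Higher scores shift the posterior toward the selected component.
for s in (0.0, 2.0, 4.0):
    print(s, round(posterior_selected(s, pi_sel=0.05), 3))
```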

Using the Filter

The track filter can be used to configure some of the display characteristics of the track.

When you have finished configuring the filter, click the Submit button.

Credits

Thanks to Webb Miller and Scott Schwartz for creating the blastz alignments, Jim Kent for post-processing them, and Mark Diekhans for scoring the windows and selecting out the ancestral repeats. Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF for these S-scores, and created the remaining track display functions. Thanks to the Mouse Genome Sequencing Consortium for providing the mouse sequence data.