Description

This track displays the conservation between the human and mouse genomes for 50 bp windows in the human genome that have at least 15 bp aligned to mouse. The score for a window reflects the probability that the level of conservation observed in that 50 bp region would occur by chance under neutral evolution. It is given on a logarithmic scale and is therefore called the "L-score". An L-score of 1 means there is a 1/10 probability that the observed conservation level would occur by chance, an L-score of 2 means a 1/100 probability, an L-score of 3 means a 1/1000 probability, etc. The L-scores are displayed as "mountain ranges". Clicking on a mountain range opens a detail page from which you can access the base-level alignments, both for the whole region and for the individual 50 bp windows.
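
The relationship between an L-score and its chance probability can be sketched in a few lines of Python. The function names here are illustrative assumptions and are not part of the browser.

```python
import math

def l_score_to_probability(l_score: float) -> float:
    """Chance probability implied by an L-score: p = 10 ** (-L)."""
    return 10.0 ** (-l_score)

def probability_to_l_score(p: float) -> float:
    """Inverse mapping: L = -log10(p), defined for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log10(p)
```

For example, `l_score_to_probability(3)` corresponds to the 1/1000 probability described above.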

Methods

Genome-wide alignments between human and mouse were produced by blastz. A set of 50 bp windows in the human genome was determined by scanning the sequence, sliding 5 bases at a time, and only those windows with at least 15 aligned bases were kept. For each window, a conservation score defined by

S = sqrt(n / (m(1 - m))) (p - m)
was calculated, where n is the number of aligning bases in the window, p is the percent identity between human and mouse for these aligning bases, and m is the average percent identity for aligned neutrally evolving bases in a larger region surrounding the 50 bp window being scored (with p and m expressed as fractions between 0 and 1). Neutral bases were taken from ancestral repeat sequences, which are relics of transposons that were inserted before the human-mouse split. To transform S into an L-score, the empirical cumulative distribution function CDF(S) = P(x < S) was computed from the scores of all windows genome-wide, and the L-score was defined as

L = -log_10(1 - CDF(S)).
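
As a rough sketch of the steps above, the following Python fragment scans windows, computes S, and converts a set of scores to L-scores using an empirical CDF. The function names, the boolean-list representation of the alignment, and the choice to pass m in as a precomputed value are assumptions made for illustration; this is not the code used to build the track.

```python
import math
from bisect import bisect_left

WINDOW = 50       # window size in bp (from the text)
STEP = 5          # slide step in bp (from the text)
MIN_ALIGNED = 15  # minimum aligned bases for a window to be kept (from the text)

def window_counts(aligned, identical):
    """Yield (start, n, matches) for each kept window.

    aligned[i] is True if human base i aligns to a mouse base;
    identical[i] is True if that aligned pair is the same base.
    (This per-base representation is a hypothetical input format.)
    """
    for start in range(0, len(aligned) - WINDOW + 1, STEP):
        n = sum(aligned[start:start + WINDOW])
        if n >= MIN_ALIGNED:
            matches = sum(identical[start:start + WINDOW])
            yield start, n, matches

def s_score(n, p, m):
    """S = sqrt(n / (m(1 - m))) (p - m), with p and m as fractions in (0, 1)."""
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)

def l_scores(scores):
    """L = -log10(1 - CDF(S)), with the empirical CDF(S) = P(x < S)."""
    ordered = sorted(scores)
    total = len(ordered)
    result = []
    for s in scores:
        cdf = bisect_left(ordered, s) / total  # fraction of scores strictly below s
        result.append(-math.log10(1.0 - cdf))
    return result
```

Because the CDF counts only scores strictly below S, the quantity 1 - CDF(S) is never zero for a score drawn from the set, so the logarithm is always defined. For a window, p would be computed as matches / n, and m would come from ancestral repeats in the surrounding region.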

The L-score provides a frequentist confidence assessment. A Bayesian calculation of the probability that a window is under selection can also be made using a mixture decomposition of the empirical density of the scores for all windows genome-wide into a neutral and a selected component. Details are given in a manuscript in preparation. The results are summarized in the table below.

L-score       Frequentist probability       Bayesian probability
              of this L-score or greater    that window with this
              given neutral evolution       L-score is under
                                            selection
------------------------------------------------------------------
   1          0.1                           0.32
   2          0.01                          0.75
   3          0.001                         0.94
   4          0.0001                        0.97
   5          0.00001                       0.98
   6          0.000001                      0.99
   7          0.0000001                     >0.99
   8          0.00000001                    >0.99
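
The mixture decomposition behind the Bayesian column is not described here, so the following is only a generic illustration of how a posterior probability can be obtained from a two-component mixture by Bayes' rule. The Gaussian component shapes, the parameter names, and any values passed in are assumptions chosen for the sketch; they are not the fitted model and will not reproduce the table.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution (an assumed component shape)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def posterior_selected(s, weight_selected, neutral, selected):
    """P(selected | S = s) for a two-component mixture, by Bayes' rule.

    weight_selected is the mixing proportion of the selected component;
    neutral and selected are hypothetical (mean, sd) pairs.
    """
    f_neu = (1.0 - weight_selected) * normal_pdf(s, *neutral)
    f_sel = weight_selected * normal_pdf(s, *selected)
    return f_sel / (f_neu + f_sel)
```

With any such model, the posterior rises as a score moves away from the bulk of the neutral component and toward the selected component, which is the qualitative behavior seen in the table.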

Using the Filter

The track filter can be used to configure some of the display characteristics of the track.

Interpolation: This attribute determines whether the data samples are displayed as discrete points on the track (the "Only samples" option) or are connected by a line (the "Linear interpolation" option).
Fill Blocks: When this option is set to on, the area underneath the sample points or line is filled in with gray.
Track Height: Type in a new value to adjust the track height in pixels to best suit your screen display.
Vertical Range: Type in a new min or max value to adjust the portion of the track's vertical range that is displayed. Range units are marked by pale blue horizontal lines.
Maximum Interval to Interpolate Across: This attribute sets the maximum gap between alignments that will be spanned when the "Linear interpolation" option is selected. Type in a new value to increase or decrease the interval.

When you have finished configuring the filter, click the Submit button.
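
To illustrate how the "Linear interpolation" option and the Maximum Interval to Interpolate Across setting interact, here is a minimal Python sketch. It assumes samples are given as sorted (position, value) pairs; the function name and this data layout are hypothetical, and the browser's own drawing code may differ.

```python
def value_at(samples, x, max_gap):
    """Linearly interpolated value at position x.

    samples is a list of (position, value) pairs sorted by position.
    Returns None when x falls outside the samples or inside a gap
    wider than max_gap, which is left unconnected.
    """
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if x0 <= x <= x1:
            if x1 - x0 > max_gap:
                return None  # gap too wide to interpolate across
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return None
```

Raising the maximum interval connects samples across wider gaps; lowering it leaves more of the track blank between distant samples.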

Credits

Thanks to Webb Miller and Scott Schwartz for creating the blastz alignments, Jim Kent for post-processing them, and Mark Diekhans for scoring the windows and selecting out the ancestral repeats. Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF for these S-scores, and created the remaining track display functions. Mouse sequence data are provided by the Mouse Genome Sequencing Consortium.