Description
This track displays the conservation between the human and mouse genomes for
50 bp windows in the human genome that have at least 15 bp aligned to
mouse. The score for a window reflects the probability that the
level of observed conservation in that 50 bp region would occur by
chance under neutral evolution. It is given on a logarithmic scale,
and thus it is called the "L-score". An L-score of 1 means there is a
1/10 probability that the observed conservation level would occur by
chance, an L-score of 2 means a 1/100 probability, an L-score of 3
means a 1/1000 probability, etc. The L-scores are displayed as
"mountain ranges". Clicking on a mountain range opens a detail page
from which you can access the base-level alignments, both
for the whole region and for the individual 50 bp windows.
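As a small illustration of the logarithmic scale described above, the following sketch converts an L-score back into the corresponding chance probability. The function name is hypothetical and is not part of the browser.

```python
def l_score_to_probability(l_score: float) -> float:
    """Probability that the observed conservation level would occur by chance.

    Inverts the base-10 logarithmic scale: an L-score of k corresponds
    to a probability of 10 to the power -k.
    """
    return 10.0 ** (-l_score)


# L-scores of 1, 2, and 3 correspond to 1/10, 1/100, and 1/1000.
for l_score in (1, 2, 3):
    print(l_score, l_score_to_probability(l_score))
```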
Methods
Genome-wide alignments between human and mouse were produced by
blastz. A set of 50 bp windows in the human genome was determined
by scanning the sequence, sliding 5 bases at a time; only those
windows with at least 15 aligned bases were kept. For each window,
a conservation score defined by
S = sqrt(n / (m(1 - m))) (p - m)
was calculated, where n is the number of aligning bases in the
window, p is the percent identity between human and mouse for these
aligning bases, and m is the average percent identity for aligned
neutrally evolving bases in a larger region surrounding the 50 bp
window being scored. Neutral bases were taken from ancestral repeat
sequences, which are relics of transposons that were inserted before
the human-mouse split. To transform S into an L-score, the empirical
cumulative distribution function CDF(S) = P(x < S)
is computed from the scores of all windows genome-wide, and
the L-score is defined as
L = -log_10(1 - CDF(S)).
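The scoring steps above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the track. It assumes that p and m are expressed as fractions between 0 and 1 (so that S has the form of a z-score for a binomial proportion), that window coordinates are 0-based, and that the empirical CDF is taken over exactly the list of scores supplied. All function and constant names are hypothetical.

```python
import math
from bisect import bisect_left

WINDOW = 50       # window length in bp (from the text)
STEP = 5          # slide distance in bp (from the text)
MIN_ALIGNED = 15  # minimum aligned bases for a window to be kept (from the text)


def window_starts(seq_len):
    """Start coordinates (0-based, an assumption) of successive 50 bp windows."""
    return range(0, seq_len - WINDOW + 1, STEP)


def s_score(n, p, m):
    """S = sqrt(n / (m * (1 - m))) * (p - m).

    n: number of aligning bases in the window
    p: identity among those bases, as a fraction (assumption)
    m: neutral identity in the surrounding region, as a fraction (assumption)
    """
    if n < MIN_ALIGNED:
        raise ValueError("window has too few aligned bases")
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)


def l_scores(all_s):
    """L = -log10(1 - CDF(S)), with the empirical CDF(S) = P(x < S)."""
    ordered = sorted(all_s)
    total = len(ordered)
    result = []
    for s in all_s:
        below = bisect_left(ordered, s)  # count of scores strictly less than s
        # 1 - CDF(S) = (total - below) / total, which is never zero because
        # s itself is in the list; -log10 of that is log10 of its reciprocal.
        result.append(math.log10(total / (total - below)))
    return result
```

With this definition, the highest of ten distinct scores receives an L-score of 1, since one tenth of the windows score at least as high. A genome-wide set of windows is needed before large L-scores become possible.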
The L-score
provides a frequentist confidence assessment. A Bayesian
calculation of the probability that a window is under
selection can also be made using a mixture decomposition of
the empirical density of the scores for all windows
genome-wide into a neutral and a selected component. Details
are given in a manuscript in preparation. The results are
summarized in the table below.
L-score   Frequentist probability      Bayesian probability
          of this L-score or greater   that a window with this
          given neutral evolution      L-score is under selection
------------------------------------------------------------------
1         0.1                          0.32
2         0.01                         0.75
3         0.001                        0.94
4         0.0001                       0.97
5         0.00001                      0.98
6         0.000001                     0.99
7         0.0000001                    >0.99
8         0.00000001                   >0.99
Using the Filter
The track filter can be used to configure some of the display characteristics
of the track.
- Interpolation: This attribute determines whether the data samples are
displayed as discrete points on the track (the "Only samples" option) or are
connected by a line (the "Linear interpolation" option).
- Fill Blocks: When the "on" button is selected for this option, the area
underneath the sample points or line is filled in with gray.
- Track Height: Type in a new value to adjust the track height in pixels to best suit your screen display.
- Vertical Range: Type in a new min or max value to adjust the portion of the track's vertical
range that is displayed. Range units are marked by pale blue horizontal lines.
- Maximum Interval to Interpolate Across: This attribute sets the maximum gap
between alignments that will be spanned when the "Linear interpolation"
option is selected. Type in a new value to increase or decrease the interval.
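To illustrate how the interpolation and maximum-interval settings interact, here is a minimal sketch of gap-limited linear interpolation. It is not the browser's drawing code, which is not described here. The function names are hypothetical, and treating a gap exactly equal to the maximum as still spanned is an assumption.

```python
def interpolation_segments(samples, max_gap):
    """Return the pairs of neighbouring samples that would be joined by a line.

    samples: list of (position, score) tuples sorted by position
    max_gap: largest distance between neighbours that is still spanned
             (inclusive comparison is an assumption)
    """
    segments = []
    for left, right in zip(samples, samples[1:]):
        if right[0] - left[0] <= max_gap:
            segments.append((left, right))
    return segments


def interpolate(left, right, position):
    """Linearly interpolate the score at a position between two samples."""
    (x0, y0), (x1, y1) = left, right
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


# Hypothetical samples: the first two are 5 bases apart, the third is far away.
samples = [(0, 1.0), (5, 2.0), (100, 3.0)]
for left, right in interpolation_segments(samples, max_gap=10):
    print(left, right)
```

In this example only the first two samples are connected; the third is left as an isolated point because its distance from its neighbour exceeds the maximum interval.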
When you have finished configuring the filter, click the Submit button.
Credits
Thanks to Webb Miller and Scott Schwartz for creating the blastz
alignments, Jim Kent for post-processing them, and
Mark Diekhans for scoring the windows and selecting out the ancestral repeats.
Krishna Roskin created S-scores for these windows. Ryan Weber computed the CDF
for these S-scores, and created the remaining track display functions. Mouse sequence data are provided by the Mouse Genome Sequencing Consortium.