Description
This track displays the conservation between the mouse (Feb. 2003,
mm3) and rat (Jan. 2003, rn2) genomes for
50 bp windows in the mouse genome that have at least 15 bp aligned to
rat. The score for a window reflects the probability that the
level of observed conservation in that 50 bp region would occur by
chance under neutral evolution. It is given on a logarithmic scale,
and thus it is called the "L-score". An L-score of 1 means there is a
1/10 probability that the observed conservation level would occur by
chance; an L-score of 2 means a 1/100 probability; an L-score of 3
means a 1/1000 probability, etc. The L-scores are displayed as
"mountain ranges". Clicking on a mountain range opens a details page
from which you can access the base-level alignments, both
for the whole region and for the individual 50 bp windows.
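As a quick illustration of the scale described above, an L-score can be converted back to a chance probability as 10 raised to the power -L. The function name below is hypothetical and is not part of the browser or its tools.

```python
def l_score_to_probability(l_score: float) -> float:
    """Chance probability implied by an L-score: p = 10 ** (-L)."""
    return 10.0 ** (-l_score)


# Print the implied probability for the three L-scores mentioned in the text.
for l_score in (1, 2, 3):
    print(l_score, l_score_to_probability(l_score))
```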
Methods
Genome-wide alignments between mouse and rat were produced by
blastz.
A set of 50 bp windows in the mouse genome was determined
by scanning the sequence in steps of 5 bases, and only those
windows with at least 15 aligned bases were kept. For each window,
a conservation score defined by
S = sqrt(n / (m(1 - m))) (p - m)
was calculated, where n is the number of aligning bases in the
window, p is the percent identity (as a fraction) between mouse and rat for these
aligning bases, and m is the average percent identity (as a fraction) for aligned
neutrally evolving bases in a larger region surrounding the 50 bp
window being scored. Neutral bases were taken from ancestral repeat
sequences, which are relics of transposons that were inserted before
the rat-mouse split. To transform S into an L-score, the empirical
cumulative distribution function CDF(S) = P(x < S)
is computed from the scores of all windows genome-wide, and
the L-score is defined as
L = -log_10(1 - CDF(S)).
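The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the track: the function names, the 0-based window coordinates, and the toy input values are all assumptions, and p and m are taken as fractions between 0 and 1 so that m(1 - m) is positive.

```python
import math
from bisect import bisect_left


def window_starts(seq_len, size=50, step=5):
    """Start offsets (0-based, an assumption) of windows sliding `step` bases."""
    return range(0, seq_len - size + 1, step)


def s_score(n, p, m):
    """S = sqrt(n / (m * (1 - m))) * (p - m), with p and m as fractions."""
    return math.sqrt(n / (m * (1.0 - m))) * (p - m)


def l_scores(scores):
    """Empirical L-score for each S, using CDF(S) = P(x < S) over all scores."""
    ordered = sorted(scores)
    total = len(ordered)
    result = []
    for s in scores:
        below = bisect_left(ordered, s)   # number of scores strictly less than s
        tail = (total - below) / total    # 1 - CDF(S); never smaller than 1/total
        result.append(-math.log10(tail))
    return result


# Toy windows as (n aligned bases, p, m); these values are made up.
toy_windows = [(50, 0.95, 0.70), (40, 0.70, 0.70), (10, 1.00, 0.70), (30, 0.60, 0.70)]
kept = [w for w in toy_windows if w[0] >= 15]   # keep windows with >= 15 aligned bases
scores = [s_score(n, p, m) for n, p, m in kept]
print(list(zip(scores, l_scores(scores))))
```

Because the CDF is defined with a strict inequality, the highest-scoring window always has a tail probability of at least 1/N, so the logarithm is always defined. With only a handful of windows the L-scores are necessarily small; the scale becomes meaningful only when the CDF is built from all windows genome-wide.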
The L-score provides a frequentist confidence assessment. A Bayesian
calculation of the probability that a window is under selection can
also be made using a mixture decomposition of the empirical density of
the scores for all windows genome-wide into a neutral and a selected
component. Details are given in a manuscript in preparation. The
results are summarized in the table below.
L-score   Frequentist probability of this   Bayesian probability that a
          L-score or greater, given         window with this L-score is
          neutral evolution                 under selection
------------------------------------------------------------------------
1         0.1                               0.32
2         0.01                              0.75
3         0.001                             0.94
4         0.0001                            0.97
5         0.00001                           0.98
6         0.000001                          0.99
7         0.0000001                         >0.99
8         0.00000001                        >0.99
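The mixture decomposition behind the Bayesian column is not described on this page, so the following is only a generic sketch of Bayes' rule for a two-component mixture. The Gaussian component shapes, the mixing weight, and every parameter value are assumptions chosen for illustration; they are not taken from the track and will not reproduce the table.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed component shape)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_selected(s, weight_selected, neutral, selected):
    """P(selected | S = s) for a two-component mixture, via Bayes' rule.

    neutral and selected are (mean, sd) pairs; weight_selected is the
    assumed prior fraction of windows under selection.
    """
    f_neutral = (1.0 - weight_selected) * normal_pdf(s, *neutral)
    f_selected = weight_selected * normal_pdf(s, *selected)
    return f_selected / (f_neutral + f_selected)


# Hypothetical parameters: neutral ~ N(0, 1), selected ~ N(3, 1.5), 20% selected.
for s in (0.0, 2.0, 4.0):
    print(s, posterior_selected(s, 0.2, (0.0, 1.0), (3.0, 1.5)))
```

The point of the sketch is the contrast with the frequentist column: the posterior depends on how common selected windows are and on how their scores are distributed, not only on how unlikely a score is under neutrality.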
Using the Filter
The track filter can be used to configure some of the display characteristics
of the track.
- Interpolation: This attribute determines whether the data samples are
displayed as discrete points on the track (the "Only samples" option) or are
connected by a line (the "Linear interpolation" option).
- Fill Blocks: When the on button is selected in this option, the area
underneath the sample points or line is filled in with gray.
- Track Height: Type in a new value to adjust the track height in pixels to best suit your screen display.
- Vertical Range: Type in a new min or max value to adjust the portion of the track's vertical
range that is displayed. Range units are marked by pale blue horizontal lines.
- Maximum Interval to Interpolate Across: This attribute sets the maximum gap
between alignments that will be spanned when the "Linear interpolation"
option is selected. Type in a new value to increase or decrease the interval.
When you have finished configuring the filter, click the Submit button.
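The interaction between the "Linear interpolation" option and the maximum interval can be pictured with a small sketch. This is not the browser's drawing code; the function name, the (position, value) sample layout, and the assumption that the interval is measured in bases are all illustrative.

```python
def interpolation_segments(samples, max_gap):
    """Line segments joining neighbouring samples at most max_gap apart.

    samples: list of (position, value) pairs sorted by position.
    Pairs of neighbours separated by more than max_gap are left unconnected.
    """
    segments = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if x1 - x0 <= max_gap:
            segments.append(((x0, y0), (x1, y1)))
    return segments


# Made-up samples: the first two are 5 bases apart, the third is far away.
samples = [(0, 1.0), (5, 2.0), (100, 3.0)]
print(interpolation_segments(samples, max_gap=10))
```

Raising the interval connects samples across larger unaligned stretches, which gives a smoother display but can suggest conservation in regions where no alignment exists.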
Credits
Thanks to Jim Kent for creating the blastz alignments and for the
post-processing used to create the blastzBestRat track. Ryan Weber
computed the window S-scores, computed the CDF of these scores, and
created the remaining track display functions. Mark Diekhans and
Krishna Roskin created software used in this process. Mouse sequence
data are provided by the Mouse Genome Sequencing Consortium.