Description

This track shows the "reciprocal best" $organism/$o_organism alignment net. It is useful for finding orthologous regions and for studying genome rearrangement.

Display Conventions and Configuration

In full display mode, the top-level (level 1) chains are the largest, highest-scoring chains that span this region. In many cases gaps exist in the top-level chain. When possible, these are filled in by other chains that are displayed at level 2. The gaps in level 2 chains may be filled by level 3 chains and so forth.

In the graphical display, the boxes represent ungapped alignments; the lines represent gaps. Click on a box to view detailed information about the chain as a whole; click on a line to display information about the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement.

Individual items in the display are categorized as one of four types (other than gap):

Note: The $o_organism data set used to generate this track is based on scaffolds rather than chromosomes. Because of this, the track coloring scheme does not correspond to the chromosome color key displayed on the track page. A somewhat random scheme (scaffold# modulo #colors) was used to color this track simply to make the scaffold boundaries easier to distinguish.

Methods

These alignments were generated using blastz and blat alignments of $o_organism genomic sequence from the 13 Nov. 2003 Arachne $o_organism draft assembly. The initial alignments were generated using blastz on repeatmasked sequence using the following $organism/$o_organism scoring matrix:

     A    C    G    T
A   100 -300 -150 -300
C  -300  100 -300 -150
G  -150 -300  100 -300
T  -300 -150 -300  100

K = 4500, L = 3000,  Y=3400, H=2000

The resulting alignments were processed by the axtChain program, which organizes all the alignments between a single $o_organism scaffold and a single $organism chromosome into a group and makes a kd-tree out of all the gapless subsections (blocks) of the alignments. The maximally-scoring chains of these blocks were found by running a dynamic program over the kd-tree. Chains scoring below a certain threshold were discarded.

To place additional $o_organism scaffolds that weren't initially aligned by blastz, a DNA blat of the unmasked sequence was performed. The resulting blat alignments were also chained, and then merged with the blastz-based chains produced in the previous step to produce "all chains".

These chaines were sorted with the highest-scoring chains in the genome ranked first. The program chainNet was then used to place the chains one at a time, trimming them as necessary to fit into sections not already covered by a higher-scoring chain. During this process, a natural hierarchy emerged in which a chain that filled a gap in a higher-scoring chain was placed underneath that chain. The program netSyntenic was used to fill in information about the relationship between higher- and lower-level chains, such as whether a lower-level chain was syntenic or inverted relative to the higher-level chain.

Due to the draft nature of this initial genome assembly, this net track (and the companion chain track) was generated using a "reciprocal best" strategy. This strategy attempts to minimize paralog fill-in for missing orthologous $o_organism sequence by filtering from the $organism net all sequences not found in the $o_organism side of the net. After generating the $organism alignment net, the subset of chains in the $o_organism reference net was extracted and used for an additional netting step, which was then filtered for non-syntenic sequences smaller than 50 bases.

Credits

The $o_organism sequence used in this track was obtained from the 13 Nov. 2003 Arachne assembly. We'd like to thank the National Human Genome Research Institute (NHGRI), the Eli & Edythe L. Broad Institute at MIT/Harvard, and Washington University School of Medicine for providing this sequence.

The chainNet, netSyntenic, and netClass programs were developed at the University of California Santa Cruz by Jim Kent.

Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.

The browser display and database storage of the nets were made by Robert Baertsch and Jim Kent.

References

Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA 100(20), 11484-11489 (2003).

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., and Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 13(1), 103-7 (2003).