Description

This track displays human-centric multiple sequence alignments in the ENCODE regions for the 23 vertebrates included in the May 2005 ENCODE MSA freeze, based on comparative sequence data generated for the ENCODE project. The alignments in this track were generated using the LAGAN Alignment Toolkit. A complete list of the vertebrates included in the May 2005 freeze may be found at the top of the description page for this track.

The Genome Browser companion tracks, MLAGAN Cons and MLAGAN Elements, display conservation scoring and conserved elements for these alignments based on various conservation methods.

Display Conventions and Configuration

In full display mode, this track shows the pairwise alignment of each species to the human genome. The alignments are shown in dense display mode using a gray-scale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display.

When zoomed in to the base-display level, the track shows the base composition of each alignment. The numbers and symbols on the "human gap" line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown; if not, a "*" is displayed when the gap size is a multiple of 3 and a "+" otherwise. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.
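The labeling rule for the "human gap" line can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the Genome Browser's own code: the function name `human_gap_label` and the `max_chars` parameter (a stand-in for "sufficient space in the display") are assumptions made for the example.

```python
def human_gap_label(gap_len: int, max_chars: int) -> str:
    """Return the text drawn on the "human gap" line for one position.

    gap_len   -- length of the gap in the human sequence (0 means no gap)
    max_chars -- hypothetical number of characters that fit in the display
    """
    if gap_len <= 0:
        return ""  # no gap in human here, nothing to draw
    size = str(gap_len)
    if len(size) <= max_chars:
        return size  # enough room: show the gap size itself
    # Not enough room: fall back to a one-character symbol.
    return "*" if gap_len % 3 == 0 else "+"
```

Under these assumptions, a 12-base gap with room for only one character is drawn as "*", while a 10-base gap in the same space is drawn as "+".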

Methods

To create the alignments, the sequence of each non-human species was first "rearranged" to be orthologously collinear with respect to the human sequence. The rearrangements were generated using a suite of tools and algorithms based on Shuffle-LAGAN and SuperMap. For each pairing of human sequence with that of another species, Shuffle-LAGAN was used to find the best-scoring chain of local similarities according to a scoring scheme that penalized evolutionary rearrangements. SuperMap was then used to aggregate parts of the chain into a human-monotonic map of syntenic blocks. This mapping was used to undo the genomic rearrangements of the other sequence and convert it to a form that was directly alignable to the human sequence.
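The final step, using the map of syntenic blocks to undo rearrangements, can be illustrated with a minimal Python sketch. It is not the Shuffle-LAGAN or SuperMap implementation: the `Block` record, its field names, and the `rearrange` function are hypothetical, and the sketch assumes the blocks have already been chosen and do not overlap on the human sequence.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """One syntenic block (hypothetical record layout)."""
    human_start: int  # start of the block on the human sequence
    other_start: int  # start on the other species (0-based, inclusive)
    other_end: int    # end on the other species (exclusive)
    strand: str       # "+" or "-" relative to the human sequence


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def rearrange(other_seq: str, blocks: List[Block]) -> str:
    """Rebuild the other species' sequence in human order and orientation."""
    pieces = []
    # Visit blocks in human order so the result is collinear with human.
    for b in sorted(blocks, key=lambda blk: blk.human_start):
        piece = other_seq[b.other_start:b.other_end]
        if b.strand == "-":
            piece = reverse_complement(piece)  # undo an inversion
        pieces.append(piece)
    return "".join(pieces)
```

For example, with `other_seq = "TTTTAAAAACGG"` and blocks that place positions 4-8 first, positions 0-4 second, and positions 8-12 last on the minus strand, `rearrange` returns `"AAAATTTTCCGT"`: a transposition and an inversion have been undone, so the result can be aligned globally against the human sequence.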

A multiple global alignment was created for every region using MLAGAN. The alignments were then refined using MUSCLE, which processes small non-overlapping windows of an alignment and attempts to realign them in an iterative fashion, keeping the refined alignment if it has a better sum-of-pairs score than the original.
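The accept-if-better refinement loop described above can be sketched as follows. This is a simplified illustration, not the MUSCLE code: the scoring values (match, mismatch, gap), the window size, and the `realign` callback are all assumptions, and real sum-of-pairs scoring normally uses a substitution matrix and affine gap penalties.

```python
from itertools import combinations
from typing import Callable, List

Alignment = List[str]  # one gapped row per species, all rows the same length


def sum_of_pairs(rows: Alignment, match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Score an alignment by summing over every pair of rows in every column."""
    score = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # a gap facing a gap carries no information
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def refine(rows: Alignment, window: int,
           realign: Callable[[Alignment], Alignment]) -> Alignment:
    """Realign non-overlapping windows, keeping a window only if it scores higher."""
    refined = [""] * len(rows)
    for start in range(0, len(rows[0]), window):
        original = [row[start:start + window] for row in rows]
        candidate = realign(original)  # hypothetical realignment routine
        if sum_of_pairs(candidate) > sum_of_pairs(original):
            best = candidate
        else:
            best = original
        refined = [done + part for done, part in zip(refined, best)]
    return refined
```

Because each window is replaced only when its score strictly improves, the refined alignment never scores lower than the input under the chosen scoring scheme.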

Credits

The MLAGAN alignments were generated by George Asimenos from Stanford's ENCODE group.

Shuffle-LAGAN, SuperMap and MLAGAN were written by Mike Brudno.

MUSCLE was authored by Bob Edgar.

The phylogenetic tree is based on Murphy et al. (2001) and general consensus in the vertebrate phylogeny community.

References

Brudno M, Do C, Cooper G, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Res. 2003;13(4):721-31.

Brudno M, Malde S, Poliakov A, Do C, Couronne O, Dubchak I, Batzoglou S. Glocal alignment: finding rearrangements during alignment. Bioinformatics. 2003;19(Suppl. 1):i54-i62.

Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004;32(5):1792-7.

Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW et al. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001;294(5550):2348-51.