This track displays human-centric multiple sequence alignments and conserved elements in the ENCODE regions for the 36 vertebrates included in the December 2007 ENCODE MSA freeze. The alignments in this track were generated using the Threaded Blockset Aligner (TBA). The conservation subtracks display conserved elements generated by two methods: BinCons, a binomial-based method that calculates a conservation score in sliding windows with normalization for phylogenetic bias, and Chai Cons, a DNA structure-informed constraint detection algorithm that uses hydroxyl radical cleavage patterns as a measure of DNA structure.
The multiple alignments are based on comparative sequence data generated for the ENCODE project by the NIH Intramural Sequencing Center (NISC), as well as whole-genome assemblies residing at UCSC, as listed below:
Organism          Species                           Version
Human             Homo sapiens                      UCSC hg18
Armadillo         Dasypus novemcinctus              NISC
Baboon            Papio anubis                      NISC
Bat (rfbat)       Rhinolophus ferrumequinum         NISC
Bat (sbbat)       Myotis lucifugus                  NISC
Cat               Felis catus                       NISC
Chicken           Gallus gallus                     UCSC galGal3
Chimpanzee        Pan troglodytes                   UCSC panTro2
Colobus Monkey    Colobus guereza                   NISC
Cow               Bos taurus                        UCSC bosTau3
Dog               Canis familiaris                  UCSC canFam2
Dusky titi        Callicebus moloch                 NISC
Elephant          Loxodonta africana                NISC
Flying Fox        Pteropus vampyrus                 NISC
Galago            Otolemur garnettii                NISC
Gibbon            Nomascus leucogenys leucogenys    NISC
Guinea pig        Cavia porcellus                   NISC
Hedgehog          Atelerix albiventris              NISC
Horse             Equus caballus                    NISC
Macaque           Macaca mulatta                    UCSC rheMac2
Marmoset          Callithrix jacchus                NISC
Mouse             Mus musculus                      UCSC mm9
Mouse Lemur       Microcebus murinus                NISC
Opossum           Monodelphis domestica             UCSC monDom4
Orangutan         Pongo abelii                      UCSC ponAbe2
Owl Monkey        Aotus nancymaae                   NISC
Platypus          Ornithorhynchus anatinus          NISC
Rabbit            Oryctolagus cuniculus             NISC
Rat               Rattus norvegicus                 UCSC rn4
Rock hyrax        Procavia capensis                 NISC
Shrew             Sorex araneus                     NISC
Squirrel monkey   Saimiri boliviensis boliviensis   NISC
Squirrel          Spermophilus tridecemlineatus     NISC
Tenrec            Echinops telfairi                 NISC
Tree shrew        Tupaia belangeri                  NISC
Vervet monkey     Chlorocebus aethiops              NISC
In full display mode, this track shows pairwise alignments of each species aligned to the human genome. In dense mode, the alignments are depicted using a gray-scale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display. To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.
The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used:
Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows:
When zoomed in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+".
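The rule for labeling a gap on the Gaps line can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the browser's own code: the function name `gap_label` and the idea of measuring "sufficient space" as a count of available character columns are assumptions made for the example.

```python
def gap_label(gap_size: int, columns_available: int) -> str:
    """Return the text drawn on the Gaps line for a single gap.

    gap_size          -- gap length in bases (hypothetical input)
    columns_available -- character columns free at that position
                         (an assumed way of modeling "sufficient space")
    """
    if gap_size <= 0:
        return ""  # no gap, nothing to draw
    text = str(gap_size)
    if len(text) <= columns_available:
        return text  # enough room: show the gap size itself
    # Not enough room: "*" marks a multiple of 3, "+" any other size.
    return "*" if gap_size % 3 == 0 else "+"


if __name__ == "__main__":
    for size, cols in [(3, 1), (12, 2), (12, 1), (10, 1)]:
        print(size, cols, repr(gap_label(size, cols)))
```

A gap whose length is a multiple of 3 leaves the reading frame intact, which is presumably why the display distinguishes it from other gap sizes.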
Codon translation is available in base-level display mode if the displayed region is identified as a coding segment. To display this annotation, select the species for translation from the pull-down menu in the Codon Translation configuration section at the top of the page. Then, select one of the following modes:
Codon translation uses the following gene tracks as the basis for translation, depending on the species chosen. Species listed in the row labeled "None" do not have species-specific reading frames for gene translation.
Gene Track      Species
Gencode Genes   human
UCSC Genes      mouse
Known Genes     rat
RefSeq Genes    chimp
Ensembl Genes   rhesus, opossum
None            the remaining 30 species
The BinCons score is based on the cumulative binomial probability of observing at least the observed number of identical bases in sliding 25 bp windows (moving one bp at a time) between the reference sequence and each other species, given the neutral rate at four-fold degenerate (4D) sites. Neutral rates were calculated separately for each targeted region. For targets with no gene annotations, the average percent identity across all alignable sequence was used instead to weight the individual species binomial scores; this weighting scheme was found to closely match the 4D weights. Clusters of bases that exceeded the given conservation score threshold were designated as conserved elements. The minimum length of a conserved element is 25 bases. Strict cutoffs were used: if even one base fell below the conservation score threshold, the element was split into two distinct regions. Regions reported here exceed a 5% False Discovery Rate threshold, using a window size of 7 bases. More details on BinCons can be found in Margulies et al. (2003), cited below.
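The two core steps described above, a binomial tail probability per sliding window and strict thresholding into elements of at least 25 bases, can be illustrated with a minimal Python sketch. It is not the BinCons implementation: it handles a single reference/species pair, omits the per-species weighting and the False Discovery Rate calibration, and assumes that `p_neutral` is the per-base probability of identity expected under neutral evolution. All function and parameter names are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_tail_probs(ref: str, other: str, p_neutral: float,
                      window: int = 25) -> list[float]:
    """Tail probability of the observed identities in each sliding window.

    ref and other are equal-length rows of a pairwise alignment; the
    window advances one alignment column at a time.
    """
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    # 1 where both rows carry the same base, 0 for mismatches and gaps.
    same = [int(a == b and a != "-") for a, b in zip(ref.upper(), other.upper())]
    probs = []
    for start in range(len(same) - window + 1):
        k = sum(same[start:start + window])
        probs.append(binomial_tail(window, k, p_neutral))
    return probs


def call_elements(scores, threshold, min_len=25):
    """Half-open (start, end) runs in which every score exceeds threshold.

    A single position at or below the threshold ends the current run
    (the strict cutoff); runs shorter than min_len are discarded.
    """
    elements = []
    run_start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                elements.append((run_start, i))
            run_start = None
    if run_start is not None and len(scores) - run_start >= min_len:
        elements.append((run_start, len(scores)))
    return elements
```

A small tail probability means the window is more similar than neutral evolution would predict. How the per-species probabilities are turned into a single conservation score is not shown here; `call_elements` simply accepts any per-base score for which higher means more conserved.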
The TBA multiple alignments were created by Gayle McEwen & Elliott Margulies of NHGRI.
BinCons was developed by Elliott Margulies (Margulies et al. 2003).
Chai was developed by Steve Parker & Tom Tullius (Boston University), Elliott Margulies (NHGRI) and Loren Hansen (NCBI).
The programs Blastz and TBA, which were used to generate the alignments, were provided by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group.
The phylogenetic tree is based on Murphy et al. (2001).
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15.
Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002;:115-26.
Greenbaum JA, Pang B, Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007 Jun;17(6):947-53.
Margulies EH, Blanchette M, NISC Comparative Sequencing Program, Haussler D, Green ED. Identification and characterization of multi-species conserved sequences. Genome Res. 2003 Dec;13(12):2507-18.
Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001 Dec 14;294(5550):2348-51.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.