Description

This track shows the Zebrafish Zv5 (May 2005) assembly provided by The Wellcome Trust Sanger Institute. The assembly has a sequence coverage of about 6.5-7X and contains 21,333 scaffolds (supercontigs) totaling 1.6 billion base pairs. Of the 20,541,433 WGS reads, 18,969,500 (92%) were placed in the assembly. In this Zv5 assembly, the read set includes 6,882,050 reads from a new library generated from a single Tuebingen double haploid fish.

In dense mode, this track depicts the path through the draft and finished clones (also known as the golden path) used to create the assembled sequence. Clone boundaries are distinguished by alternating gold and brown coloring. Where gaps exist in the path, spaces are shown between the gold and brown blocks. If the relative order and orientation of the contigs on either side of a gap are known, a line is drawn to bridge the blocks.
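The dense-mode drawing rules can be sketched as a small layout function. This is a minimal illustration, not the Genome Browser's actual rendering code: the `Component` and `Gap` records, the `bridged` flag, and the output tuples are hypothetical, and it assumes the gold/brown alternation simply continues across gaps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Component:
    """Hypothetical record for one clone placed on the golden path."""
    name: str
    start: int  # 1-based chromosome start
    end: int    # 1-based inclusive chromosome end


@dataclass
class Gap:
    """Hypothetical record for a gap in the golden path."""
    start: int
    end: int
    bridged: bool  # True if order and orientation across the gap are known


def dense_mode_layout(components: List[Component],
                      gaps: List[Gap]) -> List[Tuple[str, int, int, Optional[str]]]:
    """Return drawing operations in chromosome order.

    Components alternate gold and brown so adjacent clone boundaries are
    visible. Gaps are drawn as blank space, or as a bridging line when the
    order and orientation of the flanking contigs are known.
    """
    items = [(c.start, "component", c) for c in components]
    items += [(g.start, "gap", g) for g in gaps]
    items.sort(key=lambda item: item[0])  # sort by start coordinate only

    colors = ("gold", "brown")
    color_index = 0  # assumption: alternation is not reset after a gap
    ops: List[Tuple[str, int, int, Optional[str]]] = []
    for _, kind, obj in items:
        if kind == "component":
            ops.append(("block", obj.start, obj.end, colors[color_index % 2]))
            color_index += 1
        elif obj.bridged:
            ops.append(("line", obj.start, obj.end, None))
        else:
            ops.append(("space", obj.start, obj.end, None))
    return ops
```

For example, two abutting clones followed by a bridged gap and a third clone yield gold, brown, a bridging line, then gold again.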

The Genome Browser depicts the zebrafish genome as 25 chromosomes built from whole genome shotgun (WGS) supercontigs that were mapped to a fingerprinted contig (FPC) of known chromosomal origin. There are also 2 unordered virtual chromosomes:

chrNA - WGS contigs that could not be related to any FPC contig
chrUn - WGS supercontigs that mapped to FPC contigs, but the chromosome is unknown

Within the virtual chromosomes, adjacent scaffolds are separated by 500 bp gaps, which are shown in the Gap track. Since the Zv4 assembly (danRer2), all of the unplaced scaffolds from chrFinished (in assembly Zv3, danRer1) have been mapped to the chromosomes.

All components within this track are of fragment type "W" (WGS contig) except for chrM which is type "F" (Finished).
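To make the virtual-chromosome layout concrete, the sketch below concatenates scaffolds end to end with fixed 500 bp gaps and emits AGP-style rows, using "W" for WGS components and "N" for gaps. It is an illustration under stated assumptions, not the pipeline that built chrNA or chrUn: the function name and scaffold names are hypothetical, every scaffold is assumed to be in "+" orientation, and the gap type "contig" with linkage "no" is an assumed choice for unordered joins.

```python
from typing import List, Tuple

GAP_SIZE = 500  # bp placed between consecutive scaffolds


def build_virtual_chrom_agp(chrom: str,
                            scaffolds: List[Tuple[str, int]],
                            gap_size: int = GAP_SIZE) -> List[List[str]]:
    """Lay scaffolds end to end with fixed-size gaps and return AGP-style rows.

    Each scaffold becomes a "W" component row; each separator becomes an
    "N" gap row. Coordinates are 1-based and inclusive. The gap type
    "contig" and linkage "no" are illustrative assumptions.
    """
    rows: List[List[str]] = []
    pos = 1   # next free 1-based coordinate on the virtual chromosome
    part = 1  # AGP part_number counter
    for i, (name, length) in enumerate(scaffolds):
        if i > 0:
            # Gap row: object, beg, end, part, type, gap_length, gap_type, linkage
            rows.append([chrom, str(pos), str(pos + gap_size - 1), str(part),
                         "N", str(gap_size), "contig", "no"])
            pos += gap_size
            part += 1
        # Component row: object, beg, end, part, type, id, comp_beg, comp_end, orientation
        rows.append([chrom, str(pos), str(pos + length - 1), str(part),
                     "W", name, "1", str(length), "+"])
        pos += length
        part += 1
    return rows
```

Printing each row with `"\t".join(row)` gives tab-separated lines in the shape of an AGP file; with two scaffolds of 1,000 bp and 250 bp, the second scaffold starts at position 1,501.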

Methods

This assembly was constructed using the Phusion assembler to cluster reads. Phrap was then used for cluster assembly and consensus generation. Supercontigs (scaffolds) were created by joining contigs based on read-pair information, with gap sizes estimated from read pairs drawn from libraries with different insert sizes. For the clone-based mapping and finishing, clones from different libraries were fingerprinted by digestion with the HindIII restriction enzyme, and overlapping clones identified from these fingerprints were linked into FPC contigs. Next, clones from a tiling path through the FPC contigs were selected for high-quality sequencing. The resulting sequence was submitted to EMBL/GenBank.
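The general idea behind estimating a scaffold gap from read pairs can be shown with simple arithmetic. This is a simplified sketch, not Phusion's actual algorithm: it assumes contig 1 precedes contig 2, mate A maps forward on contig 1, mate B maps reverse on contig 2, and each pair carries the expected insert size of its library. Since insert_size = (contig1_len - a_start) + gap + b_end, each pair yields gap = insert_size - (contig1_len - a_start) - b_end, and taking the median across pairs (an assumed choice) damps outliers.

```python
from statistics import median
from typing import List, Tuple


def estimate_gap(contig1_len: int, pairs: List[Tuple[int, int, int]]) -> float:
    """Estimate the gap between two contigs linked by read pairs.

    Each pair is (a_start, b_end, insert_size):
      a_start     -- 0-based start of mate A on contig 1 (forward strand)
      b_end       -- exclusive end of mate B on contig 2 (reverse strand)
      insert_size -- expected fragment length for that mate's library
    A negative result would suggest the contigs overlap.
    """
    estimates = [insert_size - (contig1_len - a_start) - b_end
                 for a_start, b_end, insert_size in pairs]
    return median(estimates)
```

Pairs from libraries of different insert sizes each contribute an independent estimate; here a 4 kb pair and a 2 kb pair on a 10 kb contig both imply a gap of about 500 bp.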

The supercontigs tied to the FPC map make up the assembly shown in this track; 1.2 gigabases, or 74% of the sequence, could be tied to the FPC map. The finished clone sequence was then analyzed through a pipeline that included repeat masking, ab initio gene prediction, and BLAST searches against all available protein, EST, and cDNA sequences. The results of this analysis were used to manually annotate the clones with gene structures, descriptions, and poly-A features. The annotated clones were then resubmitted to EMBL/GenBank and can be browsed in Vega.

Credits

The Zv5 Zebrafish assembly was produced by The Wellcome Trust Sanger Institute, in collaboration with the Max Planck Institute for Developmental Biology, the Netherlands Institute for Developmental Biology (Hubrecht Laboratory), and Yi Zhou, Anthony DiBiase and Leonard Zon from the Boston Children's Hospital.