Description

This track shows the Zebrafish Zv6 (March 2006) assembly provided by The Wellcome Trust Sanger Institute. The assembly has a sequence coverage of about 6.5-7X and contains 6,653 scaffolds (supercontigs) totaling 1.6 billion base pairs. A tiling path of sequenced clones is provided by the Fingerprinted Contig (FPC) map (data freeze of 12 March 2006). The Whole Genome Shotgun (WGS) assembly that was used to fill the gaps in the tiling path is the same as that used for the Zv5 assembly. Of the 20,541,433 WGS reads, 18,969,500 (92%) were placed in the assembly. This set of reads includes 6,882,050 from a new library that was generated from a single doubled haploid Tuebingen fish.
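As a quick check of the read-placement figure quoted above, the percentage can be recomputed from the two read counts. This is only an arithmetic sketch; the variable names are illustrative and are not part of any assembly tooling.

```python
# Read counts taken from the description above.
placed_reads = 18_969_500
total_reads = 20_541_433

# Fraction of WGS reads that were placed in the assembly.
placed_fraction = placed_reads / total_reads

# Rounded to a whole percent, this matches the 92% quoted in the text.
placed_percent = round(placed_fraction * 100)
print(f"{placed_percent}% of WGS reads placed")
```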

In dense mode, this track depicts the path through the draft and finished clones (aka the golden path) used to create the assembled sequence. Clone boundaries are distinguished by the use of alternating gold and brown coloration. Where gaps exist in the path, spaces are shown between the gold and brown blocks. If the relative order and orientation of the contigs between the two blocks is known, a line is drawn to bridge the blocks.

The Genome Browser depicts the zebrafish genome as 25 chromosomes consisting of whole genome shotgun (WGS) supercontigs that were mapped to a fingerprinted contig (FPC) and were from a known chromosome. There are also 2 unordered virtual chromosomes:

The virtual chromosomes contain 500 bp scaffold gaps that are shown in the Gap track annotation. All the unplaced scaffolds from chrFinished (in assembly Zv3, danRer1) have been mapped to the chromosomes since the Zv4 assembly (danRer2).
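To illustrate how scaffolds separated by fixed 500 bp gaps could be laid out along a virtual chromosome, here is a minimal sketch. It assumes 1-based inclusive coordinates and a gap between each pair of adjacent scaffolds only; the function name `layout_scaffolds` and the example scaffold lengths are hypothetical, and this is not the procedure actually used to build the assembly.

```python
from typing import List, Tuple

GAP_SIZE = 500  # bp, the scaffold gap size stated for the virtual chromosomes


def layout_scaffolds(lengths: List[int], gap: int = GAP_SIZE) -> List[Tuple[str, int, int]]:
    """Return (kind, start, end) rows for scaffolds joined by fixed-size gaps.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    rows = []
    pos = 1
    for i, length in enumerate(lengths):
        if i > 0:
            # Insert a fixed-size gap before every scaffold except the first.
            rows.append(("gap", pos, pos + gap - 1))
            pos += gap
        rows.append(("scaffold", pos, pos + length - 1))
        pos += length
    return rows


# Hypothetical scaffold lengths, for illustration only.
example_rows = layout_scaffolds([1000, 2500, 800])
for row in example_rows:
    print(row)
```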

All components within this track are of fragment type "W" (WGS contig) except for chrM, which is of type "F" (Finished).
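The fragment types mentioned above can be tallied from an assembly table. The sketch below assumes the data are available as tab-separated AGP-style lines in which the fifth column holds the component type ("W", "F", or "N" for a gap); the sample lines and component names are hypothetical and do not come from the Zv6 files.

```python
from collections import Counter
from typing import Iterable


def count_component_types(agp_lines: Iterable[str]) -> Counter:
    """Count component types (column 5) in AGP-style tab-separated lines."""
    counts = Counter()
    for line in agp_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        counts[fields[4]] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "chr1\t1\t1000\t1\tW\tcontigA\t1\t1000\t+",
    "chr1\t1001\t1500\t2\tN\t500\tfragment\tyes",
    "chr1\t1501\t4000\t3\tW\tcontigB\t1\t2500\t-",
    "chrM\t1\t16000\t1\tF\tcloneX\t1\t16000\t+",
]
type_counts = count_component_types(sample)
print(dict(type_counts))
```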

Methods

This assembly was constructed using the Phusion assembler to cluster reads. Phrap was then used for cluster assembly and consensus generation. Supercontigs (scaffolds) were created by joining contigs on the basis of read-pair information, with gap sizes estimated from libraries of different insert sizes. For the clone-based mapping and finishing, clones from different libraries were fingerprinted by digestion with the HindIII restriction enzyme. From the resulting fingerprints, overlapping clones were linked into FPCs. Next, clones from a tiling path through the FPC contigs were selected for high-quality sequencing. The resulting sequence was submitted to EMBL/GenBank. Clone sequences and WGS contigs were integrated by considering sequence alignments, BAC end placements, and zebrafish cDNAs and markers. Improvements to the integration algorithm allowed the placement of WGS contigs that contained markers but could not be linked to the FPC contigs. In cases where markers from different chromosomes appear on the same contig, priority has been given to the Heat Shock Diploid Cross Panel (HS) and the Boston MGH Cross Map (MGH). Some of these discrepancies are due to misassemblies, but there may also be inconsistencies between the zebrafish marker panels.
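The idea of estimating gap sizes from read pairs can be sketched as follows. This is a generic scaffolding heuristic shown under stated assumptions, not the method used for the Zv6 assembly: each read pair spans two adjacent contigs, the library insert size is known, and the gap estimate is the insert size minus the portion of the insert lying in each contig. The function name and all numbers are hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple


def estimate_gap(pairs: Iterable[Tuple[int, int, int]]) -> float:
    """Estimate the gap between two contigs from spanning read pairs.

    Each tuple is (insert_size, span_in_left_contig, span_in_right_contig),
    all in base pairs. The median is used to limit the effect of outliers.
    """
    estimates = [insert - left - right for insert, left, right in pairs]
    return median(estimates)


# Hypothetical read pairs from libraries with different insert sizes.
pairs = [
    (4000, 1500, 1900),     # implies a gap of 600 bp
    (4000, 2100, 1400),     # implies a gap of 500 bp
    (40000, 20000, 19300),  # implies a gap of 700 bp
]
gap_estimate = estimate_gap(pairs)
print(gap_estimate)
```

Using the median rather than the mean is a design choice of this sketch; a real scaffolder would also weight pairs by the insert-size variance of each library.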

The supercontigs tied to the FPC map create the assembly shown in this track. A total of 1.547 gigabases, or 95% of the sequence, could be tied to the FPC map. The finished clone sequence was then analyzed via a pipeline that included repeat masking, ab initio gene prediction and BLAST searches against all protein, EST and cDNA sequences that were available. Results from this analysis were used to manually annotate clones with gene structures, descriptions and poly-A features. At this point, the clone was submitted to EMBL/GenBank again and can be browsed in Vega.

Credits

The Zv6 Zebrafish assembly was produced by The Wellcome Trust Sanger Institute, in collaboration with the Max Planck Institute for Developmental Biology, the Netherlands Institute for Developmental Biology (Hubrecht Laboratory), and Yi Zhou, Anthony DiBiase and Leonard Zon from the Boston Children's Hospital.