Description

This track displays variant base calls from several personal genomes that have been made publicly available: Craig Venter, James Watson, anonymous Yoruba individual NA18507, anonymous Han Chinese individual YH, Seong-Jin Kim (SJK), and four individuals from the 1000 Genomes Project high-coverage pilot: a CEU daughter and parents (NA12878, NA12891, NA12892) and a YRI daughter (NA19240).

Display Conventions and Configuration

In the genome browser, when viewing the forward strand of the reference genome (the normal case), the displayed alleles are relative to the forward strand. When viewing the reverse strand of the reference genome ("reverse" button), the displayed alleles are reverse-complemented to match the reverse strand. When read frequency data are available, they are displayed in the mouseover text (e.g. "T:8 G:3" means that 8 reads contained a T and 3 reads contained a G at that base position) and box colors are used to show the proportion of alleles.
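The display conventions described above can be sketched in a few lines of Python. This is only an illustration, not the Genome Browser's own code: the function names are hypothetical, and the sketch assumes the mouseover text is a whitespace-separated list of allele:count pairs, as in the "T:8 G:3" example.

```python
# Hypothetical helpers illustrating the display conventions; not UCSC code.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(allele):
    """Return the allele as it would read on the reverse strand."""
    return allele.translate(_COMPLEMENT)[::-1]


def parse_read_counts(text):
    """Parse text such as 'T:8 G:3' into a dict like {'T': 8, 'G': 3}."""
    counts = {}
    for token in text.split():
        allele, _, count = token.rpartition(":")
        counts[allele] = int(count)
    return counts


def allele_proportions(counts):
    """Convert read counts into the fraction of reads supporting each allele."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}
```

For example, parse_read_counts("T:8 G:3") gives 8 reads for T and 3 for G, so allele_proportions assigns T a proportion of 8/11. Proportions of this kind are what the box colors are said to represent.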

On the details page for each variant, the alleles are given for the forward strand of the reference genome. Frequency data are shown when available.

Methods

Variants were originally mapped to the Mar. 2006 (hg18, NCBI36) human genome assembly. Their locations were translated into GRCh37 (hg19) coordinates using the liftOver program and the mapping file hg18ToHg19.over.chain.gz. Homozygous matches to the GRCh37 reference were removed.
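As a rough illustration of the last step, the sketch below drops calls in which every allele equals the reference sequence at the lifted-over position. It rests on several assumptions that do not come from the track's actual pipeline: variants are held as (chrom, start, end, alleles) tuples with 0-based, half-open coordinates, and ref_lookup is a hypothetical caller-supplied function that returns the reference bases for a region.

```python
# Hypothetical sketch of removing homozygous matches to the reference.
# The record layout and the ref_lookup callback are assumptions for illustration.

def is_homozygous_reference(alleles, ref_allele):
    """True when every called allele is identical to the reference allele."""
    return all(a.upper() == ref_allele.upper() for a in alleles)


def drop_reference_matches(variants, ref_lookup):
    """Keep only variants that differ from the reference in at least one allele.

    variants   -- iterable of (chrom, start, end, alleles) tuples
    ref_lookup -- function (chrom, start, end) -> reference sequence string
    """
    kept = []
    for chrom, start, end, alleles in variants:
        ref_allele = ref_lookup(chrom, start, end)
        if not is_homozygous_reference(alleles, ref_allele):
            kept.append((chrom, start, end, alleles))
    return kept
```

A filter like this is needed after a coordinate conversion because a position that differed from the hg18 reference may carry the same base as the individual in GRCh37, in which case it is no longer a variant.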

Craig Venter (JCVI) (Levy et al.)
An overview is given here. This subtrack contains Venter's single-base and multi-base variants and small (< 100 bp) insertions/deletions from the file HuRef.InternalHuRef-NCBI.gff, filtered to include only method 1 variants (variant was kept in its original form and not post-processed), and to exclude any variants that had N as an allele. JCVI hosts a genome browser.

James Watson (CSHL) (Wheeler et al.)
These single-base variants came from the file watson_snp.gff.gz.
CSHL hosts a genome browser.

Yoruba NA18507 (Illumina Cambridge/Solexa) (Bentley et al.)
Illumina released the read sequences to the NCBI Short Read Archive. Aakrosh Ratan in the Miller Lab at Pennsylvania State University (PSU) mapped the sequence reads to the reference genome and called single-base variants and small insertions/deletions (< 20 bp) using MAQ.

YH (YanHuang Project) (Wang et al.)
The YanHuang Project released these single-base variants from the genome of a Han Chinese individual. The data are available from the YH database in the file yhsnp_add.gff. The YanHuang Project hosts a genome browser.

SJK (GUMS/KOBIC) (Ahn et al.)
Researchers at Gachon University of Medicine and Science (GUMS) and the Korean Bioinformation Center (KOBIC) released these single-base variants from the genome of Seong-Jin Kim. The data are available from KOBIC in the file KOREF-solexa-snp-X30_Q40d4D100.gff.

CEU trio NA12878, NA12891, NA12892; YRI daughter NA19240 (1000 Genomes)
The variants shown are from the 1000 Genomes Project's December 2008 release. The base calls were taken from more recent 1000 Genomes read alignments (released in July and August 2009). The CEU variant calls were based on sequence data from the Wellcome Trust Sanger Institute and the Broad Institute, using the Illumina/Solexa platform. For more information on the recalibration, mapping and variant calling, see the CEU trio release README file. The YRI daughter calls were based on sequence data from the Baylor College of Medicine Human Genome Sequencing Center and Applied Biosystems, using the SOLiD platform. For more information on the mapping, variant calling, filtering and validation, see the YRI README file. The variant calls are available from the December 2008 release subdirectory of the 1000 Genomes Project Data Coordination Center (DCC) at the European Bioinformatics Institute. A mirror of the DCC at NCBI may be more efficient for users in the US, Oceania and East Asia.

Credits

Variants shown in this track were determined by JCVI, CSHL, Illumina Cambridge (formerly Solexa), Aakrosh Ratan at PSU, the YanHuang Project, the 1000 Genomes Project, GUMS and KOBIC. Thanks to Belinda Giardine at PSU for collecting the data and loading them into the UCSC database.

References

Craig Venter (JCVI)
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007 Sep 4;5(10):e254.

James Watson (CSHL)
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ, Makhijani V, Roth GT, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008 Apr 17;452(7189):872-6.

Yoruba NA18507 (Illumina Cambridge/Solexa)
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008 Nov 6;456(7218):53-9.

YH (YanHuang Project)
Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J, et al. The diploid genome sequence of an Asian individual. Nature. 2008 Nov 6;456(7218):60-5.

SJK
Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim DS, Kim BC, Kim SY, Kim WY, Kim C, et al. The first Korean genome sequence and analysis: full genome sequencing for a socio-ethnic group. Genome Res. 2009 Sep;19(9):1622-9.

CEU trio NA12878, NA12891, NA12892; YRI daughter NA19240 (1000 Genomes)
Analysis is underway for a manuscript on the pilot project; until publication, please see http://1000genomes.org/. (See also the Science and Nature Biotechnology news articles describing the project.)