This track displays variant base calls from several publicly available personal genomes: Craig Venter, James Watson, an anonymous Yoruba individual (NA18507), an anonymous Han Chinese individual (YH), Seong-Jin Kim (SJK), and four individuals from the 1000 Genomes Project high-coverage pilot: a CEU daughter and her parents (NA12878, NA12891, NA12892) and a YRI daughter (NA19240).
In the genome browser, when viewing the forward strand of the reference genome (the normal case), the displayed alleles are relative to the forward strand. When viewing the reverse strand of the reference genome ("reverse" button), the displayed alleles are reverse-complemented to match the reverse strand. When read frequency data are available, they are displayed in the mouseover text (e.g. "T:8 G:3" means that 8 reads contained a T and 3 reads contained a G at that base position) and box colors are used to show the proportion of alleles.
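The strand handling and read-frequency display described above can be illustrated with a short Python sketch. It is not the browser's own code: the function names are hypothetical, and the "/" separator between alleles is an assumption made for the example. Only the "T:8 G:3" read-count layout is taken from the mouseover text quoted above.

```python
# Complement table for the four DNA bases, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(allele):
    """Reverse-complement one allele; other characters (e.g. '-') pass through."""
    return allele.translate(_COMPLEMENT)[::-1]


def flip_alleles(alleles, sep="/"):
    """Reverse-complement each allele in a sep-delimited list (separator assumed)."""
    return sep.join(reverse_complement(a) for a in alleles.split(sep))


def read_proportions(mouseover):
    """Turn read counts such as 'T:8 G:3' into per-allele proportions."""
    counts = {}
    for token in mouseover.split():
        base, n = token.split(":")
        counts[base] = int(n)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: n / total for base, n in counts.items()}
```

For example, `flip_alleles("A/G")` gives `"T/C"`, the form that would be shown on the reverse strand, and `read_proportions("T:8 G:3")` assigns 8/11 of the reads to T and 3/11 to G.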
On the details page for each variant, the alleles are given for the forward strand of the reference genome. Frequency data are shown when available.
Variants were originally mapped to the Mar. 2006 (hg18, NCBI36) human genome assembly. Their locations were translated into GRCh37 (hg19) coordinates using the liftOver program and the mapping file hg18ToHg19.over.chain.gz. Homozygous matches to the GRCh37 reference were removed.
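The two steps above (coordinate conversion, then removal of homozygous reference matches) might be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually used: the tuple layout, the "/" allele separator, the input and output file names, and the way the GRCh37 reference base is supplied are all hypothetical. The liftOver argument order (oldFile, map.chain, newFile, unMapped) follows the program's standard usage.

```python
def liftover_command(in_bed, out_bed, unmapped_bed,
                     chain="hg18ToHg19.over.chain.gz"):
    """Build the argument list for UCSC liftOver: oldFile map.chain newFile unMapped."""
    return ["liftOver", in_bed, chain, out_bed, unmapped_bed]


def is_homozygous_reference(alleles, ref_base, sep="/"):
    """True when every called allele equals the reference sequence at that position."""
    called = [a.upper() for a in alleles.split(sep)]
    return all(a == ref_base.upper() for a in called)


def drop_reference_matches(variants):
    """Keep only variants that still differ from the new reference.

    `variants` is an iterable of (chrom, start, end, alleles, ref_base) tuples,
    a hypothetical layout chosen for this example; ref_base is the GRCh37 base.
    """
    return [v for v in variants if not is_homozygous_reference(v[3], v[4])]
```

The command list can be passed to `subprocess.run` on a system where liftOver is installed. The filter matters because a call that differed from the hg18 reference can coincide with the GRCh37 reference at the translated position, in which case it is no longer a variant.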
Craig Venter (JCVI)
(Levy et al.)
An overview is given
here.
This subtrack contains Venter's single-base and multi-base variants
and small (< 100 bp) insertions/deletions from the file
HuRef.InternalHuRef-NCBI.gff,
filtered to include only method 1 variants (those kept in their original
form, without post-processing) and to exclude any variant that has N as an allele.
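The filter described above can be written as a simple predicate. The Python sketch below assumes the method number and allele list have already been parsed from each GFF record. How those values are encoded in HuRef.InternalHuRef-NCBI.gff is not described here, so the function name and its arguments are hypothetical.

```python
def keep_venter_variant(method, alleles):
    """Return True for method 1 variants that have no N in any allele.

    `method` is the variant-calling method number and `alleles` is a list of
    allele strings; both are assumed to be parsed from the record beforehand.
    """
    if method != 1:
        return False
    return not any("N" in allele.upper() for allele in alleles)
```

For example, `keep_venter_variant(1, ["A", "G"])` is true, while `keep_venter_variant(2, ["A", "G"])` and `keep_venter_variant(1, ["A", "N"])` are both false.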
JCVI hosts a
genome browser.
James Watson
(CSHL)
(Wheeler et al.)
These single-base variants came from the file
watson_snp.gff.gz.
CSHL hosts a
genome browser.
Yoruba NA18507 (Illumina/Solexa)
(Illumina Cambridge/Solexa)
(Bentley et al.)
Illumina released the read sequences to the
NCBI Short Read Archive.
Aakrosh Ratan in the Miller Lab at Pennsylvania State University (PSU)
mapped the sequence reads to the reference genome and called
single-base variants and small insertions/deletions (< 20 bp) using
MAQ.
YH (YanHuang Project)
(Wang et al.)
The YanHuang Project released these single-base variants from the
genome of a Han Chinese individual.
The data are available from the
YH database in the file
yhsnp_add.gff.
The YanHuang Project hosts a
genome browser.
SJK (GUMS/KOBIC)
(Ahn et al.)
Researchers at Gachon University of Medicine and Science (GUMS)
and the Korean Bioinformation Center (KOBIC)
released these single-base variants from the genome of Seong-Jin Kim.
The data are available from
KOBIC
in the file
KOREF-solexa-snp-X30_Q40d4D100.gff.
CEU trio NA12878, NA12891, NA12892; YRI daughter NA19240
(1000 Genomes)
The variants shown are from the 1000 Genomes Project's December 2008
release.
The base calls were taken from more recent 1000 Genomes read alignments
(released in July and August 2009).
The CEU variant calls were based on sequence data from the
Wellcome Trust Sanger
Institute and the
Broad Institute, using the Illumina/Solexa platform.
For more information on the recalibration, mapping and variant calling,
see the
CEU trio release README file.
The YRI daughter calls were based on sequence data from the
Baylor College of Medicine
Human Genome Sequencing Center and
Applied Biosystems, using the SOLiD platform.
For more information on the mapping, variant calling, filtering and
validation, see the
YRI README file.
The variant calls are available from the December 2008 release subdirectory
of the 1000 Genomes Project
Data Coordination Center (DCC) at the
European Bioinformatics Institute;
there is also a
mirror of the DCC at NCBI, which is more efficient for users in
the US, Oceania and East Asia.
Variants shown in this track were determined by JCVI, CSHL, Illumina Cambridge (formerly Solexa), Aakrosh Ratan at PSU, the YanHuang Project, the 1000 Genomes Project, GUMS and KOBIC. Thanks to Belinda Giardine at PSU for collecting the data and loading them into the UCSC database.
Craig Venter (JCVI)
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N,
Huang J, Kirkness EF, Denisov G, et al.
The diploid genome sequence of an individual human.
PLoS Biol. 2007 Sep 4;5(10):e254.
James Watson (CSHL)
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W,
Chen YJ, Makhijani V, Roth GT, et al.
The complete genome of an individual by massively parallel
DNA sequencing.
Nature. 2008 Apr 17;452(7189):872-6.
Yoruba NA18507 (Illumina Cambridge/Solexa)
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG,
Hall KP, Evers DJ, Barnes CL, Bignell HR, et al.
Accurate whole human genome sequencing using reversible
terminator chemistry.
Nature. 2008 Nov 6;456(7218):53-9.
YH (YanHuang Project)
Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J,
Zhang J, et al.
The diploid genome sequence of an Asian individual.
Nature. 2008 Nov 6;456(7218):60-5.
SJK (GUMS/KOBIC)
Ahn SM, Kim TH, Lee S, Kim D, Ghang H, Kim DS, Kim BC, Kim SY, Kim WY, Kim C,
et al.
The first Korean genome sequence and analysis: full genome
sequencing for a socio-ethnic group.
Genome Res. 2009 Sep;19(9):1622-9.
CEU trio NA12878, NA12891, NA12892; YRI daughter NA19240 (1000 Genomes)
Analysis is underway for a manuscript on the pilot project; until publication,
please see
http://1000genomes.org/
(See also the Science and
Nature Biotechnology
news articles describing the project.)