Wellcome-CTC Mouse Strain SNP Genotype Set

(3rd August 2005)

This page contains information related to our project to genotype Recombinant Inbred Lines and Inbred Lines across 15360 SNPs.

All the genotyping was performed by Illumina, San Diego (thanks to Luana Galver, Sandy McBean).

Note that all the genotype files include their creation date as part of their name.

LAST CHANGE TO GENOTYPE FILES WAS ON 3rd August 2005

10052005: The data have now been remapped onto Build 34 of the mouse genome. You can access the data relative to either Build 33 or Build 34. You can view the positions of the remapped SNPs here.

03082005: We now have genotype data for an additional 20 strains: CAST/Ei, CASA/RkJ, CIM, CTP, CZECHI/EiJ, MAI, MBT, MOLC/RkJ, MOLD/RkJ, MOLFEiJ, MSM/Ms, MusSpretusOutbred1, MusSpretusOutbred2, PANCEVO/EiJ, PWK/Rbrc, PWK/Pas, PWK/PhJ, PWK/Ros, SKIVE/EiJ and SPRET/EiJ. These are mostly wild-derived strains with lower than expected genotype pass rates. It is therefore highly likely that their error rates are significantly higher than for the other data, so they should be treated with caution. To help the community make the best use of these data, the genotypes for these strains are also available as a separate text file, wild.genotypes_quality_lowcased.03082005.txt.gz, that includes the Illumina-computed quality scores [the scores lie between 0 (bad) and 1 (good)]. Genotypes with quality scores below 0.8 are shown in lower case. The data are also available from the SNP selector in the usual way.
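For scripted analyses, the lower-case convention makes it easy to flag or mask low-quality calls. A minimal sketch, assuming each genotype is a string of allele letters such as "AG"; the "NN" missing code and the function names are hypothetical and not part of the file format described here:

```python
from __future__ import annotations

# Hypothetical placeholder for a masked call; not defined by the data files.
MISSING = "NN"


def is_low_quality(call: str) -> bool:
    """True if the call is written in lower case, i.e. its quality score was below 0.8."""
    return any(ch.islower() for ch in call)


def mask_low_quality(calls: list[str]) -> list[str]:
    """Replace every low-quality (lower-case) call with MISSING; keep the others unchanged."""
    return [MISSING if is_low_quality(c) else c for c in calls]


if __name__ == "__main__":
    row = ["AA", "ag", "CC", "tt"]
    print(mask_low_quality(row))  # ['AA', 'NN', 'CC', 'NN']
```

Masking rather than dropping a call keeps each row aligned with the SNP names in the file header.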

There are genotypes available for 480 strains and 13370 successful SNP assays mapped to Build 34 of the mouse genome, including 107 SNPs mapped to "random" unanchored sequence.

13374 SNPs are mapped onto Build 33 of the mouse genome.

We have performed some basic error checking and have not discovered any major problems, but please take note that:

This is a beta release of the data and is subject to revision.
The strand on which genotypes for a given SNP are reported in the flat-file downloads may not be the same as that reported for the same SNP in other published data sets, and in particular may differ from that in dbSNP. Note, however, that if you use the web interface to download selected SNPs, they are always reported relative to the positive strand of the selected genome build.
We have checked the standard inbred strains provided by the Jackson Laboratory against the published genotypes, and they agree almost perfectly once strand differences are taken into account. Gary Churchill and Lei Wu have also compared the genotypes for 42 strains and 5089 SNPs with those reported by Merck, and found 99.959% agreement between the two sets (ignoring missing data). Based on (a) duplicated samples and (b) family data (from the HS mice), the genotyping accuracy is over 99.98%.
The error rates for the "rescued" 20 strains listed above may be substantially higher.
The haplotype block structure for the Recombinant Inbred Lines in the set shows well-defined blocks, indicating that the genotypes are consistent with each other.
So far we have NOT checked non-Jax strains (including the RIL) against published genotypes, so we can't rule out the possibility of a mix-up of samples. Therefore, if you suspect that anything is wrong, PLEASE LET US KNOW.
If a strain that you expected to be on the list is missing PLEASE LET US KNOW.
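Comparisons like these only make sense once strand is accounted for and missing calls are set aside. The sketch below computes agreement between two sets of calls for one strain under those rules. It assumes you already know which SNPs are reported on opposite strands in the two sources (for example from the "Correct" strand file); the function names and missing-data codes are illustrative, not part of any published tool.

```python
from __future__ import annotations

# Complement table for reversing strand (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Hypothetical missing-data codes; adjust to whatever the two sources use.
MISSING = {"NN", "--", ""}


def normalise(call: str) -> str:
    """Upper-case and sort alleles so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(call.upper()))


def concordance(ours: dict[str, str], theirs: dict[str, str],
                flipped: frozenset[str] = frozenset()) -> tuple[int, int]:
    """Return (agreeing, compared) over SNPs called in both sources.

    Calls for SNP names in `flipped` are complemented in `theirs` before
    comparison; SNPs missing in either source are skipped.
    """
    agree = compared = 0
    for snp in ours.keys() & theirs.keys():
        a, b = ours[snp], theirs[snp]
        if a in MISSING or b in MISSING:
            continue
        if snp in flipped:
            b = b.upper().translate(COMPLEMENT)
        compared += 1
        agree += normalise(a) == normalise(b)
    return agree, compared


if __name__ == "__main__":
    ours = {"s1": "AA", "s2": "AG", "s3": "CC", "s4": "NN"}
    theirs = {"s1": "AA", "s2": "TC", "s3": "CT", "s4": "GG"}
    agree, n = concordance(ours, theirs, flipped=frozenset({"s2"}))
    print(f"{agree}/{n} = {100 * agree / n:.3f}% agreement")  # 2/3 = 66.667% agreement
```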

Genotype Data

NEW: The genotypes can be accessed via a web interface that lets you select genotypes for any of the genotyped strains, either for a single chromosome or across the genome, and optionally restrict the output to the SNPs that are polymorphic between the selected strains. Graphical output is also available.
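The polymorphism filter can also be reproduced offline on your own copy of the data. A minimal sketch, assuming the genotypes are held as one list of calls per strain in SNP order; the exact rule the web interface applies is not documented here, so this takes one plausible reading (a SNP is polymorphic if the selected strains carry more than one distinct non-missing call). Strain names and missing codes below are hypothetical.

```python
from __future__ import annotations

# Hypothetical missing-data codes.
MISSING = {"NN", "--", ""}


def polymorphic_snps(snps: list[str], genotypes: dict[str, list[str]],
                     selected: list[str]) -> list[str]:
    """Return the SNPs at which the selected strains do not all share one call."""
    result = []
    for i, snp in enumerate(snps):
        # Distinct non-missing calls, ignoring allele order ("AG" == "GA").
        calls = {"".join(sorted(genotypes[s][i])) for s in selected} - MISSING
        if len(calls) > 1:
            result.append(snp)
    return result


if __name__ == "__main__":
    snps = ["rs1", "rs2", "rs3"]
    genotypes = {"strainA": ["AA", "GG", "NN"],
                 "strainB": ["AA", "TT", "CC"],
                 "strainC": ["AA", "GG", "TT"]}
    print(polymorphic_snps(snps, genotypes, ["strainA", "strainB"]))  # ['rs2']
```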

Alternatively the data can be downloaded as a series of chromosome-specific compressed text files by following these links:

(a) Build34

chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX

Everything (compressed tarball)

(b) Build33 (use is deprecated)

chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chrX

Everything (compressed tarball)

The file format is space-separated text, with one row of data per strain. The first column gives the strain name. The remaining columns are the genotypes in the marker order specified by the SNP names in the first row of the file.
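A minimal sketch for loading one of the chromosome files. The page does not say whether the first row carries a label above the strain-name column, so the sketch accepts either layout; the function names are hypothetical, and gzip decompression is assumed only because the downloads are compressed.

```python
from __future__ import annotations

import gzip


def parse_genotypes(lines) -> tuple[list[str], list[str], list[list[str]]]:
    """Parse a header of SNP names followed by one space-separated row per strain.

    Returns (snp_names, strain_names, calls), where calls[i][j] is the
    genotype of strain i at SNP j.
    """
    it = iter(lines)
    header = next(it).split()
    strains: list[str] = []
    calls: list[list[str]] = []
    for line in it:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        strains.append(fields[0])
        calls.append(fields[1:])
    n = len(calls[0]) if calls else len(header)
    # If the header has one extra leading field (a label for the strain column), drop it.
    snps = header[len(header) - n:]
    if len(snps) != n or any(len(row) != n for row in calls):
        raise ValueError("rows and header disagree on the number of SNPs")
    return snps, strains, calls


def read_genotype_file(path: str):
    """Open a plain or gzip-compressed genotype file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return parse_genotypes(fh)
```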

The data are also available as 3 files, transposed so that each row corresponds to one marker and each column to one strain. These files are small enough to view in Excel.

(a) Build34

Strains 1 to 199, Strains 200 to 399, Strains 400 and above.

(b) Build33

Strains 1 to 199, Strains 200 to 399, Strains 400 and above.
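If you want the per-marker orientation of the transposed files for your own subset of strains, it can be regenerated from the per-strain layout. A sketch, assuming the calls are held as one list per strain in SNP order; the function name and the "SNP" header label are hypothetical.

```python
from __future__ import annotations


def transpose(snps: list[str], strains: list[str],
              calls: list[list[str]]) -> list[list[str]]:
    """Turn one-row-per-strain data into one-row-per-SNP rows.

    The first output row is a header ("SNP" followed by the strain names);
    each later row is a SNP name followed by that SNP's call in every strain.
    """
    header = ["SNP", *strains]
    # zip(*calls) yields, for each SNP, the tuple of calls across all strains.
    return [header] + [[snp, *column] for snp, column in zip(snps, zip(*calls))]
```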

Haplotype Structure

Haplotype structure of Recombinant Inbred Lines inferred from the data.

Haplotype structure of all other Contributed strains relative to C57BL/6J.

Sample Information

List of lines and strains sent to Illumina for genotyping.

List in csv text format.

Note that the following samples either failed completely or gave higher than expected failure rates. Many of the latter category are wild-derived mice, where presumably the presence of unknown variants in the flanking sequences caused additional failures.

Sample ID Explanation

BXD82 Failed sample

CAST/Ei Sample had higher than average failure rate

CASA/RkJ Sample had higher than average failure rate

CIM Sample had higher than average failure rate

CTP Sample had higher than average failure rate

CZECHI/EiJ Sample had higher than average failure rate

CZECH11/Ei(35213) Failed sample; low concentration

FVBS/Ant Failed sample

JF1/MS(35242) Failed sample; low concentration

L6 Failed sample

MAI Sample had higher than average failure rate

MBT Sample had higher than average failure rate

MOLC/RkJ Sample had higher than average failure rate

MOLD/RkJ Sample had higher than average failure rate

MOLFEiJ Sample had higher than average failure rate

MSM/Ms Sample had higher than average failure rate

MUSpaha Failed sample

MusSpretusOutbred2 Sample had higher than average failure rate

MusSpretusOutbred1 Sample had higher than average failure rate

NIH/Ola Failed sample

PANCEVO/EiJ Sample had higher than average failure rate

PWD/PhJ(36186) Failed sample; low concentration

PWK/Rbrc Sample had higher than average failure rate

PWK/Pas Sample had higher than average failure rate

PWK/PhJ Sample had higher than average failure rate

PWK/Ros Sample had higher than average failure rate

SKIVE/EiJ Sample had higher than average failure rate

SPRET/EiJ Sample had higher than average failure rate

strain1050185 Failed sample; low concentration

SNP Information

The SNPs were selected as follows. Where possible we used validated SNPs known to be polymorphic in at least some of the eight strains A/J, AKR, BALB/cJ, DBA/2J, C57BL/6J, LP/J, I and RIIIS/J, although in many cases we did not have full strain distribution data. About 7000 SNPs were contributed by GNF (Tim Wiltshire), 7000 by Merck (Eric Schadt) and 1600 by JAX (Petko Petkov). We thinned out SNPs closer than 50kb that had identical strain distribution patterns. We then identified all gaps larger than 500kb and looked for SNPs to fill them, using Celera, Czech and Affy SNPs (provided by Mark Daly and Rob Williams). We only included SNPs that mapped uniquely to Build 33 of the mouse assembly according to BLAT (thanks to Martin Taylor).
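The thinning and gap-finding steps can be sketched as follows. This is one plausible greedy reading, not the procedure actually used: each SNP is compared only with the last SNP kept on the same chromosome, and the strain distribution pattern (SDP) is taken to be which reference strains share a call, regardless of the allele letters. The Snp record type and the function names are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple

THIN_BP = 50_000   # SNPs closer than this with the same SDP are treated as redundant
GAP_BP = 500_000   # spacing above this counts as a gap to be filled


class Snp(NamedTuple):
    name: str
    chrom: str
    pos: int
    calls: tuple[str, ...]  # one call per reference strain, same strain order for every SNP


def sdp(calls: tuple[str, ...]) -> tuple[int, ...]:
    """Relabel calls by order of first appearance, so ("AA", "GG", "AA") and
    ("CC", "TT", "CC") share the pattern (0, 1, 0). Missing calls count as a label."""
    labels: dict[str, int] = {}
    return tuple(labels.setdefault(c, len(labels)) for c in calls)


def thin(snps: list[Snp], window: int = THIN_BP) -> list[Snp]:
    """Drop a SNP if the last kept SNP on its chromosome is within `window`
    bp and has the same SDP."""
    kept: list[Snp] = []
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        prev = kept[-1] if kept else None
        if (prev is not None and prev.chrom == snp.chrom
                and snp.pos - prev.pos < window and sdp(prev.calls) == sdp(snp.calls)):
            continue  # redundant: nearby and carries the same pattern
        kept.append(snp)
    return kept


def gaps(snps: list[Snp], min_gap: int = GAP_BP) -> list[tuple[str, int, int]]:
    """Return (chrom, left_pos, right_pos) for neighbouring SNPs more than min_gap apart."""
    ordered = sorted(snps, key=lambda s: (s.chrom, s.pos))
    return [(a.chrom, a.pos, b.pos) for a, b in zip(ordered, ordered[1:])
            if a.chrom == b.chrom and b.pos - a.pos > min_gap]
```

Comparing only against the last kept SNP keeps the pass linear after sorting; a stricter variant would compare against every kept SNP inside the window.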

We added a few special SNPs that determine the MHC alleles, the tyrosinase and agouti loci, and the mitochondrion. We included SNPs mapping to unordered chromosomal fragments (like 7_random) because these are likely to become part of the assembly in the future.

The resulting set of SNPs is fairly uniformly distributed on Build 33; see final-9-2-5.space.txt. When the next build comes out, much of this careful work will no doubt be undone and new gaps will appear, but this is the best we could do at the moment. Making this selection was surprisingly time-consuming and difficult; filling the gaps was particularly hard. Whether these gaps really are regions with few SNPs, or are caused by errors in the mouse genome assembly or by SNP ascertainment problems, remains to be seen.

List of SNPs that produced successful assays. These SNPs have also been typed across 2300 HS mice to fine map multiple QTL in parallel.

List of SNPs with "Correct" strand. This file gives the Illumina-reported alleles and strand, which are consistent with the genotypes reported here and often differ from the originally submitted strand.

Original list of SNPs submitted for genotyping, a comma-separated text file in Illumina format. The columns are: SNP_Name,Sequence,Genome_Build_Version,Chr,Coordinate,Source,dbSNP_Version,Ploidy,Species,Customer_Strand. Note that all SNPs have been remapped onto Build 33 of the mouse genome and, where possible, renamed by their dbSNP rs number. The original SNP name is included in the source information.
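The submission file can be read with Python's standard csv module. A sketch, assuming the file starts with a header row using the column names listed above (pass has_header=False if it does not); only the name, chromosome, coordinate and source are kept, and the function name is hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable

COLUMNS = ["SNP_Name", "Sequence", "Genome_Build_Version", "Chr", "Coordinate",
           "Source", "dbSNP_Version", "Ploidy", "Species", "Customer_Strand"]


def read_submission(lines: Iterable[str],
                    has_header: bool = True) -> dict[str, tuple[str, int, str]]:
    """Map SNP name -> (chromosome, Build 33 coordinate, source)."""
    # DictReader reads field names from the first row unless they are supplied.
    reader = csv.DictReader(lines, fieldnames=None if has_header else COLUMNS)
    snps: dict[str, tuple[str, int, str]] = {}
    for row in reader:
        snps[row["SNP_Name"]] = (row["Chr"], int(row["Coordinate"]), row["Source"])
    return snps
```

Usage: `with open(path, newline="") as fh: snps = read_submission(fh)`.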

Coordinates start at 1, i.e. they follow the dbSNP convention, which differs from UCSC coordinates, which start at 0.
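Converting between the two conventions is an off-by-one shift. A minimal sketch, assuming the UCSC side uses the 0-based, half-open intervals of UCSC tables and BED files, where a single-base SNP at 1-based position p occupies [p - 1, p); the function names are hypothetical.

```python
from __future__ import annotations


def dbsnp_to_ucsc(pos: int) -> tuple[int, int]:
    """1-based SNP position -> UCSC 0-based half-open (start, end)."""
    if pos < 1:
        raise ValueError("1-based positions start at 1")
    return pos - 1, pos


def ucsc_to_dbsnp(start: int) -> int:
    """UCSC 0-based start of a single-base feature -> 1-based position."""
    if start < 0:
        raise ValueError("0-based positions start at 0")
    return start + 1
```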

NEW: Mapping of SNPs onto Build 34 of the mouse genome

Spacing of selected SNPs.

Conditions of Use

These data are freely available.
There are no constraints on the use of the data, but if you redistribute it include a reference to the Wellcome-CTC Mouse Strain SNP Genotype Set http://mus.well.ox.ac.uk/mouse/INBREDS in your distribution.
If you make use of the data please reference the Wellcome-CTC Mouse Strain SNP Genotype Set http://mus.well.ox.ac.uk/mouse/INBREDS in any publications.
We do not provide any guarantee that the data are correct. CAVEAT EMPTOR.

Acknowledgements

SNP Information

Many thanks to Tim Wiltshire, Mathew Pletcher (GNF), Eric Schadt (Rosetta/Merck), Petko Petkov (JAX), Mark Daly/Andrew Kirby (MIT/Broad), Rob Williams, Weikuan Gu, Lu Lu, Yan Jioa (University of Tennessee Health Science Center (supported by P20-MH 62009 and U24AA13513)), Christophe Benoist (Harvard) for providing SNP information. (The source of each SNP is indicated in the file.)

Mouse DNA samples

Many thanks to the following people for providing Mouse DNA samples: Christophe Benoist, Chris Ebeling, Beth Bennett, Lu Lu, Daniel Pomp, David Keays, Robert Reis, Grant Morahan, Gudrun Brockmann, Hiroke Nagase, Howard Gershenfeld, Jim Cheverud, Jimmy Spearow, Jonathan Flint, Kathy Hood, Molly Bogue/Susan Deveau, Morley/Haywood, Peter Demant, Petko Petkov, Rob Williams, Simon Horvat, Steve Clapcote, Xavier Montagutelli.

Funding

And thanks to the following for providing financial support: The Wellcome Trust (who provided the bulk of the funding), James Cheverud: SMXLG (R24RR015116), Gary A. Churchill: SJXL (R01GM072863), Kent Hunter: AKXD (NIH intramural support), Lu Lu and Beth Bennett: LXS (U01AA014425), Richard S. Nowakowski: CXB (R01NS049445), Robert W. Williams: AXB/BXA, BXD, BXH, and miscellaneous samples (P20-MH 62009 and U24AA13513).

Contact Richard Mott or Jonathan Flint