Heterogeneous Stock QTL Mapping: Genome-wide genetic association of complex traits in outbred mice
William Valdar(1), Leah C. Solberg(2), Dominique Gauguier(1), Stephanie Burnett(1), Paul Klenerman(3), William O. Cookson(1), Martin Taylor(1), J. Nicholas P. Rawlins(4), Richard Mott(1), Jonathan Flint(1).
1 Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.
2 Medical College of Wisconsin, HMGC, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA
3 Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3SY, UK.
4 Department of Experimental Psychology, University of Oxford, Oxford, UK.
Difficulties in fine-mapping quantitative trait loci (QTLs) are a major impediment to progress in the molecular dissection of complex traits in mice. Here we show that genome-wide high-resolution mapping of multiple phenotypes can be achieved using a stock of genetically heterogeneous mice. We developed a conservative and robust bootstrap analysis to map 843 QTLs with an average 95% confidence interval of 2.8 megabases. The QTLs contribute to variation in 97 traits, including models of human disease (asthma, type 2 diabetes mellitus, obesity and anxiety) as well as immunological, biochemical and haematological phenotypes. The genetic architecture of almost all phenotypes was complex, with many loci each contributing a small proportion of the total variance. Our data set, freely available at http://gscan.well.ox.ac.uk/, provides an entry point to the functional characterization of genes involved in many complex traits.
The project is funded by a Programme Grant from The Wellcome Trust, with investigators Jonathan Flint, Richard Mott, Nick Rawlins, Dominique Gauguier, Bill Cookson, and by the EU FP6 NOE BIOSAPIENS (PI Richard Mott). Key personnel on the project are William Valdar and Leah Solberg.
The aims of the project were to:
Future work includes identifying:
Most mice were microchipped and are named by their barcode, eg
QTLs are mapped to marker intervals, not points. Each marker interval is about 200 kb wide on average and is named by the ID of its left-hand (proximal) marker. The name of the right-hand marker is the next marker along the chromosome in the genetic or physical map.
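As an illustration of the interval naming convention, the following minimal Python sketch looks up the right-hand marker of an interval from an ordered physical map. The marker IDs, positions and function names are invented for this example and are not taken from the data set.

```python
from typing import List, Optional, Tuple

# Hypothetical physical map for one chromosome: (marker ID, position in bp).
# These IDs and positions are invented for illustration only.
MARKER_MAP: List[Tuple[str, int]] = [
    ("mk_A", 3_000_000),
    ("mk_B", 3_210_000),
    ("mk_C", 3_395_000),
]


def right_hand_marker(left_id: str, marker_map: List[Tuple[str, int]]) -> Optional[str]:
    """Return the ID of the marker that closes the interval named by left_id.

    Returns None when left_id is the last marker on the chromosome,
    because no interval starts there.
    """
    ordered = sorted(marker_map, key=lambda m: m[1])  # order by position
    ids = [marker_id for marker_id, _ in ordered]
    i = ids.index(left_id)  # raises ValueError for an unknown marker
    return ids[i + 1] if i + 1 < len(ids) else None


def interval_width(left_id: str, marker_map: List[Tuple[str, int]]) -> Optional[int]:
    """Return the width in bp of the interval named by left_id, or None."""
    positions = dict(marker_map)
    right_id = right_hand_marker(left_id, marker_map)
    if right_id is None:
        return None
    return positions[right_id] - positions[left_id]
```

With this invented map, the interval named mk_A runs from mk_A to mk_B, and mk_C names no interval because it is the last marker on the chromosome.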
The Phenotypes are given as tab-delimited text files with a header, eg.
Each phenotype file always contains a column SUBJECT.NAME, followed by columns containing phenotype measures (e.g. EMO in this example) and then by covariates such as GENDER, Family (defined as sibship, and labelled by the names of the parents), Date.StudyDay etc.
Related phenotypes (e.g. all measures pertaining to a particular test) are in the same file. Within each phenotype file only those covariates with a statistically significant association with the phenotypes are included.
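A phenotype file with this layout can be read with standard tools. The sketch below uses Python's csv module on a small invented sample; the subject names, values and the function name read_phenotypes are assumptions made for illustration, and only the column names SUBJECT.NAME, EMO, GENDER and Family come from the description above. Values are kept as text because the sketch makes no assumption about how missing data are coded.

```python
import csv
import io

# Invented sample in the described layout: tab-delimited with a header row.
# Real files contain different subjects, measures and covariates.
SAMPLE = (
    "SUBJECT.NAME\tEMO\tGENDER\tFamily\n"
    "subj1\t1.5\tF\tp1:p2\n"
    "subj2\t2.0\tM\tp1:p2\n"
)


def read_phenotypes(handle):
    """Read a tab-delimited phenotype table into a dict keyed by subject name.

    Each value is a dict mapping column name to the raw text of that cell.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or "SUBJECT.NAME" not in reader.fieldnames:
        raise ValueError("expected a header row with a SUBJECT.NAME column")
    return {row["SUBJECT.NAME"]: row for row in reader}


rows = read_phenotypes(io.StringIO(SAMPLE))
for name, row in rows.items():
    print(name, row["EMO"], row["GENDER"], row["Family"])
```

To read a downloaded file instead of the sample, pass an open file handle, for example open(path, newline=""), in place of the io.StringIO object.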
Missing data are labelled
The phenotypes measured include:
Genotype and Genetic Map Data
The Genotypes are given as chromosome-specific text files. Each chromosome is described as a pair of files suitable for input into the R HAPPY package: a ped-format
Marker Selection: 15,360 single nucleotide polymorphisms (SNPs) were selected for genotyping based on their predicted diversity between the HS founders' haplotypes. We obtained genotypes for 13,459 SNPs on 1,904 fully phenotyped mice and 298 parents, with an average of 13,441 genotypes per animal and an accuracy of over 99.9% (ref. 15). 12,534 SNPs were polymorphic in the founder strains and 11,558 were heterozygous in the HS population, indicating that since the inception of the HS 7.8% of markers have drifted to fixation. The mean minor allele frequency was 30.5% in the founders and 26.7% in the HS. The mean interval between markers was 204.4 kb (s.d. 231.2 kb) and 92.5% of the genome is within 500 kb of a SNP. However, five intervals are larger than 3 Mb, of which the largest (11.3 Mb) is on the X chromosome.
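The fixation figure follows directly from the SNP counts given above. The short Python check below reproduces it and also derives the mean proportion of SNPs genotyped per animal; the variable names are chosen for this example, and the per-animal proportion is a derived figure that is not stated in the text.

```python
# Counts quoted in the marker selection paragraph.
snps_genotyped = 13_459           # SNPs with genotypes obtained
mean_genotypes_per_animal = 13_441
polymorphic_in_founders = 12_534
segregating_in_hs = 11_558        # still heterozygous in the HS population

# Markers polymorphic in the founders but no longer segregating in the HS.
fixed = polymorphic_in_founders - segregating_in_hs
fraction_fixed = fixed / polymorphic_in_founders

# Mean proportion of genotyped SNPs that were called in each animal.
mean_call_rate = mean_genotypes_per_animal / snps_genotyped

print(f"{fixed} markers fixed ({fraction_fixed:.1%})")    # 976 markers fixed (7.8%)
print(f"mean per-animal call rate: {mean_call_rate:.2%}")  # 99.87%
```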
Genome Scan Database
The results of genome scans for 101 phenotypes are available from gscandb. The models used to fit each phenotype are given in the Model Menu, using the R language model syntax. Most phenotypes were fitted using a linear model, except for three latency phenotypes that were fitted using a survival modelling framework.