Heterogeneous Stock QTL Mapping: Genome-wide genetic association of complex traits in outbred mice

William Valdar(1), Leah C. Solberg(2), Dominique Gauguier(1), Stephanie Burnett(1), Paul Klenerman(3), William O. Cookson(1), Martin Taylor(1), J. Nicholas P. Rawlins(4), Richard Mott(1), Jonathan Flint(1).

1 Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK.

2 Medical College of Wisconsin, HMGC, 8701 Watertown Plank Road, Milwaukee, WI 53226, USA.

3 Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine, University of Oxford, Oxford OX1 3SY, UK.

4 Department of Experimental Psychology, University of Oxford, Oxford, UK.

Difficulties in fine-mapping quantitative trait loci (QTLs) are a major impediment to progress in the molecular dissection of complex traits in mice. Here we show that genome-wide high-resolution mapping of multiple phenotypes can be achieved using a stock of genetically heterogeneous mice. We developed a conservative and robust bootstrap analysis to map 843 QTLs with an average 95% confidence interval of 2.8 megabases. The QTLs contribute to variation in 97 traits, including models of human disease (asthma, type 2 diabetes mellitus, obesity and anxiety) as well as immunological, biochemical and haematological phenotypes. The genetic architecture of almost all phenotypes was complex, with many loci each contributing a small proportion of the total variance. Our data set, freely available at http://gscan.well.ox.ac.uk/, provides an entry point to the functional characterization of genes involved in many complex traits.

The project is funded by a Programme Grant from The Wellcome Trust, with investigators Jonathan Flint, Richard Mott, Nick Rawlins, Dominique Gauguier and Bill Cookson, and by the EU FP6 Network of Excellence BIOSAPIENS (PI Richard Mott). Key personnel on the project are William Valdar and Leah Solberg.

View and interact with the data at gscan.well.ox.ac.uk/gs/wwwqtl.cgi. Download the raw data from the links below.

The aims of the project were to:

- Develop a phenotyping protocol suitable for quantitative trait mapping. The protocol is described in detail in Solberg LC, Valdar W, Gauguier D, Nunez G, Taylor A, Burnett S, Arboledas-Hita C, Hernandez-Pliego P, Davidson S, Burns P, Bhattacharya S, Hough T, Higgs D, Klenerman P, Cookson WO, Zhang Y, Deacon RM, Rawlins JN, Mott R, Flint J. (2006) A protocol for high-throughput phenotyping, suitable for quantitative trait analysis in mice. Mamm Genome 17(2):129-46.

  Whole-genome genetic association studies in outbred mouse populations represent a novel approach to identifying the molecular basis of naturally occurring genetic variants, the major source of quantitative variation between inbred strains of mice. Measuring multiple phenotypes in parallel on each mouse would make the approach cost effective, but protocols for phenotyping on a large enough scale have not been developed. In this article we describe the development and deployment of a protocol to collect measures on three models of human disease (anxiety, type II diabetes, and asthma) as well as measures of mouse blood biochemistry, immunology, and hematology. We report that the protocol delivers highly significant differences among the eight inbred strains (A/J, AKR/J, BALB/cJ, CBA/J, C3H/HeJ, C57BL/6J, DBA/2J, and LP/J), the progenitors of a genetically heterogeneous stock (HS) of mice. We report the successful collection of multiple phenotypes from 2000 outbred HS animals. The phenotypes measured in the protocol form the basis of a large-scale investigation into the genetic basis of complex traits in mice designed to examine interactions between genes and between genes and environment, as well as the main effects of genetic variants on phenotypes.

- Phenotype a population of approximately 2,500 HS mice with the protocol. The raw phenotype data are available for download.
- Genotype the mice at high resolution across the genome using a set of 15,360 SNPs. The raw genotype data are available for download. These SNPs are identical to those used in the Wellcome-CTC Mouse Strain SNP Genotype Set.
- Construct a high-resolution genetic map of the mouse from the final three generations of breeding.
- Fine-map quantitative trait loci. The genome scans are accessible via the gscandb browser. We have mapped 843 QTLs, in many cases with sub-centimorgan resolution.

Future work includes identifying:

- pleiotropic loci
- epistatic loci
- gene x environment interactions

Details

Naming conventions

Most mice were microchipped and are named by their barcode, e.g. A048005080. The exceptions are some HS parents, which are named according to cage, e.g. H2.3:G2.2(3). Mouse families are defined at the level of sibship and are named "Mother Father", i.e. by the names of the two parents.
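A minimal Python sketch for working with these names. It assumes that parent names never contain internal whitespace (true of the barcode and cage-name examples above) and that the mother's name comes first; the helpers `split_family` and `is_barcode` are hypothetical, and the barcode pattern (one capital letter followed by nine digits) is only inferred from examples such as A048005080.

```python
import re
from typing import Tuple

# Assumed barcode pattern, inferred from example IDs such as A048005080.
BARCODE = re.compile(r"[A-Z]\d{9}")


def is_barcode(name: str) -> bool:
    """True if `name` looks like a microchip barcode rather than a cage-based name."""
    return BARCODE.fullmatch(name) is not None


def split_family(family: str) -> Tuple[str, str]:
    """Split a "Mother Father" family label into (mother, father)."""
    parts = family.split()  # assumes no whitespace inside either parent name
    if len(parts) != 2:
        raise ValueError(f"expected 'Mother Father', got {family!r}")
    return parts[0], parts[1]


mother, father = split_family("H2.3:G2.2(3) H2.3:C5.2(3)")
print(mother, father, is_barcode("A048005080"), is_barcode(mother))
```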

QTLs are mapped to marker intervals, not points. Each marker interval is about 200 kb wide on average and is named by the ID of its left-hand (proximal) marker. The right-hand marker can be looked up in the genetic or physical map.
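Because an interval is named only by its proximal marker, finding its distal end means finding the next marker on the same chromosome. The Python sketch below assumes one chromosome's map is held as a list of `(marker_id, position)` pairs sorted by position (bp or cM); the marker IDs in the toy map are made up, and nothing here parses the actual .map files.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One chromosome's map: (marker_id, position) pairs sorted by position.
ChromMap = List[Tuple[str, float]]


def right_marker(chrom_map: ChromMap, left: str) -> Optional[str]:
    """Return the distal marker of the interval named by its proximal marker `left`."""
    names = [name for name, _ in chrom_map]
    i = names.index(left)  # ValueError if `left` is not on this chromosome
    # The last marker on a chromosome opens no interval.
    return names[i + 1] if i + 1 < len(names) else None


def interval_name(chrom_map: ChromMap, pos: float) -> Optional[str]:
    """Name (proximal marker) of the interval containing `pos`, or None."""
    positions = [p for _, p in chrom_map]
    i = bisect_right(positions, pos) - 1
    if i < 0 or i + 1 >= len(chrom_map):
        return None  # before the first marker, or at/after the last one
    return chrom_map[i][0]


toy_map = [("snpA", 1.00e6), ("snpB", 1.21e6), ("snpC", 1.39e6)]  # made-up markers
print(right_marker(toy_map, "snpA"), interval_name(toy_map, 1.3e6))
```

Using `bisect_right` keeps the lookup logarithmic in the number of markers, which matters only when mapping many positions against a chromosome with thousands of SNPs.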

Phenotype Data

Phenotypes are given as tab-delimited text files with a header row, e.g.:

SUBJECT.NAME	EMO	GENDER	Family	Date.StudyDay	Date.Month	Date.Year
A048005080	0.2607	F	H2.3:G2.2(3) H2.3:C5.2(3)	113	5	2003
A048005112	-0.28775	F	H2.2:G3.1(3) H2.2:C3.1(4)	99	4	2003
A048006063	-0.38815	M	E5.2:D4.1(4) E5.2:H5.1(4)	71	3	2003
A048006555	0.06405	M	E1.3:D1.2(3) E1.3:H1.2(3)	92	4	2003

Each phenotype file contains a SUBJECT.NAME column, followed by columns of phenotype measures (e.g. EMO in this example) and then by covariates such as GENDER, Family (defined as sibship and labelled by the names of the parents), Date.StudyDay, etc.

Related phenotypes (e.g. all measures pertaining to a particular test) are in the same file. Within each phenotype file only those covariates with a statistically significant association with the phenotypes are included.

Missing data are labelled NA.
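A sketch of reading one of these files with Python's standard `csv` module: tabs separate the columns (the Family column itself contains a space, so whitespace splitting would break it), NA becomes `None`, and any value that parses as a number becomes a `float`. The function names are hypothetical, and converting every numeric-looking covariate (including Date.StudyDay) to `float` is a simplification.

```python
import csv
from typing import Dict, Iterable, Union

Value = Union[float, str, None]


def parse_value(raw: str) -> Value:
    """NA -> None; numeric text -> float; anything else (e.g. GENDER, Family) unchanged."""
    if raw == "NA":
        return None
    try:
        return float(raw)
    except ValueError:
        return raw


def read_phenotypes(lines: Iterable[str]) -> Dict[str, Dict[str, Value]]:
    """Read a tab-delimited phenotype file into {SUBJECT.NAME: {column: value}}."""
    # restval="NA" treats short rows as missing data rather than failing.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE, restval="NA")
    table: Dict[str, Dict[str, Value]] = {}
    for row in reader:
        subject = row.pop("SUBJECT.NAME")
        table[subject] = {col: parse_value(val) for col, val in row.items()}
    return table


# Usage with one of the files listed below, e.g. the open field test:
# with open("OFT.txt", newline="") as f:
#     oft = read_phenotypes(f)
```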

The phenotypes measured include:

Behavioural:
- Open Field Test OFT.txt
- Elevated Plus Maze EPM.txt
- Food Neophagia FN.txt
- Fear Potentiated Startle FPS.txt
- Context Freezing Context.txt
- Cue Conditioning Cue.txt
- Burrowing Burrowing.txt
- Home Cage Activity PAS.txt
Diabetes-related:
- Glucose Tolerance Test Glucose.txt
- Insulin Levels InsulinAndDerivedMeasures.txt
- Weight Weight.txt
- Obesity Obesity.txt
- Adrenal Weight AdrenalWeight.txt
Asthma-related:
- Plethysmograph Lung Function Plethysmograph.txt
Immunology:
- CD8, CD4, CD3, B220 Immunology.txt
Haematology:
- white blood cell count, red blood cell count, hemoglobin, hematocrit, mean cell volume, mean cell hemoglobin, platelets, lymphocytes, monocytes, neutrophils, basophils, eosinophils Haematology.txt
Biochemistry:
- Creatinine, Chlorides, HDL, LDL, Protein, Sodium, ALT, ALP, AST, Glucose, Phosphorus, Potassium, Cholesterol, Triglycerides, Urea Biochemistry.txt
Coat Colour:
- Coat Colour EPM.txt

Genotype and Genetic Map Data

Genotypes are given as chromosome-specific text files. Each chromosome is described by a pair of files suitable for input into the R HAPPY package: a ped-format .data file containing the HS genotypes and a HAPPY-format .alleles file containing the HS founder genotype information. Full file format details are available. Missing genotypes are coded as NA. The map is based on the build34 sequence map.

The genetic map for each chromosome is also available, with file extension ".map". The complete genetic map is available as build34.genetic.map, and the build34 sequence map as build34.physical.map.
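The HAPPY package reads these files itself; purely to illustrate the layout, here is a Python sketch of a reader for a ped-format .data file. It assumes the standard linkage-ped layout (whitespace-separated; six leading columns for family, individual, father, mother, sex and phenotype; then two allele columns per marker) and treats a genotype as missing if either allele is NA. The function name and the file name in the usage comment are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a missing call
N_PED_COLUMNS = 6  # assumed: family, individual, father, mother, sex, phenotype


def read_ped(lines: Iterable[str]) -> Dict[str, List[Genotype]]:
    """Map each individual ID to its per-marker allele pairs."""
    genotypes: Dict[str, List[Genotype]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < N_PED_COLUMNS:
            raise ValueError(f"truncated pedigree columns: {line!r}")
        alleles = fields[N_PED_COLUMNS:]
        if len(alleles) % 2:
            raise ValueError(f"odd number of allele columns for {fields[1]}")
        # Pair up consecutive allele columns; an NA in either allele makes the call missing.
        genotypes[fields[1]] = [
            None if "NA" in (a, b) else (a, b)
            for a, b in zip(alleles[0::2], alleles[1::2])
        ]
    return genotypes


# Usage (hypothetical file name):
# with open("chr1.data") as f:
#     chr1 = read_ped(f)
```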

Marker Selection: 15,360 single nucleotide polymorphisms (SNPs) were selected for genotyping based on their predicted diversity between the HS founders' haplotypes. We obtained genotypes for 13,459 SNPs on 1,904 fully phenotyped mice and 298 parents, with an average of 13,441 genotypes per animal and an accuracy of over 99.9% (ref. 15). 12,534 SNPs were polymorphic in the founder strains and 11,558 were still segregating in the HS population, indicating that 7.8% of markers have drifted to fixation since the inception of the HS. The mean minor allele frequency was 30.5% in the founders and 26.7% in the HS. The mean interval between markers was 204.4 kb (s.d. 231.2 kb), and 92.5% of the genome is within 500 kb of a SNP. However, five intervals are larger than 3 Mb, of which the largest (11.3 Mb) is on the X chromosome.
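The 7.8% fixation figure follows directly from the two SNP counts: (12,534 - 11,558) / 12,534 = 976 / 12,534 ≈ 0.078. The Python sketch below reproduces that arithmetic and shows one way to compute a minor allele frequency from the genotype calls of a single biallelic SNP. The function names and the genotype representation (allele pairs, `None` for a missing call) are illustrative assumptions, not the project's own pipeline.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def fraction_fixed(polymorphic_in_founders: int, segregating_in_hs: int) -> float:
    """Fraction of founder-polymorphic SNPs that no longer segregate in the HS."""
    return (polymorphic_in_founders - segregating_in_hs) / polymorphic_in_founders


def minor_allele_frequency(genotypes: Iterable[Optional[Tuple[str, str]]]) -> float:
    """MAF of one biallelic SNP; missing calls (None) are skipped."""
    counts: Counter = Counter()
    for call in genotypes:
        if call is not None:
            counts.update(call)  # count both alleles of the diploid call
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    if len(counts) == 1:
        return 0.0  # monomorphic: the SNP is fixed in this sample
    return min(counts.values()) / total


print(f"{fraction_fixed(12_534, 11_558):.1%}")  # prints 7.8%
```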

Genome Scan Database

The results of genome scans for 101 phenotypes are available from gscandb. The model used to fit each phenotype is given in the Model Menu, in R model formula syntax. Most phenotypes were fitted with a linear model, except for three latency phenotypes that were fitted in a survival modelling framework.

Contact Richard Mott, Jonathan Flint or William Valdar for more details.