QTL Mapping using MAGIC Arabidopsis Recombinant Inbred Lines

This document describes in detail how to install and run software to map QTLs in the panel of 703 Arabidopsis MAGIC recombinant inbred lines. The methods and software described here are based around 1200 SNP genotypes.
The MAGIC lines and the genotypes are described in
Paula X. Kover, William Valdar, Joseph Trakalo, Nora Scarcelli, Ian M. Ehrenreich, Michael D. Purugganan, Caroline Durrant, Richard Mott. A Multiparent Advanced Generation Inter-Cross to Fine-Map Quantitative Traits in Arabidopsis thaliana. PLoS Genetics 2009. doi:10.1371/journal.pgen.1000551.
Note that the original 2009 paper used only 527 MAGIC lines and was based on the TAIR8 genome. The data and resources for MAGIC now available on this page have been expanded to 703 genotyped lines and the SNPs remapped to TAIR9/TAIR10.
See also a different way of analysing these data, based on low-coverage sequencing of MAGIC lines.

Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations, therefore alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.

Prerequisites

Create a working directory into which you will download the data. This is referred to below as the analysis directory.

Genotypes

Download the MAGIC genotypes, formatted for the HAPPY package analysis, into the analysis directory. Unpack them using the commands

 
%gunzip magic.15012010.tar.gz
%tar xvf magic.15012010.tar
%ls chr*

chr1.MAGIC.alleles  chr2.MAGIC.data	chr3.MAGIC.map	    chr5.MAGIC.alleles
chr1.MAGIC.data     chr2.MAGIC.map	chr4.MAGIC.alleles  chr5.MAGIC.data
chr1.MAGIC.map	    chr3.MAGIC.alleles	chr4.MAGIC.data     chr5.MAGIC.map
chr2.MAGIC.alleles  chr3.MAGIC.data	chr4.MAGIC.map

These files contain the genotypes of 703 MAGIC RILs at 1513 SNPs, formatted for analysis by the HAPPY R package.
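Before going further it can be worth confirming that all fifteen files shown in the listing are present. The following is a minimal sketch in R, assuming the archive was unpacked into the current directory; the helper name check.genotype.files is hypothetical and is not part of magic.R.

```r
# Hypothetical helper (not part of magic.R): report which of the expected
# HAPPY input files are missing from a directory.
check.genotype.files <- function(dir = ".", chrs = 1:5) {
  suffixes <- c("alleles", "data", "map")
  # Build chr1.MAGIC.alleles, chr1.MAGIC.data, chr1.MAGIC.map, chr2... etc.
  grid <- expand.grid(suffix = suffixes, chr = chrs, stringsAsFactors = FALSE)
  expected <- paste0("chr", grid$chr, ".MAGIC.", grid$suffix)
  missing <- expected[!file.exists(file.path(dir, expected))]
  if (length(missing) > 0) {
    warning("missing genotype files: ", paste(missing, collapse = ", "))
  }
  invisible(missing)
}

# Check the current (analysis) directory; an empty result means all is well
missing.files <- check.genotype.files(".")
```

The function returns the missing file names invisibly, so it can also be used inside a script to stop early if the download is incomplete.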

R code

You will need to install the following R packages (make sure the environment variable R_LIBS includes the directory where these packages are installed).

happy.hbrem, the HAPPY R package
multicore, an R package to enable parallel processing on multi-core servers
g.data, the delayed-data package

For example, to install the downloaded happy.hbrem package to the directory

R_package_dir

type the command

 R CMD INSTALL -l R_package_dir happy.hbrem_2.4.tar.gz
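The same installation can also be done from inside an R session. This is a sketch only, assuming the tarball has been downloaded to the current directory and that R_package_dir is the library directory you want to use; it relies on the standard install.packages and .libPaths functions rather than on anything specific to HAPPY.

```r
pkg.dir <- "R_package_dir"            # library directory named in the text
tarball <- "happy.hbrem_2.4.tar.gz"   # downloaded source package

if (file.exists(tarball)) {
  dir.create(pkg.dir, showWarnings = FALSE)
  # repos = NULL tells R to install from the local file rather than a repository
  install.packages(tarball, lib = pkg.dir, repos = NULL, type = "source")
}

# Put the directory at the front of the library search path for this session.
# .libPaths() silently ignores directories that do not exist.
.libPaths(c(pkg.dir, .libPaths()))

# Show what R_LIBS is currently set to (empty string if unset)
print(Sys.getenv("R_LIBS"))
```

Setting .libPaths in this way only affects the current session; setting R_LIBS in your shell start-up file makes the change permanent.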

You will also need to download the following R scripts into the analysis directory:

magic.R, happy.preCC.R

Mapping QTLs

Test that everything is installed by starting an R session in the analysis directory:

mus [70]% R 
WARNING: ignoring environment value of R_HOME

R version 2.9.1 (2009-06-26)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("magic.R")
Loading required package: g.data
Loading required package: multicore
Loading required package: splines
>

Now you need to build a database containing the HAPPY descent probability matrices. This step need only be done once. In R type the command

> prepare.database()

which should generate the output

./chr1.MAGIC.data TRUE ./chr1.MAGIC.alleles TRUE ./chr1.MAGIC.map TRUE 
mindist: 1e-05
datafile ./chr1.MAGIC.data allelesfile ./chr1.MAGIC.alleles gen 7
genotype phase: unknown
./chr2.MAGIC.data TRUE ./chr2.MAGIC.alleles TRUE ./chr2.MAGIC.map TRUE 
mindist: 1e-05
datafile ./chr2.MAGIC.data allelesfile ./chr2.MAGIC.alleles gen 7
genotype phase: unknown
./chr3.MAGIC.data TRUE ./chr3.MAGIC.alleles TRUE ./chr3.MAGIC.map TRUE 
mindist: 1e-05
datafile ./chr3.MAGIC.data allelesfile ./chr3.MAGIC.alleles gen 7
genotype phase: unknown
./chr4.MAGIC.data TRUE ./chr4.MAGIC.alleles TRUE ./chr4.MAGIC.map TRUE 
mindist: 1e-05
datafile ./chr4.MAGIC.data allelesfile ./chr4.MAGIC.alleles gen 7
genotype phase: unknown
./chr5.MAGIC.data TRUE ./chr5.MAGIC.alleles TRUE ./chr5.MAGIC.map TRUE 
mindist: 1e-05
datafile ./chr5.MAGIC.data allelesfile ./chr5.MAGIC.alleles gen 7
genotype phase: unknown
a->markers 211
Reading phenotype and genotype data from ped file ./chr2.MAGIC.data
a->markers 275
Reading phenotype and genotype data from ped file ./chr1.MAGIC.data
a->markers 251
Reading phenotype and genotype data from ped file ./chr3.MAGIC.data
a->markers 230
Reading phenotype and genotype data from ped file ./chr4.MAGIC.data
a->markers 292
Reading phenotype and genotype data from ped file ./chr5.MAGIC.data
Number of individuals: 703  
Number of markers:     211  
Number of strains:     19   
Use Parents:           no
Number of subjects with two parents: 0    
null model mean nan var nan
assuming haploid(inbred) genotypes
dfile ./chr2.MAGIC.data afile ./chr2.MAGIC.alleles gen 7
Number of individuals: 703  
Number of markers:     230  
Number of strains:     19   
Use Parents:           no
Number of subjects with two parents: 0    
null model mean nan var nan
assuming haploid(inbred) genotypes
dfile ./chr4.MAGIC.data afile ./chr4.MAGIC.alleles gen 7
Number of individuals: 703  
Number of markers:     251  
Number of strains:     19   
Use Parents:           no
Number of subjects with two parents: 0    
null model mean nan var nan
assuming haploid(inbred) genotypes
dfile ./chr3.MAGIC.data afile ./chr3.MAGIC.alleles gen 7
Number of individuals: 703  
Number of markers:     275  
Number of strains:     19   
Use Parents:           no
Number of subjects with two parents: 0    
null model mean nan var nan
assuming haploid(inbred) genotypes
dfile ./chr1.MAGIC.data afile ./chr1.MAGIC.alleles gen 7
Number of individuals: 703  
Number of markers:     292  
Number of strains:     19   
Use Parents:           no
Number of subjects with two parents: 0    
null model mean nan var nan
assuming haploid(inbred) genotypes
dfile ./chr5.MAGIC.data afile ./chr5.MAGIC.alleles gen 7
>

This will create a subdirectory called CONDENSED which contains the R binary versions of the probability matrices, and is used automatically by the subsequent QTL mapping.

Phenotype Data

An example phenotype file MAGIC.phenotype.example.12102015.txt is provided. Make sure your phenotype data file conforms exactly to these specifications:

The file is tab-delimited.
The first row contains the column headings.
One column must be labelled SUBJECT.NAME and must contain the names of the MAGIC lines in the format MAGIC.N where N is an integer (e.g. MAGIC.100). Note that these line designations are the same as those used by the stock centre, if you order seeds.
Missing data are indicated by the symbol NA.
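The format rules above can be checked before starting a scan. The following is a minimal sketch; the helper check.phenotype.file is hypothetical (it is not part of magic.R) and tests only the rules listed here, using a small made-up file for illustration.

```r
# Hypothetical helper (not part of magic.R): check a phenotype file against
# the format rules listed above and return a vector of problems found.
check.phenotype.file <- function(file) {
  # Tab-delimited, first row is the header, "NA" marks missing values
  ph <- read.delim(file, header = TRUE, na.strings = "NA",
                   stringsAsFactors = FALSE, check.names = FALSE)
  if (!"SUBJECT.NAME" %in% names(ph)) {
    return("no column labelled SUBJECT.NAME")
  }
  problems <- character(0)
  # Line names must look like MAGIC.N with N an integer
  bad <- !grepl("^MAGIC\\.[0-9]+$", ph$SUBJECT.NAME)
  if (any(bad)) {
    problems <- c(problems,
                  paste("malformed line names:",
                        paste(head(ph$SUBJECT.NAME[bad]), collapse = ", ")))
  }
  # At least one column must be readable as numeric to be analysed
  if (!any(sapply(ph, is.numeric))) {
    problems <- c(problems, "no numeric phenotype columns")
  }
  problems
}

# Made-up example file with one badly named line
example <- tempfile(fileext = ".txt")
writeLines(c("SUBJECT.NAME\tdays.to.bolt",
             "MAGIC.1\t21.5",
             "MAGIC.2\tNA",
             "magic-3\t19"), example)
print(check.phenotype.file(example))
```

An empty result means the file passed these checks; it does not guarantee that the line names match lines present in the genotype data.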

The simplest way to perform the QTL mapping is with the R command

scan.phenotypes(phenotypefile)

This performs the following steps:

Loads the database of probability matrices from ./CONDENSED.
Maps QTLs for each column in the phenotype file that can be interpreted as numeric:
- Performs a genome scan with 1000 permutations to determine genome-wide thresholds for statistical significance. The summary statistics for the scans are written to text files named "phenotype.scan.txt", e.g. days.to.bolt.scan.txt. A binary R data object with all the scan information is written to "scans.RData".
- Finds all QTLs where the logP of genetic association is genome-wide significant with a permutation P-value < 0.1 (by default). These are written to text files named "phenotype.qtls.txt", e.g. days.to.bolt.qtls.txt. Note that at present confidence intervals are not provided; instead the island intervals are given, which are the segments exceeding the genome-wide significance level.
- Estimates founder accession effects at each QTL by multiple imputation. These are written to text files named "phenotype.marker.imputed.txt", and summarised graphically in the file "phenotype.accession.estimates.pdf" (e.g. days.to.bolt.accession.estimates.pdf).
Plots histograms of phenotype values, by default to the file histogram.pdf.
Plots genome scans, by default to the file scans.pdf. In the plots, chromosome boundaries are indicated by vertical red lines. The permutation-derived genome-wide thresholds at 50%, 90% and 95% are indicated by the grey horizontal lines. The positions of QTLs are indicated by orange dots.
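To illustrate what a permutation-derived genome-wide threshold means, here is a toy sketch on simulated data. It is not the code used by magic.R: it assumes simple 0/1 marker genotypes and a single-marker regression test in place of the HAPPY founder probabilities, and all names in it are made up for the example. The idea is to shuffle the phenotype, record the largest logP anywhere in the genome for each shuffle, and take quantiles of those maxima as thresholds.

```r
set.seed(1)
n.lines <- 200   # simulated inbred lines
n.loci  <- 50    # simulated markers
n.perm  <- 200   # permutations (the real scan uses 1000 by default)

# Simulated 0/1 genotypes; marker 10 has a true effect on the phenotype
geno  <- matrix(rbinom(n.lines * n.loci, 1, 0.5), nrow = n.lines)
pheno <- geno[, 10] + rnorm(n.lines)

# logP (-log10 P-value) of a single-marker regression at every marker,
# computed from the correlation between phenotype and genotype
scan.logP <- function(y, g) {
  r <- as.vector(cor(y, g))
  t.stat <- r * sqrt((length(y) - 2) / (1 - r^2))
  -log10(2 * pt(-abs(t.stat), df = length(y) - 2))
}

observed <- scan.logP(pheno, geno)

# Null distribution of the genome-wide maximum under shuffled phenotypes
max.null <- replicate(n.perm, max(scan.logP(sample(pheno), geno)))

# Genome-wide thresholds at 50%, 90% and 95%
thresholds <- quantile(max.null, c(0.5, 0.9, 0.95))

# Permutation P-value per marker: fraction of null maxima at least as large
perm.p <- sapply(observed, function(s) mean(max.null >= s))
qtl <- which(perm.p < 0.1)

print(thresholds)
print(qtl)
```

In this scheme a permutation P-value below 0.1 is the same as the observed logP exceeding the 90% threshold, which is why the 90% line in the plots corresponds to the default QTL-calling threshold.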

The start of the screen output should look like this:

> scan.phenotypes("../MAGIC.phenotype.example.12102015.txt")
loading condensed db ./CONDENSED 
model additive read  1254  matrices
loading genome summary
loading condensed summary
reading phenotype file  ../MAGIC.phenotype.example.04012009.txt 
Analysing numeric phenotypes  bolt.to.flower days.to.bolt days.to.germ leaves.day.28.given.days.to.germ 
plotting histograms of phenotypes to  histogram.pdf 
scanning bolt.to.flower  ~ 1 
426 subjects analysed with phenotypes
Analysis of Variance Table

Response: bolt.to.flower
           Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 425 2623.91    6.17               
 
writing parameter estimates to  bolt.to.flower.MN4_142943.imputed.txt 
 
writing parameter estimates to  bolt.to.flower.FLC_3090.imputed.txt 
plotting estimates to  bolt.to.flower.accession.estimates.pdf 
scanning days.to.bolt  ~ 1 
426 subjects analysed with phenotypes
Analysis of Variance Table

Response: days.to.bolt
           Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 425 16823.6    39.6               
 
writing parameter estimates to  days.to.bolt.MN1_21908389.imputed.txt 
 
writing parameter estimates to  days.to.bolt.MASC02069.imputed.txt 
 
writing parameter estimates to  days.to.bolt.MASC03765.imputed.txt 
 
writing parameter estimates to  days.to.bolt.FRI_2343.imputed.txt 
 
writing parameter estimates to  days.to.bolt.MN5_3177504.imputed.txt 
plotting estimates to  days.to.bolt.accession.estimates.pdf 
scanning days.to.germ  ~ 1 
426 subjects analysed with phenotypes
Analysis of Variance Table

Response: days.to.germ
           Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 425 1376.30    3.24               
scanning leaves.day.28.given.days.to.germ  ~ 1 
426 subjects analysed with phenotypes
Analysis of Variance Table

Response: leaves.day.28.given.days.to.germ
           Df  Sum Sq Mean Sq F value Pr(>F)
Residuals 344 1441.22    4.19               
 
writing parameter estimates to  leaves.day.28.given.days.to.germ.MN4_48812OK.imputed.txt 
 
writing parameter estimates to  leaves.day.28.given.days.to.germ.FRI_1888.imputed.txt 
 
writing parameter estimates to  leaves.day.28.given.days.to.germ.MN4_428535.imputed.txt 
plotting estimates to  leaves.day.28.given.days.to.germ.accession.estimates.pdf 
saving scans to binary file  scans.RData 
plotting scans to  scans.pdf 
writing scans to  bolt.to.flower.scan.txt 
writing qtls to  bolt.to.flower.qtls.txt 
writing scans to  days.to.bolt.scan.txt 
writing qtls to  days.to.bolt.qtls.txt 
writing scans to  days.to.germ.scan.txt 
writing qtls to  days.to.germ.qtls.txt 
writing scans to  leaves.day.28.given.days.to.germ.scan.txt 
writing qtls to  leaves.day.28.given.days.to.germ.qtls.txt
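Once the run has finished, the saved results can be inspected from R. The following is a sketch that assumes the default output file names shown above; the helper load.saved.scans is hypothetical, and no assumption is made about the names or structure of the objects stored in scans.RData, which are simply listed.

```r
# Hypothetical helper (not part of magic.R): load an .RData file into its own
# environment and return it, so nothing in the workspace is overwritten.
load.saved.scans <- function(file = "scans.RData") {
  env <- new.env()
  load(file, envir = env)
  env
}

if (file.exists("scans.RData")) {
  scans.env <- load.saved.scans("scans.RData")
  print(ls(scans.env))   # names of the saved objects
}

# Per-phenotype QTL tables written by the run (empty if none are present)
print(list.files(pattern = "\\.qtls\\.txt$"))
```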

The options to scan.phenotypes, with their default values, are

 scan.phenotypes <- function( phenotype.file, phenotypes=NULL, dir="./CONDENSED", threshold=0.1, permute=1000, histogram.pdf="histogram.pdf", save.file="scans.RData", scan.plot.pdf= "scans.pdf", mc.cores=5)

where

phenotype.file is the name of the phenotype file
phenotypes is an optional list of phenotype names to analyse. By default all columns in the phenotype file that are numeric are analysed.
dir is the directory containing the probability matrices; by default this is ./CONDENSED
threshold is the permutation P-value threshold for genome-wide significance used to call a QTL; default is 0.1.
permute is the number of permutations to perform, default is 1000.
histogram.pdf is the name of the output PDF file containing histogram plots of the phenotypes.
save.file is the name of the R binary file containing all the genome scans.
scan.plot.pdf is the name of the output PDF file of genome scans.
mc.cores is the number of parallel processes to run. This should be no more than the number of cores on your server. Default value is 5; set to 1 if the processor has only a single core.
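As an illustration of overriding the defaults, the sketch below analyses two of the example phenotypes with a stricter threshold and separate output files. It assumes that source("magic.R") and prepare.database() have already been run in the analysis directory, and that phenotypes accepts a character vector of column names; the output file names are made up for the example. The number of processes is capped using detectCores from the standard parallel package.

```r
# Use at most 5 processes, and no more than the machine has cores
n.cores <- parallel::detectCores()
if (is.na(n.cores)) n.cores <- 1

scan.args <- list(
  phenotype.file = "MAGIC.phenotype.example.12102015.txt",
  phenotypes     = c("days.to.bolt", "days.to.germ"),  # assumed: character vector
  dir            = "./CONDENSED",
  threshold      = 0.05,                      # stricter than the default 0.1
  permute        = 1000,
  histogram.pdf  = "flowering.histogram.pdf", # made-up output names
  save.file      = "flowering.scans.RData",
  scan.plot.pdf  = "flowering.scans.pdf",
  mc.cores       = min(5, n.cores)
)

# Only run the scan if magic.R has been sourced in this session
if (exists("scan.phenotypes")) {
  do.call(scan.phenotypes, scan.args)
}
```

Writing to separate file names keeps the results of this run from overwriting scans.RData and scans.pdf from an earlier run with the default settings.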