\documentclass[a4paper,11pt]{article}
\usepackage[margin=1.0in]{geometry}
%\addtolength{\oddsidemargin}{-.875in}
%\addtolength{\evensidemargin}{-.875in}
%\addtolength{\textwidth}{1.75in}
%\addtolength{\topmargin}{-.875in}
%\addtolength{\textheight}{1.75in}
\usepackage[controls]{animate}
\usepackage[colorlinks=true,urlcolor=orange]{hyperref}
\usepackage{attachfile}
\usepackage{multirow}

\title{Networks}
\author{Tota Juliusdottir}

\begin{document}
\SweaveOpts{eval=T,echo=F, width=6, height=4, keep.source=T} %applies to all chunks after this statement, sets the default for all code chunks
\setlength{\parindent}{0pt}
\maketitle

%To make the R code prettier, limit the line length and get rid of "+"
<<>>=
options(width=60, continue=" ")
@

%sweave customisation for R output
%\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
%\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
%\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
\fvset{listparameters={\setlength{\topsep}{0pt}}}
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}

\section{Data Overview}\label{overview}
All relevant input and output files are in \textbf{MUS:/data/www/networks/}
\begin{description}
\item[A modules]: @MUS:/data/www/networks/\textbf{modules/A}\\
The first set of Jon's modules:
\begin{itemize}
\item hippo
\item lung
\item liver
\end{itemize}
\item[B modules]: @MUS:/data/www/networks/\textbf{modules/B}\\
The second (most recent) set of Jon's modules:
\begin{itemize}
\item hippo.desexed
\item hippo.male
\item hippo.female
\item lung.desexed
\item lung.male
\item lung.female
\item liver.desexed
\item liver.male
\item liver.female
\end{itemize}
\item[Phenotypes]: @MUS:/data/www/networks/\textbf{infiles/phenotypes.txt}\\
A total of \textbf{123} phenotypes (which I got from Martin) are used in the permutations.
\item[Gene info from Biomart]: @MUS:/data/www/networks/\textbf{infiles/mart\_export.txt}\\
Contains all mouse gene ids, their positions and
descriptions.
\item[Probe annotation]: @MUS:/data/www/networks/\textbf{infiles/annot\_7\_sept\_2011.txt}\\
The same annotations as in the \textbf{annotdb} on MUS, except that some of the gene identifiers have been updated to more recent ones (updated on 07.09.11 based on Ensembl).
\end{description}

\newpage
\section{Methods}
\subsection*{Permutations}
Initially 1000 permutations are run; if the observed p-value is below 0.1, an additional 9000 permutations (10,000 in total) are carried out. I use the following command script to do the permutations:
<<eval=F, echo=T>>=
sh /data/www/networks/networks.sh
@

\subsubsection*{Random phenotypes, fixed modules}
Every phenotype consists of a number of QTLs. I generated 123 `fake phenotypes' by randomly selecting a start position within the genome and, from there, sequentially retrieving the same number of genes as in the `real phenotype'. This way, the fake QTLs always contained the same number of genes as their real counterparts, but varied in length.\\
I repeated this 10,000 times (only 1000 times when the initial p-value was not below 0.1) and, for each module, counted the number of times I saw \textbf{more} genes within the fake phenotype than within the real phenotype.\\
For example, if a module had 15 genes within a given phenotype, I counted how often this module had more than 15 genes within a randomly generated phenotype. I then divided this count by the number of iterations to obtain the p-value.\\
I did the permutation analysis for each tissue within both the A and B modules.
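
The counting rule above can be written compactly. The notation is introduced here only for illustration and does not come from the original analysis: $N$ is the number of permutations (1000 or 10,000), $c_{\mathrm{obs}}$ is the number of genes of a module that fall within the real phenotype, and $c_i$ is the corresponding count for the $i$-th fake phenotype.
\[
p \;=\; \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[\,c_i > c_{\mathrm{obs}}\,\right]
\]
In the example above, $c_{\mathrm{obs}} = 15$, so only fake phenotypes containing 16 or more genes of the module contribute to the sum.
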
Each permutation study resulted in a matrix of p-values (dimensions: 123 $\times$ number of modules), giving a total of 12 such matrices of different sizes (depending on the number of modules for the tissue).

\subsection{GO ANALYSIS}
\subsubsection{All the genes in all the modules}
\subsubsection{Genes within phenotypes}

\newpage
\section{Results}
\subsection*{Overview of modules}
%Example of uploading hippo modules \textbf{A} in R:
%<<>>=
%read.table("/data/www/networks/modules/A/hippo.txt", header=T, sep="\t")
%@
%For different tissues, change the filename (lung.txt, liver.txt etc.), and for B modules,
%replace A in the path with B.
The twelve module datasets (see Section~\ref{overview}) for the three tissues (hippo, lung and liver) differ in size and contain varying numbers of probes. Probes that could not be mapped to single genes were not included in the subsequent analysis.
\begin{table}[!htp]
\footnotesize
\begin{center}
\begin{tabular}{llcccccc}
\hline
\multicolumn{2}{c}{\multirow{2}{*}{\textbf{Datasets}}} & \textbf{No. of} & \multicolumn{3}{c}{\textbf{Size of modules}} & \textbf{No. of} & \textbf{No.
of mapped}\\
\cline{4-6}
&& \textbf{modules} & \textbf{mean} & \textbf{min} & \textbf{max} & \textbf{probes} & \textbf{probes} \\
\hline
\multicolumn{1}{c}{\multirow{3}{*}{\emph{A}}} & \emph{hippo} & 23 & 544 & 39 & 2997 & 15736 & 13022 (83\%)\\
&\multicolumn{1}{l}{\emph{liver}} & 5 & 182 & 45 & 538 & 912 & 713 (78\%)\\
&\multicolumn{1}{l}{\emph{lung}} & 12 & 269 & 47 & 941 & 3226 & 2691 (83\%)\\
\hline
\multicolumn{1}{c}{\multirow{9}{*}{\emph{B}}} & \emph{hippo.desexed} & 10 & 812 & 79 & 2751 & 8121 & 6826 (84\%)\\
&\multicolumn{1}{l}{\emph{hippo.male}} & 12 & 692 & 82 & 3009 & 8301 & 6973 (84\%)\\
&\multicolumn{1}{l}{\emph{hippo.female}} & 9 & 873 & 89 & 2666 & 7855 & 6592 (84\%)\\
&\multicolumn{1}{l}{\emph{liver.desexed}} & 11 & 655 & 145 & 2879 & 7201 & 6074 (84\%)\\
&\multicolumn{1}{l}{\emph{liver.male}} & 13 & 563 & 92 & 2598 & 7325 & 6190 (85\%)\\
&\multicolumn{1}{l}{\emph{liver.female}} & 12 & 595 & 98 & 2583 & 7142 & 6026 (84\%)\\
&\multicolumn{1}{l}{\emph{lung.desexed}} & 15 & 548 & 26 & 2451 & 8216 & 6943 (85\%)\\
&\multicolumn{1}{l}{\emph{lung.male}} & 15 & 556 & 52 & 2127 & 8334 & 7041 (84\%)\\
&\multicolumn{1}{l}{\emph{lung.female}} & 14 & 580 & 29 & 2646 & 8115 & 6860 (85\%)\\
\hline
\end{tabular}
\caption[Modules summary]{Summary of the 12 module datasets. The table lists the dataset names and the number of modules within each dataset. The mean, minimum and maximum module sizes within each dataset are also shown.
The second-to-last column shows the total number of probes within the dataset, and the last column shows how many of these probes were mapped to a gene.}
\label{tableModules}
\end{center}
\end{table}

\subsection*{Permutations for fixed modules and random phenotypes}
The permutation test resulted in 12 p-value matrices with the following dimensions:
<<eval=F, echo=T>>=
no_of_phenotypes * no_of_modules
@
These p-value matrices are in:\\
MUS:/data/www/networks/\textbf{results/pval.matrices}\\
To retrieve the .RData files for these matrices, do:
<<echo=T>>=
load("/data/www/networks/results/pval.matrices/A/hipA.RData")
@
%save(lunA.pvals, file=''A/livA.RData'')
for \textbf{A} modules, and:
<<echo=T>>=
load("/data/www/networks/results/pval.matrices/B/hipB.male.RData")
@
for \textbf{B} modules.

\subsubsection*{FALSE DISCOVERY RATE (FDR)}
Of the three A module datasets, hippo is the only one with low FDRs. Both liver and lung have FDRs larger than 1 (see /data/www/networks/results/fdr/A/).
<<results=tex>>=
fdr.hip=read.delim("results/fdr/A/modA_hip.fdr.txt")
library(xtable)
print(xtable(fdr.hip, caption="FDR for hippo A"), include.rownames=FALSE)
@
For a p-value threshold of 0.01 and a total of 2952 values, we would expect to see 29.5 values below 0.01 by chance, and we see 80 (see table). For a threshold of 0.001 we expect to see 2.95 values below 0.001 and we see 8.
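
As a rough check of these expected counts, assuming that p-values are uniformly distributed under the null hypothesis (the symbols $n$ and $\alpha$ are introduced here for illustration), the expected number of values below a threshold $\alpha$ among $n$ tests is $n\alpha$:
\[
2952 \times 0.01 = 29.52, \qquad 2952 \times 0.001 = 2.952 .
\]
These compare with the 80 and 8 values observed at the two thresholds.
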
The modules and phenotypes for these 8 are:
<<results=tex>>=
# include.rownames is an argument of print.xtable, not xtable
print(xtable(hip.lowpvals, caption="Hippo A modules and phenotypes with p-values $<$ 0.001", digits=c(1,10,0,0,0)), include.rownames=FALSE)
@
<<results=tex>>=
# digits belongs to xtable(), not head()
print(xtable(head(hip.lowpvals.ordered), digits=c(1,10,0,0,0)), include.rownames=FALSE)
@
To view the genes associated with these modules and phenotypes in R, do (from MUS:/data/www/networks):
<<echo=T>>=
hip.genes.pvals=read.delim(file="new_out/modA_hip.pval_0.001andBelow.i10000.txt", header=T)
hipA.haem.hgb=hip.genes.pvals[hip.genes.pvals$phenotype == "Haem.HGB",]
@
<<results=tex>>=
print(xtable(hipA.haem.hgb[,c('phenotype','module', 'ens_id', 'gene_desc')], caption="Genes within module 10 that are within the Haem.HGB QTLs"))
@
For the B modules, the strongest signal is for hippo.desexed, with 22 p-values below 0.01 when 12.3 are expected. However, there are no p-values below 0.001 for this dataset. All the other B datasets have high FDR values.

\subsubsection*{GO analysis}
\end{document}