\documentclass[11pt]{article}

\setlength{\topmargin}{-.5in}
\setlength{\textheight}{23.5cm}
\setlength{\textwidth}{17.0cm}
\setlength{\oddsidemargin}{.025in}
\setlength{\evensidemargin}{.025in}
\setlength{\textwidth}{6.25in}

\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{verbatim} % useful for program listings
\usepackage{color} % use if color is used in text
\usepackage{subfigure} % use for side-by-side figures
\usepackage{float}
\usepackage{Sweave}
\usepackage{url}

\newcommand{\mqm}{\emph{MQM}}
\newcommand{\MQM}{\mqm}
\newcommand{\qtl}{QTL}
\newcommand{\QTL}{\qtl}
\newcommand{\xqtl}{\emph{x}QTL}
\newcommand{\mqtl}{\emph{m}QTL}
\newcommand{\eqtl}{\emph{e}QTL}
\newcommand{\lod}{LOD}
\newcommand{\cM}{cM}
\newcommand{\rqtl}{\emph{R/qtl}}
\newcommand{\cim}{\emph{CIM}}
\newcommand{\At}{\emph{Arabidopsis thaliana}}
\newcommand{\FIXME}{({\bf FIXME!})}
\newcommand{\CHECK}{({\bf CHECK!})}
\newcommand{\NOTE}[1]{({\tt NOTE: #1 })}
\newcommand{\intro}[1]{\vspace{0.15in}#1:}
\newcommand{\code}{\texttt}
\newcommand{\etal}{\emph{et al.}}
\newcommand{\Atintro}{\At\ RIL mQTL dataset (multitrait) with 24 metabolites as phenotypes \cite{Keurentjes2006}}
\newcommand{\Atintrocolors}{\Atintro\ comparing \mqm\ (\code{mqmscan} in green) and single \qtl\ mapping (\code{scanone} in black)}

\title{Tutorial - Multiple-QTL Mapping (MQM) Analysis for R/qtl}
\author{Danny Arends, Pjotr Prins, Karl W. Broman and Ritsert C. Jansen}

\begin{document}
\maketitle
\clearpage

\setkeys{Gin}{width=6.25in} %% <- change width of figures

\section{Introduction}

\input{mqm/description.txt}

\vspace{0.3in}

\input{mqm/advantages_latex.txt}

\input{mqm/limitations.txt}

Despite these limitations, \mqm\footnote{MQM should not be confused with composite interval mapping (CIM) \cite{CIMa,CIMb}.
The advantage of MQM over CIM is a reduction in type I errors (a QTL is indicated at a location where no QTL is present) and type II errors (a QTL is not detected) in QTL detection \cite{jansen94b}.}
is a valuable addition to the \qtl\ mapper's toolbox. It can deal with QTL in coupling phase and QTL in repulsion phase. \mqm\ handles missing data and has higher power to detect QTL (linked and unlinked) than other methods. R/qtl's \mqm\ is faster than other implementations and scales on multi-CPU systems and computer clusters.

In this tutorial we will show you how to use \mqm\ for \qtl\ mapping. \mqm\ is an integral part of the free \rqtl\ package \cite{rqtlbook,broman09,broman03} for the R statistical language\footnote{We assume the reader knows how to load data into R using the R/qtl \code{read.cross} function; see also the R/qtl tutorials \cite{broman09} and book \cite{rqtlbook}.}.

\section{A quick overview of \mqm}

These are the typical steps in an \mqm\ \qtl\ analysis:

\begin{itemize}
\item Load data into R
\item Fill in missing data, using either \code{mqmaugmentdata} or \code{fill.geno}
\item Unsupervised backward elimination to analyse \emph{cofactors}, using \code{mqmscan}
\item Optionally select \emph{cofactors\/} at markers that are thought to influence \qtl\ at, or near, the location
\item Permutation or simulation analysis to get estimates of significance, using \code{mqmpermutation} or \code{mqmscanfdr}
\end{itemize}

Using maximum likelihood (ML) or restricted maximum likelihood (REML), the algorithm employs a backward elimination strategy to identify \qtl\ underlying the trait.
The algorithm passes through the following stages:

\begin{itemize}
\item Likelihood-based estimation of the full model using all cofactors
\item Backward elimination of cofactors, followed by a genome scan for \qtl
\item If no \emph{cofactors\/} are defined, the backward elimination of cofactors is skipped and a genome scan for \qtl\ is performed, testing each genetic (interval) location individually. In this case REML and ML will result in the same \qtl\ profile because there is no full model.
\end{itemize}

The results created during the genome scan and the \qtl\ model are returned as an (extended) R/qtl \code{scanone} object. Several special plotting routines are available for \mqm\ results.

%\clearpage
\section{Data augmentation}
\label{augmentation}

In an ideal world all datasets would be complete, with the genotype of every individual determined at every marker. In the real world, however, datasets are often incomplete: genotype information is missing, or can have multiple plausible values. \mqm\ automatically expands the dataset by adding all potential variants and attaching a probability to each. For example, suppose the genotype at a marker location is missing (unknown) for one individual. Based on the values of the neighbouring markers and the (estimated) recombination rate, a probability is attached to each possible genotype. With \mqm\ all possible genotypes with a probability above the parameter \code{minprob} are considered.

When a missing marker genotype is encountered (possible genotypes {\bf A} and {\bf B} in a RIL), all possible genotypes at the missing location are created. Thus at the missing location two `individuals' are created in the \emph{augmentation} step, one with genotype {\bf A} and one with genotype {\bf B}. A probability is attached to both \emph{augmented} individuals. The combined probability over all missing marker locations indicates whether an augmented genotype is likely or unlikely, which allows for weighted analysis later.
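
As a minimal worked illustration of how such probabilities can arise, consider a RIL individual whose genotype is missing at a marker $M$ that lies between two observed markers $L$ and $R$. This is a sketch under simplifying assumptions and not necessarily the exact computation performed by \mqm. The symbols $r_1$ and $r_2$ (the recombination fractions between $L$ and $M$, and between $M$ and $R$), the assumption that recombination in the two intervals is independent, and the numerical values below are introduced here purely for illustration. If both flanking markers carry genotype {\bf A}, then
\begin{equation*}
P(M = \mathrm{A} \mid L = \mathrm{A},\, R = \mathrm{A})
  = \frac{(1-r_1)(1-r_2)}{(1-r_1)(1-r_2) + r_1 r_2} .
\end{equation*}
With $r_1 = r_2 = 0.1$ this gives $0.81/0.82 \approx 0.988$ for genotype {\bf A} and $0.01/0.82 \approx 0.012$ for genotype {\bf B}. If instead the flanking genotypes differ ($L = \mathrm{A}$, $R = \mathrm{B}$), then
\begin{equation*}
P(M = \mathrm{A} \mid L = \mathrm{A},\, R = \mathrm{B})
  = \frac{(1-r_1)\, r_2}{(1-r_1)\, r_2 + r_1 (1-r_2)} ,
\end{equation*}
which for $r_1 = r_2 = 0.1$ equals $0.09/0.18 = 0.5$. Under a hypothetical threshold of \code{minprob} $= 0.05$, the first case would retain only the augmented individual with genotype {\bf A}, whereas the second case would retain both augmented individuals, each with probability $0.5$.
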
To see an example of missing data in an F$_2$ intercross, we can visualize the genotypes of the individuals using \code{geno.image}. In Figure~\ref{missing data} the 2\% missing values are shown in white; the other colors represent the genotype of a given individual at a given position. Simulate an F$_2$ dataset with 2\% missing genotypes as follows:

\intro{Simulate a dataset with missing data}
% set seed so that everything comes out exactly the same
\begin{Schunk}
\begin{Sinput}
> library(qtl)
> data(map10)
> simcross <- sim.cross(map10, type="f2", n.ind=100, missing.prob=0.02)
\end{Sinput}
\end{Schunk}

and plot the genotype data using \code{geno.image} (Figure~\ref{missing data}):

\begin{Schunk}
\begin{Sinput}
> geno.image(simcross)
\end{Sinput}
\end{Schunk}

\begin{figure}