EVOLUTION-MANAGER
Edit File: MQM-tour.Rnw
\documentclass[11pt]{article} \setlength{\topmargin}{-.5in} \setlength{\textheight}{23.5cm} \setlength{\textwidth}{17.0cm} \setlength{\oddsidemargin}{.025in} \setlength{\evensidemargin}{.025in} \setlength{\textwidth}{6.25in} \usepackage{amsmath} \usepackage{graphicx} \usepackage{verbatim} % useful for program listings \usepackage{color} % use if color is used in text \usepackage{subfigure} % use for side-by-side figures \usepackage{float} \usepackage{Sweave} \usepackage{url} \newcommand{\mqm}{\emph{MQM}} \newcommand{\MQM}{\mqm} \newcommand{\qtl}{QTL} \newcommand{\QTL}{\qtl} \newcommand{\xqtl}{\emph{x}QTL} \newcommand{\mqtl}{\emph{m}QTL} \newcommand{\eqtl}{\emph{e}QTL} \newcommand{\lod}{LOD} \newcommand{\cM}{cM} \newcommand{\rqtl}{\emph{R/qtl}} \newcommand{\cim}{\emph{CIM}} \newcommand{\At}{\emph{Arabidopsis thaliana}} \newcommand{\FIXME}{({\bf FIXME!})} \newcommand{\CHECK}{({\bf CHECK!})} \newcommand{\NOTE}[1]{({\tt NOTE: #1 })} \newcommand{\intro}[1]{\vspace{0.15in}#1:} \newcommand{\code}{\texttt} \newcommand{\etal}{\emph{et al.}} \newcommand{\Atintro}{\At\ RIL mQTL dataset (multitrait) with 24 metabolites as phenotypes \cite{Keurentjes2006}} \newcommand{\Atintrocolors}{\Atintro\ comparing \mqm\ (\code{mqmscan} in green) and single \qtl\ mapping (\code{scanone} in black)} \title { Tutorial - Multiple-QTL Mapping (MQM) Analysis for R/qtl } \author { Danny Arends, Pjotr Prins, Karl W. Broman and Ritsert C. Jansen } \begin {document} \maketitle \clearpage \SweaveOpts{prefix.string=Figs/fig,eps=TRUE} \setkeys{Gin}{width=6.25in} %% <- change width of figures \section{Introduction} \input{mqm/description.txt} \vspace{0.3in} \input{mqm/advantages_latex.txt} \input{mqm/limitations.txt} Despite these limitations, \mqm\footnote{MQM should not be confused with composite interval mapping (CIM) \cite{CIMa,CIMb}. The advantage of MQM over CIM is reduction of type I error (a QTL is indicated at a location where there is no QTL present) and type II error (a QTL is not detected) for QTL detection \cite{jansen94b}.} is a valuable addition to the \qtl\ mapper's toolbox. It is able to deal with QTL in coupling phase and QTL in repulsion phase. \mqm\ handles missing data and has higher power to detect QTL (linked and unlinked) than other methods. R/qtl's \mqm\ is faster than other implementations and scales on multi-CPU systems and computer clusters. In this tutorial we will show you how to use \mqm\ for \qtl\ mapping. \mqm\ is an integral part of the free \rqtl\ package \cite{rqtlbook,broman09,broman03} for the R statistical language\footnote{We assume the reader knows how to load his data into R using the R/qtl \code{read.cross} function; see also the R/qtl tutorials \cite{broman09} and book \cite{rqtlbook}.}. \section{A quick overview of \mqm} These are the typical steps in an \mqm\ \qtl\ analysis: \begin{itemize} \item Load data into R \item Fill in missing data, using either \code{mqmaugmentdata} or \code{fill.geno} \item Unsupervised backward elimination to analyse \emph{cofactors}, using \code{mqmscan} \item Optionally select \emph{cofactors\/} at markers that are thought to influence \qtl\ at, or near, the location \item Permutation or simulation analysis to get estimates of significance, using \code{mqmpermutation} or \code{mqmscanfdr} \end{itemize} Using maximum likelihood (ML), or restricted maximum likelihood (REML), the algorithm employs a backward elimination strategy to identify \qtl\ underlying the trait. The algorithm passes through the following stages: \begin{itemize} \item Likelihood-based estimation of the full model using all cofactors \item Backward elimination of cofactors, followed by a genome scan for \qtl \item If there are no \emph{cofactors\/} defined, the backward elimination of cofactors is skipped and a genome scan for \qtl\ is performed, testing each genetic (interval) location individually. In this case REML and ML will result in the same \qtl\ profile because there is no full model. \end{itemize} The results created during the genome scan and the \qtl\ model are returned as an (extended) R/qtl \code{scanone} object. Several special plotting routines are available for \mqm\ results. %\clearpage \section{Data augmentation} \label{augmentation} In an ideal world all datasets would be complete (with the genotype for every individual at every marker determined), however in the real world datasets are often incomplete. That is, genotype information is missing, or can have multiple plausible values. \mqm\ automatically expands the dataset by adding all potential variants and attaching a probability to each. For example, information is missing (unknown) at a marker location for one individual. Based on the values of the neighbouring markers, and the (estimated) recombination rate, a probability is attached to all possible genotypes. With \mqm\ all possible genotypes with a probability above the parameter \code{minprob} are considered. When encountering a missing marker genotype (possible genotypes {\bf A} and {\bf B} in a RIL), all possible genotypes at the missing location are created. Thus at the missing location two `individuals' are created in the \emph{augmentation} step, one with genotype {\bf A}, and one with genotype {\bf B}. A probability is attached to both \emph{augmented} individuals. The combined probability of all missing marker locations tells whether a genotype is likely, or unlikely, which allows for weighted analysis later. To see an example of missing data with an F$_2$ intercross, we can visualize the genotypes of the individuals using \code{geno.image}. In Figure~\ref{missing data} there are 2\% missing values in white. The other colors are genotypes at a certain position, for a certain individual. Simulate an F$_2$ dataset with 2\% missing genotypes as follows: \intro{Simulate a dataset with missing data} % set seed so that everything comes out exactly the same <<setseed,echo=FALSE>>= set.seed(19696527) @ <<>>= library(qtl) data(map10) simcross <- sim.cross(map10, type="f2", n.ind=100, missing.prob=0.02) @ and plot the genotype data using \code{geno.image} (Figure~\ref{missing data}): <<missingdata,eval=FALSE>>= geno.image(simcross) @ \begin{figure} <<missingdataplot,fig=TRUE,echo=FALSE>>= <<missingdata>> @ \caption{Genotype data for a simulated F$_2$ intercross generated with \code{sim.cross}, with 100 individuals and 2\% missing data. White pixels indicate missing genotypes.\label{missing data}} \end{figure} Before going to the next step (the \qtl\ genome scan), the data has to be completed (i.e. no more missing data). There are two possibilities: use (1) the \mqm\ data augmentation routine \code{mqmaugment} or (2) the imputation routine \code{fill.geno}. Augmentation tries to analyse all possible genotypes of interest by leaving them in the solution space. In contrast, the imputation method \emph{selects} the most likely genotype, and uses that single individual for further analysis. The downside of augmentation is that the addition of many possible genotypes can exceed available computer memory. Currently, augmentation moves an individual to a second augmentation round when it has too many possible genotypes (above the maximum number of augmented individuals \code{maxaugind}). In this second augmentation round the user can specify what needs to be done with these individuals: (1) Only use the most likely genotype, (2) use multiple imputation to create multiple possible genotypes (up to \code{maxaugind}) or (3) remove the original genotype/individual from the analysis. Note that you can opt to use \code{fill.geno}'s imputation method on your dataset, instead of augmentation, when too many individuals are dropped because of missing data. The function \code{mqmaugment} is specific to \mqm\ and the recommended procedure\footnote{Note that after augmentation the resulting object is no longer suitable for the use with other R/qtl mapping functions, like \code{scanone} and \code{cim}, because they can not account for duplicated or dropped individuals.}. In this tutorial we focus on \mqm's augmentation. The function \code{mqmaugment} fills in missing genotypes for us. For each missing genotype data, at a marker, it fills in all possible genotypes and calculates the probability. When the total probability is higher than the \code{minprob} parameter the \emph{augmented} individual is stored in the new cross object, ready for QTL mapping. The important parameters are: \code{cross}, \code{pheno.col}, \code{maxaugind}, \code{minprob} and \code{verbose} (see also the \code{mqmaugment} help page). \code{maxaugind} sets the maximum number of \emph{augmented} genotypes per individual in a dataset. The default of 82 allows six missing markers per individual in a BC, and four in an F$_2$. As a result the user has to increase the \code{maxaugind} parameter when there are more missing markers. The \code{minprob} parameter sets the minimum probability of a genotype for inclusion in the augmented dataset. This genotype probability is calculated for every marker \emph{relative} to the most likely genotype of this individual. Note that setting this value too low may result in moving a lot of individuals to the second augmentation round as the maximum of augmented individuals (the parameter \code{maxaugind}) is quickly reached. Increasing \code{minprob} (towards a value of 1.0) can keep individuals with more missing data inside the first augmentation round; a possible rule of thumb may be to set \code{minprob} to the percentage of data missing. A value of \code{minprob=1.0} makes augmentation behave similar to \code{fill.geno}'s imputation method, though with different resulting genotypes. Use \code{verbose=TRUE} to get more feedback on the augmentation routine and to check how many individuals are moved to the second stage, for imputation or removal\footnote{Augmentation is not always suitable with a lot of missing data, like in the case of selective genotyped datasets (for example the mouse \code{hyper} dataset that comes with R/qtl); these will always be handled with \code{minprob=1.0} (and a warning will be issued).} To start with an example, first run \code{mqmaugment} with \code{minprob=1.0} (Figure~\ref{augment1}): \intro{Plot augmented data using \code{geno.image}} <<>>= # displays warning because MQM ignores the X chromosome in an F2 augmentedcross <- mqmaugment(simcross, minprob=1.0) @ Plot the genotype data as follows: <<augment1,eval=FALSE>>= geno.image(augmentedcross) @ \begin{figure} <<augment1plot,fig=TRUE,echo=FALSE>>= <<augment1>> @ \caption{Genotypes, as visualized with \code{geno.image}, of 100 filled individuals (\code{mqmaugment} with \code{minprob=1.0}. With missing data only a `most likely' individual is used and no real expansion of the dataset takes place, with similar results as \code{fill.geno}'s imputation method).\label{augment1}} \end{figure} With a lower \code{minprob}, more augmented individuals are kept, and the resulting \emph{augmented} dataset will be larger. Adding (weighted) augmented individuals with all possible genotypes theoretically leads to a more accurate mapping when dealing with missing values \cite{jansen93}\footnote{Note again that the augmented dataset can only be used with pure \mqm\ functions. \mqm\ functions recognise expanded individuals as single entities. Other R/qtl functions, like \code{scanone}, assume the augmented individuals are \emph{real} individuals.}. \intro{Try augmentation with \code{minprob=0.1} (Figure~\ref{augment2})} <<>>= augmentedcross <- mqmaugment(simcross, minprob=0.1) @ \intro{Plot the genotype data} <<augment2,eval=FALSE>>= geno.image(augmentedcross) @ \begin{figure} <<augment2plot,fig=TRUE,echo=FALSE>>= <<augment2>> @ \caption{Genotypes, as visualized with \code{geno.image} of the \emph{augmented} genotypes of 100 individuals. There are a total of \Sexpr{nind(augmentedcross)} `expanded' individuals in this plot, because \mqm\ fills in missing markers with all likely genotypes (an average expansion of \Sexpr{round(nind(augmentedcross)/nind(simcross),1)} per individual).\label{augment2}} \end{figure} \label{multitrait} An \mqtl\ dataset (\code{multitrait}), which contains 24 metabolite traits from a RIL population of \At, is now distributed with R/qtl (load the data with \code{data(multitrait)}). This is part of the \At\ RIL selfing experiment with Landsberg erecta (Ler) and Cape Verde Islands (Cvi) with 162 individuals scored 117 markers \cite{Koornneef1998}. The experiment concerned empirical untargeted metabolomics using liquid chromatography time of flight mass spectrometry (LC-QTOF MS). This uncovered many qualitative and quantitative differences in metabolite accumulation between \At\ accessions \cite{Keurentjes2006}. \intro{Simulate missing data by removing some genotype data (5\%, 10\% and 80\%) from the cross object} <<augment3>>= data(multitrait) msim5 <- simulatemissingdata(multitrait, 5) msim10 <- simulatemissingdata(multitrait, 10) msim80 <- simulatemissingdata(multitrait, 80) @ Next use augmentation to fill in the missing genotypes; with more missing data increase the \code{minprob} parameter. When the \code{minprob} parameter is set too low it is possible that an individual cannot be augmented, and is moved to the second round of augmentation (see the description above). <<augment4>>= maug5 <- mqmaugment(msim5) maug10 <- mqmaugment(msim10, minprob=0.25) maug80 <- mqmaugment(msim80, minprob=0.80) @ Taking the 10\% missing set, we can try a lower \code{minprob=0.001}. The output below shows that ten augmented individuals miss too many markers to be augmented. By using the imputation strategy these individuals are kept in the set with a single `most likely' genotype. \intro{Augment with an imputation strategy} <<augmentMinProb>>= maug10minprob <- mqmaugment(msim10, minprob=0.001, verbose=TRUE) maug10minprobImpute <- mqmaugment(msim10, minprob=0.001, strategy="impute", verbose=TRUE) # check how many individuals are expanded: nind(maug10minprob) nind(maug10minprobImpute) @ Next, scan for QTL inside the cross objects with \code{mqmscan} and the single-QTL mapping function \code{scanone} (for reference). The effect of increasing the amount of missing data on QTL mapping, using default values, can be seen in Figure~\ref{augment6}. <<augment5>>= mqm5 <- mqmscan(maug5) mqm10 <- mqmscan(maug10) mqm80 <- mqmscan(maug80) @ <<augment5b>>= msim5 <- calc.genoprob(msim5) one5 <- scanone(msim5) msim10 <- calc.genoprob(msim10) one10 <- scanone(msim10) msim80 <- calc.genoprob(msim80) one80 <- scanone(msim80) @ <<augment6,eval=FALSE,echo=FALSE>>= op <- par(mfrow = c(2,2)) plot(mqm5, mqm10, mqm80, col=c("green","blue","red"), main="MQM missing data") legend("topleft", c("MQM 5%","MQM 10%","MQM 80%"), col=c("green","blue","red"), lwd=1) plot(one5, mqm5, main="5% missing", col=c("black","green")) legend("topleft", c("scanone","MQM"), col=c("black","green"), lwd=1) plot(one10, mqm10, main="10% missing", col=c("black","blue")) legend("topleft", c("scanone","MQM"), col=c("black","blue"), lwd=1) plot(one80, mqm80, main="80% missing", col=c("black","red")) legend("topleft", c("scanone","MQM"), col=c("black","red"), lwd=1) @ \begin{figure} <<fig=TRUE,echo=FALSE>>= <<augment6>> @ \caption{\Atintro. Effect of missing data on \code{mqmscan} after augmentation (green=5\%, blue=10\%, red=80\%) and \code{scanone} (black), after \code{fill.geno} imputation. \label{augment6}} \end{figure} \clearpage \section{Multiple-QTL Mapping (MQM)} \label{QTL modelling} The \code{multitrait} dataset, distributed with R/qtl, contains 24 metabolite traits from a RIL population of \At \cite{Keurentjes2006} (see also section \ref{multitrait} and \code{help(multitrait)} in R). Here we analyse the \code{multitrait} dataset using both \code{scanone} (single-\qtl\ analysis) and \code{mqmscan} (Multiple-QTL Mapping). First augment the data using the \code{mqmaugment} function with \code{minprob=1.0}, to compare against \code{scanone} with imputation (see also section \ref{augmentation}). \intro{Scan for \qtl\ with \code{mqmscan}, after filling missing data with \code{mqmaugment minprob=1.0}} <<>>= data(multitrait) maug_min1 <- mqmaugment(multitrait, minprob=1.0) mqm_min1 <- mqmscan(maug_min1) @ We compare \code{mqmscan} with \code{scanone}. For \code{scanone} one first calculates conditional \qtl\ genotype probabilities via \code{calc.genoprob}. <<>>= mgenop <- calc.genoprob(multitrait, step=5) m_one <- scanone(mgenop) @ Figure~\ref{Atminprob} shows that, without augmentation, the results from \mqm\ are similar to \code{scanone}. \intro{\code{mqmscan} after augmentation, without cofactor selection} <<>>= maug <- mqmaugment(multitrait) mqm <- mqmscan(maug) @ \begin{figure} <<MinprobMulti,fig=TRUE,echo=FALSE,height=3>>= plot(m_one, mqm_min1, col=c("black","green"), lty=1:2) legend("topleft", c("scanone","MQM"), col=c("black","green"), lwd=1) @ \caption{\Atintrocolors. \mqm\ shows similar results as single \qtl\ mapping, when used without \emph{augmentation} (\code{minprob} is 1.0), and with default parameters.\label{Atminprob}} \end{figure} By default \MQM\ introduces fictional markers, or `pseudo markers', at fixed intervals. A pseudo marker has a name like \code{c7.loc25}, which is the pseudo marker at 25 \cM\ on chromosome 7. (Note that this reflects the standard naming used in R/qtl.) Each chromosome is divided into evenly spaced pseudo markers, \code{step.size} \cM\ apart. A \lod\ score for underlying \qtl\ is calculated at these pseudo markers. A small \code{step.size} allows for smoother profiles compared with a pure marker-based mapping approach. The real markers are listed between the pseudo markers. In the result you can remove the pseudo markers by using the function \code{mqmextractmarkers}, as follows: <<>>= real_markers <- mqmextractmarkers(mqm) @ For model selection in \mqm, first supply the algorithm with an initial model. This initial model can be produced in two ways: by (1) building a model by hand (forward stepwise), or (2) by unsupervised backward elimination on a large number of markers (discussed in Section~\ref{backelim}). % @@ First build this initial model by hand using a forward stepwise approach. (Note that the automated procedure is preferred, both for theoretical and practical reasons.) A model consists of a set of markers we want to account for. We can start building the initial model by adding cofactors at markers with high LOD scores scored by using \code{mqmscan} with default values. Figure~\ref{Atminprob} displayed a large QTL peak on chromosome 5 at 35 \cM. So we account for that by setting a cofactor at the marker nearest to the peak on chromosome 5 and running \code{mqmscan} again. (See Figures~\ref{Cofactor4} and \ref{Cofactor4b}.) \intro{Add marker GH.117C (chromosome 5, at 35 \cM) as a cofactor} <<>>= max(mqm) find.marker(maug, chr=5, pos=35) multitoset <- find.markerindex(maug, "GH.117C") setcofactors <- mqmsetcofactors(maug, cofactors=multitoset) mqm_co1 <- mqmscan(maug, setcofactors) @ The function \code{find.marker} identifies the name of the marker closest to 35 \cM. The function \code{find.markerindex} translates the marker name into a cofactor number. The function \code{mqmsetcofactors} sets up a cofactor list for use with \code{mqmscan}. \intro{Plot the results of the genome scan after adding a single cofactor (Figure~\ref{Cofactor4})} <<Cofactor4multi,eval=FALSE>>= par(mfrow = c(2,1)) plot(mqmgetmodel(mqm_co1)) plot(mqm_co1) @ \begin{figure} <<fig=TRUE,echo=FALSE>>= # plot after adding first cofactor <<Cofactor4multi>> @ \caption{\Atintro. \code{mqmscan} after a cofactor is added at the top scoring marker of chromosome 5. During the analysis it is kept in the model. \label{Cofactor4}} \end{figure} % -------------- \intro{Plot the \code{mqmscan} results with \code{scanone} results as follows (Figure~\ref{Cofactor4b})} <<Cofactor4bMULTI,eval=FALSE>>= plot(m_one, mqm_co1, col=c("black","green"), lty=1:2) legend("topleft", c("scanone","MQM"), col=c("black","green"), lwd=1) @ \begin{figure} <<fig=TRUE,echo=FALSE,height=3>>= <<Cofactor4bMULTI>> @ \caption{\Atintro\ after introducing a cofactor on chromosome 5 (GH.117C). \code{mqmscan} (green, dashed) differs from \code{scanone} (black). \label{Cofactor4b}} \end{figure} % --------------------------- Figures~\ref{Cofactor4} and \ref{Cofactor4b} show the effect of setting a single marker as a cofactor related to the \qtl\ on chromosome 5, followed by an \mqm\ scan. The marker is not dropped and it passes initial thresholding to account for the \code{cofactor.significance} level. LOD scores are expected to change slightly, because of variation already explained by the \qtl\ on chromosome 5 (Figure~\ref{Cofactor4b}). Figure~\ref{Cofactor4b} shows the second peak on chromosome 4 at 10 \cM\ increases. Add a cofactor to the model and check if the model with both cofactors changes the \qtl. Combining \code{find.markerindex} with \code{find.marker}, adds the new cofactor to the cofactor already in \code{multitoset} (see Figure~\ref{twowaycomparison}): <<>>= # summary(mqm_co1) multitoset <- c(multitoset, find.markerindex(maug, find.marker(maug,4,10))) setcofactors <- mqmsetcofactors(maug,cofactors=multitoset) mqm_co2 <- mqmscan(maug, setcofactors) @ % ----------- \intro{Plot after adding second cofactor on chromosome 4 at 10 \cM} <<twowaycomparison,eval=FALSE>>= par(mfrow = c(2,1)) plot(mqmgetmodel(mqm_co2)) plot(mqm_co1, mqm_co2, col=c("blue","green"), lty=1:2) legend("topleft", c("one cofactor","two cofactors"), col=c("blue","green"), lwd=1) @) \begin{figure} <<fig=TRUE,echo=FALSE>>= <<twowaycomparison>> @ \caption{\Atintro\ using an added cofactor on chromosome 5 (blue), versus two cofactors, using an additional cofactor on chromosome 4 (green). \label{twowaycomparison}} \end{figure} % --------------------------- \intro{Plot the results with 0, 1 and 2 cofactors as follows} <<threewaycomparisonmulti,eval=FALSE>>= plot(mqm, mqm_co1, mqm_co2, col=c("green","red","blue"), lty=1:3) legend("topleft", c("no cofactors","one cofactor","two cofactors"), col=c("green","red","blue"), lwd=1) @ \begin{figure} <<fig=TRUE,echo=FALSE,height=3>>= # plot closeup of threeway comparison <<threewaycomparisonmulti>> @ \caption{\Atintro. Comparison of \mqm\ adding 0 (green), 1 (red) and 2 (blue) cofactor(s) (note that adding more cofactors does not improve the two QTL model).\label{threewaycomparison}} \end{figure} When using the functions \code{mqmsetcofactors}, or the automated \code{mqmautocofactor} (described in the next section), the number of cofactors is compared against the number of individuals inside the cross object. If there is a danger of setting too many cofactors, an error message is shown. \mqm\ also verifies the \code{cofactor.significance} level specified by the user. In the example the marker on chromosome 1 was informative enough, and included into the model. This way a new initial model consisting of cofactors on chromosome 4 and 5 was created. This (forward) selection of cofactors can continue until there are no more informative markers. Manually determining the markers to set a cofactor can be very time consuming in the case of many \qtl\ underlying a trait. It is also prone to overfitting. Furthermore, manual fitting is generally not feasible for a large number of traits. Fortunately \mqm\ provides unsupervised backward elimination, which is described in the next section. \clearpage \section{Unsupervised cofactor selection through backward elimination\label{backelim}} \mqm\ provides unsupervised backward elimination on a large number of markers by selecting cofactors automatically. Normally the number of markers in a dataset is much larger than the number of individuals. \mqm\ allows using any number of cofactors simultaneously. This can be as low as 0 cofactors up to a maximum of the number of individuals minus 12 ($Inds - 12$), as described in the Handbook of Statistical Genetics\cite{jansen07}. The functions: ``\code{mqmsetcofactor}'' and ``\code{mqmautocofactors}'' both create lists of cofactors that can be used for backward elimination. \code{mqmautocofactor} accounts for the underlying marker density and is therefore suitable for datasets with few individuals. See Figure~\ref{ManualAuto} for a comparison on the \code{multitrait} dataset, using the \code{mqmsetcofactors} function to set cofactors every 5th marker and \code{mqmautocofactor} to set 50 cofactors across the genome. After cofactor selection \mqm\ analyses and drops the least informative cofactor from the model. This step is repeated until a limited number of informative cofactors remain. When taking marker density into account, an extra cofactor is introduced on chromosome 1 (see Figure~\ref{ManualAuto}). After unsupervised backward elimination \code{mqmscan} scans each chromosome using the model with the remaining set of cofactors. For example, starting with 50 cofactors using \code{mqmautocofactor} and \code{mqmsetcofactors}, map \qtl\ for the various traits in \code{multitrait}, which contains 24 metabolite traits from a RIL population of \At\, as described in section \ref{multitrait}. The \qtl\ LOD scores differ between \mqm\ and single \qtl\ mapping with \code{scanone} (see Figures~\ref{Backward1} and \ref{Backward2}). \intro{Unsupervised cofactor selection through backward elimination} <<eval=FALSE>>= autocofactors <- mqmautocofactors(maug, 50) mqm_auto <- mqmscan(maug, autocofactors) setcofactors <- mqmsetcofactors(maug, 5) mqm_backw <- mqmscan(maug, setcofactors) @ <<echo=FALSE>>= autocofactors <- mqmautocofactors(maug, 50) mqm_auto <- mqmscan(maug, autocofactors) setcofactors <- mqmsetcofactors(maug, 5) mqm_backw <- mqmscan(maug, setcofactors) @ \intro{Visual inspection of the initial models} <<ManualAutoStart,eval=FALSE>>= par(mfrow = c(2,1)) mqmplot.cofactors(maug, autocofactors, justdots=TRUE) mqmplot.cofactors(maug, setcofactors, justdots=TRUE) @ \begin{figure} <<fig=TRUE,echo=FALSE>>= # plot result of cofactor selection <<ManualAutoStart>> @ \caption{\Atintro. \code{mqmsetcofactor} after introducing cofactors at every fifth marker (top) and \code{mqmautocofactor} automatic marker selection (bottom). Automatic selection takes the underlying marker density into consideration.\label{ManualAutoStart}} \end{figure} \intro{Plot results} <<ManualAuto,eval=FALSE>>= par(mfrow = c(2,1)) plot(mqmgetmodel(mqm_backw)) plot(mqmgetmodel(mqm_auto)) @ \begin{figure} <<fig=TRUE,echo=FALSE>>= # plot result of cofactor backward elimination <<ManualAuto>> @ \caption{\Atintro. \code{mqmsetcofactor} after introducing cofactors at every fifth marker (top) and \code{mqmautocofactor} automatic marker selection (bottom). \code{mqmautocofactor} places an additional cofactor at chromosome 1 (see also Figure~\ref{ManualAutoStart}). After backward elimination this extra marker remains informative.\label{ManualAuto}} \end{figure} <<Backward1multi,eval=FALSE>>= par(mfrow = c(2,1)) plot(mqmgetmodel(mqm_backw)) plot(mqm_backw) @ \begin{figure} <<fig=TRUE,echo=FALSE>>= # plot result of cofactor backward elimination <<Backward1multi>> @ \caption{\Atintro. Unsupervised cofactor selection through backward elimination using \code{mqmsetcofactor} after introducing cofactors at every fifth marker. \qtl\ mapped for trait X3.Hydroxypropyl on chromosome 4 and 5.\label{Backward1}} \end{figure} The \code{mqmgetmodel} function returns the final model from the output of \code{mqmscan}. This model can be further investigated using the \code{fitqtl} and \code{fitqtl} routines from R/qtl. \code{mqmgetmodel} can only be used after backward elimination produces a significant model. The resulting model can also be used to obtain the location and name of the significant cofactors. \intro{Plot result of \mqm, using unsupervised backward elimination, against that of \code{scanone}} <<Backward2,eval=FALSE>>= plot(m_one, mqm_backw, col=c("black","green"), lty=1:2) legend("topleft", c("scanone","MQM"), col=c("black","green"), lwd=1) @ % ------------------------------ <<Backward2multi,eval=FALSE>>= plot(m_one, mqm_backw, col=c("black","green"), lty=1:2) legend("topleft", c("scanone","MQM"), col=c("black","green"), lwd=1) @ \begin{figure} <<fig=TRUE,echo=FALSE,height=3>>= <<Backward2multi>> @ \caption{\Atintro. Compare \qtl\ mapping of \mqm\, after introducing cofactors at every fifth marker and unsupervised backward elimination of cofactors (green, dashed), and \code{scanone} (black).\label{Backward2}} \end{figure} \mqm\ \qtl\ mapping may result in many significant (informative) cofactors. Figure~\ref{Backward2} shows at \code{cofactor.significance=0.02} chromosomes 4 and 5 are involved. Lowering the significance level from 0.02 to 0.002 may yield a smaller model. In biology extensive models are sometimes preferred, but in general a simpler model is easier to understand and, perhaps, validated. Depending on the trait, and the sample size, increasing cofactor.significance can reduce the number of significant QTL in the model. In this example we have already have a small model, so we don't really expect to lose the two QTL on chromosome 4 and chromosome 5. When decreasing the \code{cofactor.significance} no additional cofactors are dropped from the model (See Figure~\ref{lowalpha}) \clearpage \intro{Plot with lowered \code{cofactor.significance}} <<FigLowAlpha,eval=FALSE>>= mqm_backw_low <- mqmscan(maug, setcofactors, cofactor.significance=0.002) par(mfrow = c(2,1)) plot(mqmgetmodel(mqm_backw_low)) plot(mqm_backw,mqm_backw_low, col=c("blue","green"), lty=1:2) legend("topleft", c("Significance=0.02","Significance=0.002"), col=c("blue","green"), lwd=1) @ \intro{\qtl\ mapped with different \code{cofactor.significance=0.002}, using the same starting markers as Figure~\ref{Backward1}. As can be seen from the plot the models selected are similar. This means the QTL found significant at 0.02 are still significant at a more restrictive cutoff.\label{lowalpha}} \begin{figure} <<fig=TRUE,echo=FALSE>>= <<FigLowAlpha>> @ \caption{\Atintro. \qtl\ mapped with different \code{cofactor.significance}. With the lower \code{cofactor.significance=0.002} the model does not change.\label{FigLowAlpha}} \end{figure} When comparing the \mqm\ scan in Figure~\ref{FigLowAlpha} with the original \code{scanone} result in Figure~\ref{Backward2} there are some notable differences. Some \qtl\ show higher significance (LOD scores) and some others show lower significance and are, therefore, estimated to be less likely involved in this trait. Figures can be reconstructed from the result of \code{mqmscan} using the \code{mqmplot.singletrait} function (see, for example, Figure~\ref{AutoCofactor}). Here the model and \qtl\ profile are retrieved. These functions can only be used with \code{mqmscan} functions, as they require the additional information about the inferred \qtl\ model. The results also contain the \emph{estimated} information content per marker. % <<AutoCofactor,eval=FALSE>>= % mqmplot.singletrait(result, extended=TRUE) % @ <<AutoCofactor,eval=FALSE>>= mqmplot.singletrait(mqm_backw_low, extended=TRUE) @ \begin{figure} <<fig=TRUE>>= <<AutoCofactor>> @ \caption{\Atintro. Plot using \code{mqmplot.singletrait} of the first metabolic trait. The information content per marker (red) and the mapped \qtl\ (black).\label{AutoCofactor}} \end{figure} The information content \code{info} in the result is calculated from the deviation of the `ideal marker distribution'. For example, with a dataset of 100 individuals, when comparing two distinct phenotypes at a marker location, we have most power when both groups are equally divided 50/50. A marker has virtually no power when one group containing 1 individual versus a group of 99. We can multiply the estimated \qtl\ effect by this information content to `clean' the QTL profile by giving less weight to less informative markers. Please note that the sample size already plays a role in calculating \qtl. Meanwhile it allows (informal) further weighting/exploring \emph{information} content (Figure~\ref{AutoCofactor}). \section{MQM effect plots} \label{effectplots} The function \code{mqmplot.directedqtl} is used to plot \lod\ curves with an indication of the sign of the estimated QTL effects. This function because it uses internal R/qtl functions cannot handle augmented cross objects. An error will occur when the object supplied is augmented using \code{mqmaugment}. This requires using \code{mqmscan} with parameter \code{outputmarkers=TRUE} (default). \intro{Create a directed QTL plot (Figure~\ref{QTLeffects})} <<QTLeffects,eval=FALSE>>= dirresults <- mqmplot.directedqtl(multitrait, mqm_backw_low) @ \begin{figure} <<fig=TRUE,echo=FALSE,height=3.5>>= <<QTLeffects>> @ \caption{\Atintro. Like Figure~\ref{AutoCofactor}, but with LOD scores multiplied by $\pm$1 according to the sign of the estimated \qtl\ effect.\label{QTLeffects}} \end{figure} The results in Figure~\ref{FigLowAlpha} imply that \qtl\ on chromosomes 4 and 5 are associated with the metabolite X3.Hydroxypropyl. If we want to investigate the effects of the QTL, we can use the functions \code{plotPXG} and \code{effectplot}. The following plots show these for markers GH.117C (main effect, Figure~\ref{MainEffectsD1}) and the interaction between GH.117C and GA1 (Figure~\ref{epistatic1}). <<MainEffectsD1,echo=FALSE,eval=FALSE>>= plotPXG(multitrait, marker="GH.117C") @ \begin{figure} <<fig=TRUE, echo=FALSE,height=3>>= <<MainEffectsD1>> @ \caption{\Atintro. Main effect plot, with \code{plotPXG}, of marker GH.117C on trait X3.Hydroxypropyl. For each marker genotype the individual phenotype is plotted, with the mean of genotype AA (red) and BB (blue).\label{MainEffectsD1}} \end{figure} %---------------------------------------------- The initial scans for X3.Hydroxypropyl (Figure~\ref{Backward1}) show two possible \qtl\ on chromosome 4 and 5. We can investigate interactions between these main effect QTL using the \code{effectplot} function. To investigate the possible epistatic interaction, select markers GA1 (significant in Figure~\ref{Backward1} and Figure~\ref{Backward2}) and GH.117C (significant in Figure~\ref{Backward2} and \ref{AutoCofactor}). See Figure~\ref{epistatic1}. <<epistatic1,eval=FALSE>>= effectplot(multitrait, mname1="GH.117C", mname2="GA1") @ \begin{figure} <<fig=TRUE,echo=FALSE,height=3.5>>= <<epistatic1>> @ \caption{\Atintro. Explore epistatic interaction, using \code{effectplot}, between markers GH.117C and GA1. GA1 appears to obscure the effect of GH.117C. An individual that has BB at GA1 has no difference in expression between being AA or BB at GH.117C. However when an individual is AA at GA1 there is clear difference between the two genotype means (~1.500 BB versus 12000 when AA) at GH.117C. \label{epistatic1}} \end{figure} %------------------------------------------------ Likewise, in case we are interested in the interactions between the first small hump on chromosomes 1 (marker: PVV4 not significant) and the main efect on 5 (GH.117C), we could make interaction plots between these two markers with a high LOD score on those chromosomes. See Figure~\ref{epistatic2}. <<epistatic2,eval=FALSE>>= effectplot(multitrait, mname1="PVV4", mname2="GH.117C") @ Meanwhile, Figure~\ref{epistatic2} shows no evidence for an interaction between the two markers GH.117C and PVV4, as the lines are close to parallel. \begin{figure} <<fig=TRUE,echo=FALSE,height=3.5>>= <<epistatic2>> @ \caption{\Atintro. \code{effectplot} shows no epistatic effects between markers GH.117C and PVV4, ths can be seen because the two lines run in parallel, the genotype on one location (PVV4) does not affect the effect of the expression on GH.117C other location. \label{epistatic2}} \end{figure} %------------------------------------------------ \clearpage \section{QTL significance} \label{significance} To estimate the significance of \qtl\, and perhaps further exclude markers from a model, permutation testing is provided by the function \code{mqmpermutation}. This step is computationally expensive\footnote{In the tutorial, for all examples, 25 permutations are used. A real experiment should use over 1000 permutation tests.} as the same test is repeated many times on shuffled data. Each test calculates \lod\ scores for non associated (randomly ordered) data. \mqm\ provides parametric and non-parametric bootstrapping to estimate \qtl\ significance. Select the type with the \code{bootmethod} parameter. If you have are lucky enough to have multiple CPUs on your computer you can use the SNOW package \cite{tierney03,tierney04}, which allows parallel computations on multiple CPU/cores. SNOW is available through the Internet R archive CRAN. For example, with Rgui SNOW can be installed by selecting from the menu: `Packages' and `Install Package(s)` from the drop down menu. Select a CRAN mirror near you and select the SNOW package. Rgui will start downloading the package, and install any dependencies needed. Linux users can download a copy of SNOW from \url{https://cran.r-project.org/package=snow}. Once the package has finished downloading the tar.gz file can be installed using \code{R CMD INSTALL snow.tar.gz} To summarize results from \code{mqmpermutation}, \code{mqmprocespermutation} makes the output comparable to \code{scanone} when using the \code{n.perm} parameter for permutation. \intro{Calculate significance - using SNOW parallelization parameters} <<>>= require(snow) results <- mqmpermutation(maug, scanfunction=mqmscan, cofactors=setcofactors, n.cluster=2, n.perm=25, batchsize=25) @ \begin{figure} <<fig=TRUE>>= mqmplot.permutations(results) @ \caption{\Atintro. \qtl\ significance calculated through permutation using \code{mqmpermutation}. Estimate by permuting a single trait (X3.Hydroxypropyl) with randomly distributing trait values amongst individuals, which gives an indication of \lod\ scores found by chance. \qtl\ with a \lod\ higher than 2.5 can be considered significant (at \code{cofactor.significance=0.05} (green) or \code{cofactor.significance=0.10} (blue)). Chromosome are marked by the gray vertical grid lines.} \end{figure} <<>>= resultsrqtl <- mqmprocesspermutation(results) summary(resultsrqtl) @ For small datasets, with a limited amount of classical traits, \code{mqmpermutation} is nice. However, for large expression studies (eQTL) using microarrays, use \code{mqmscanfdr} instead, which estimates false discovery rates (FDR) across the entire dataset at \lod\ cutoff, as described by Breitling \etal\cite{breitling08}. To estimate the FDR, \code{mqmscanfdr} permutes whole genome information, taking correlation between traits into account and giving an unbiased estimate of FDR at different (user specified) thresholds. The function scans the traits and counts observed \qtl\ with a LOD above \code{x}, setting a certain threshold. It permutes all the data leaving the correlation structure between traits intact. Below, very high FDR estimates are calculated because of a small amount of permutations and high correlation between traits. We discover many QTL that map to the same location. This can normally only happen with information sparse marker(s), or correlated traits, as seen in microarray experiments. \intro{Calculate FDR} <<>>= data(multitrait) m_imp <- fill.geno(multitrait) mqmscanfdr(m_imp, mqmscanall, cofactors=setcofactors, n.cluster=2) @ In contrast, the function \code{mqmpermutation} does single trait permutations, and does not take correlation between the traits into account. The advantage is that a permutation threshold is determined for each trait. This leads to different significance levels per trait and could lead to certain \qtl\ being significant at their trait cut-off, which are not significant when a single cut-off. The \mqm\ output needs to be converted to the standard R/qtl format using the \code{mqmprocesspermutation} function. The resulting object is of class \code{scanoneperm} and can be used by the standard R/qtl functions for further analysis. To parallelize calculations \code{n.cluster} sets the number of CPU cores to use. A batch consists of a number of traits to analyze on one core. A large(r) \code{batchsize} (default 10) can also be set to improve efficiency. Every time a batch is sent a new instance of R is started, so it pays to have as few batches as possible. \section{Parallelized \xqtl\ analysis} \mqm\ can handle high throughput \xqtl\ data - the name coined for the family of expression QTL, or eQTL \cite{jansen01}, metabolite QTL (mQTL) and pQTL (protein QTL), where measurements like gene expression on microrray probes are treated as phenotypes. \mqm\ analyses traits simultaneously using parallel computing on multiple CPU/cores, and even computer clusters. \xqtl datasets (expression eQTL, metabolite mQTL) usually contain a large amount of phenotypes with known locations on the genome. These locations can be used for detecting cis/trans regulation, for example. For QTL mapping every phenotype requires one or more calls to \code{mqmscan}. In addition special plots are presented for \xqtl studies. Our example, the mQTL dataset \code{multitrait}, an \At\ RIL cross, containing 24 metabolites measured as phenotypes. Of these 24 phenotypes we will only scan the first five phenotypes by setting the \code{pheno.col} parameter. To map back the regulatory locations of these metabolites one can use plain scanning of all metabolites (initially without cofactors). Next, we plot all the profiles in a heatmap (see Figure~\ref{At.heatmap1}). In this heatmap the colors represent the LOD score, on the x-axis the marker number and on the y-axis the metabolite. The traits are numbered in the plot. Plot heatmap without cofactors and then the heatmap with cofactors and backward elimination. Figure~\ref{At.heatmap2} shows improvement over Figure~\ref{At.heatmap1} because of an improved signal to noise ratio. <<>>= data(multitrait) m_imp <- fill.geno(multitrait) mqm_imp5 <- mqmscan(m_imp, pheno.col=1:5, n.cluster=2) @ \begin{figure} <<fig=TRUE>>= mqmplot.multitrait(mqm_imp5, type="image") @ \caption{\Atintro. Heatmap of five metabolite expression traits, with profiles created using \mqm\ without preselected cofactors. The colors represent the LOD score, on the x-axis the marker number and on the y-axis the metabolite.\label{At.heatmap1}} \end{figure} <<>>= cofactorlist <- mqmsetcofactors(m_imp, 3) mqm_imp5 <- mqmscan(m_imp, pheno.col=1:5 , cofactors=cofactorlist, n.cluster=2) @ \begin{figure} <<fig=TRUE>>= mqmplot.multitrait(mqm_imp5, type="image") @ \caption{\Atintro. Heatmap of metabolite expression traits, with profiles created using \mqm\ with cofactors at each third marker. The colors represent the LOD score, on the x-axis the marker number and on the y-axis the metabolite.\label{At.heatmap2}} \end{figure} Use \code{mqmplot.multitrait} for more graphical output. (Unfortunately this does not show in the generated PDF, but in R it shows the trait profiles) <<>>= mqmplot.multitrait(mqm_imp5, type="lines") @ Next is \code{mqmplot.circle}. The circle plot shows a circular representation of the genome. After using automatic backward selection certain marker/cofactors are found to be significant. These are highlighted and colored. The cofactor size can be scaled, based on significance (see Figures~\ref{circle1} and \ref{circle2}). The plot can be tweaked. For example, \code{highlight} a specific trait, and calculate interactions between the significant cofactors. All other traits are grayed out, but remain partly visible, in this way it is possible to see if significant QTL for this trait are also colocated with other traits. Parameter \code{interactstrength}: highlights interactions between significant markers. However they are only drawn (and reported in the output) if the effect change is larger than \code{interactstrength} multiplied by the summed standard deviation. Parameter \code{spacing} sets space between the chromosomes in Cm. \begin{figure} <<fig=TRUE>>= mqmplot.circle(m_imp, mqm_imp5) @ \caption{\Atintro. Circle plot 1 - Multiple metabolic traits without known locations. Four traits are in the centre connected by a colored spline to their QTL. Significant QTL locations are depicted as solid square circles. A lower \lod\ score is closer to the center. A 'hotspot' of \qtl\ is visible on chromosome 5. \label{circle1}} \end{figure} \begin{figure} <<fig=TRUE>>= mqmplot.circle(m_imp, mqm_imp5, highlight=2) @ \caption{\Atintro. Circle plot 2 - Multiple metabolic traits without known locations. Highlight the second trait. The significant \qtl\ locations are depicted as solid red square circles. The splines show epistatic interactions (see also Figure~\ref{epistatic1}). The blue lines are locations which are modulating expression (higher or lower), the green lines show a flip in effect. To explain this: with two markers, having AA at marker one shows trait mean AA > trait mean BB at marker two however when the individual has BB at marker one the effect at marker two is reversed AA < BB. \label{circle2}} \end{figure} Next a cis-trans plot with \code{mqmplot.cistrans}. This plot is only available when genomic locations of the traits are known, e.g. the genomic probe locations in microarray eQTL studies. By default the R/qtl cross object does not store this data. So the user has to add this information to the cross object using the \code{addloctocross} function. After this operation the cis-trans plot can be created for \qtl\ with associated genome locations. The two axis of the cis-trans plot both show the genetic location. The X-axis is, normally, the \qtl\ location and the Y-axis the locations of the trait. \begin{figure} <<fig=TRUE>>= data(locations) multiloc <- addloctocross(m_imp, locations) mqmplot.cistrans(mqm_imp5, multiloc, 5, FALSE, TRUE) @ \caption{\Atintro. \code{mqmplot.cistrans} can be drawn when \qtl\ have associated genome locations. \qtl\ are plotted against the position on the genome they were measured (here mQTL for \At), cutoff is at \lod=5. Normally these plots are created using 10.000 + traits. However because this tutorial is automatically generated we only use 5 traits to illustrate.} \end{figure} \intro{When having locations we can, again, use the \code{mqmplot.circle} function, now with the extra information} \begin{figure} <<fig=TRUE>>= mqmplot.circle(multiloc, mqm_imp5, highlight=2) @ \caption{\Atintro. Circle plot 3 - Multiple metabolic traits with known locations. Highlight the second trait. The significant \qtl\ locations are depicted as solid red square circles. The known location of the trait is a red triangle. The splines show epistatic interactions (see also Figure~\ref{epistatic1}). The blue lines are locations which are modulating expression (higher or lower), the green lines show a flip in effect.\label{circle3}} \end{figure} \clearpage \section{Overview of all \mqm\ functions} \begin{table}[ht] \caption{Added functionality} \centering \begin{tabular}{| l | l | } \hline mqmaugment:& data augmentation \\ mqmscan:& \mqm\ modelling and scanning \\ mqmsetcofactors:& Set cofactors at markers (or at fixed locations) \\ find.markerindex:& Change marker numbering into mqmformat \\ mqmscanall:& mqmscanall scans all traits using \mqm\\ mqmpermutation:& Single trait permutation \\ mqmscanfdr:& Genome wide False Discovery Rates (FDR) \\ mqmprocesspermutation:& Creates an R/qtl permutationobject \\ & \hspace{0.25in} from the output of the \code{mqmpermutation} function \\ mqmplot.multitrait:& plot multiple traits (MQMmulti object) \\ mqmplot.directedqtl:& plot of single trait with added \qtl\ effect\\ mqmplot.permutations:& plot to show single trait permutations \\ mqmplot.singletrait:& plot of single trait analysis with information content \\ mqmplot.circle:& Genome plot of \qtl\ in a circle (optional: Use of location information)\\ mqmplot.cistrans:& Genomewide plot of cis- and trans-\qtl, above a threshold \\ addloctocross:& Adding genetic locations for traits \\ mqmtestnormal:& Test normality of a trait \\ \hline \end{tabular} \label{tbl:tabel1} \end{table} \clearpage \begin{thebibliography}{9} \bibitem{broman09} Broman, K.W.; 2009 \emph{A brief tour of R/qtl.} https://rqtl.org/tutorials/rqtltour.pdf. \bibitem{rqtlbook} Broman, K. W.; Sen, \'S; 2009 \emph{A Guide to QTL Mapping with R/qtl.} Springer. \bibitem{broman03} Broman, K.W.; Wu, H.; Sen, S.; Churchill, G.A.; 2003 \emph{R/qtl: QTL mapping in experimental crosses.} Bioinformatics, 19:889--890. \bibitem{jansen07} Jansen R. C.; 2007 \emph{Chapter 18 - Quantitative trait loci in inbred lines.} Handbook of Statistical Genetics, 3rd edition. Wiley. \bibitem{tierney04} Tierney, L.; Rossini, A.; Li, N.; and Sevcikova, H.; 2004 \emph{The snow Package: Simple Network of Workstations.} Version 0.2-1. \bibitem{tierney03} Rossini, A.; Tierney, L.; and Li, N.; 2003 \emph{Simple parallel statistical computing.} R. University of Washington Biostatistics working paper series, 193. \bibitem{jansen01} Jansen R. C.; Nap J.P.; 2001 \emph{Genetical genomics: the added value from segregation.} Trends in Genetics, 17, 388--391. \bibitem{jansen94} Jansen R. C.; Stam P.; 1994 \emph{High resolution of quantitative traits into multiple loci via interval mapping.} Genetics, 136, 1447--1455. \bibitem{jansen94b} Jansen R.C.; 1994 \emph{Controlling the Type I and Type II Errors in Mapping Quantitative Trait Loci.} Genetics, Vol 138, 871--881. \bibitem{Churchill94} Churchill, G. A.; and Doerge, R. W.; 1994 \emph{Empirical threshold values for quantitative trait mapping.} Genetics 138, 963--971. \bibitem{jansen93} Jansen R. C.; 1993 \emph{Interval mapping of multiple quantitative trait loci.} Genetics, 135, 205--211. \bibitem{Dempster77} Dempster, A. P.; Laird, N. M. and Rubin, D. B.; 1977 \emph{Maximum likelihood from incomplete data via the EM algorithm.} J. Roy. Statist. Soc. B, 39, 1--38. \bibitem{CIMa} Zeng, Z. B.; 1993 \emph{Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci.} Proc. Natl. Acad. Sci. USA, 90, 10972--10976. \bibitem{CIMb} Zeng, Z. B.; 1994 \emph{Precision mapping of quantitative trait loci.} Genetics, 136, 1457--1468 \bibitem{sugiyama} Sugiyama, F.; Churchill, G.A.; Higgins, D.C.; Johns, C.; Makaritsis, K.P.; Gavras, H.; Paigen, B.; 2001 \emph{Concordance of murine quantitative trait loci for salt-induced hypertension with rat and human loci.} Genomics, 71, 70--77. \bibitem{Keurentjes2006} Keurentjes, J. J.; Fu, J.; de Vos, C. H.; Lommen, A.; Hall, R. D.; Bino, R. J.; van der Plas, L. H.; Jansen, R. C.; Vreugdenhil, D.; Koornneef, M.; 2006 \emph{The genetics of plant metabolism.} Nature Genetics. 38, 842--849. \bibitem{Koornneef1998} Alonso-Blanco, C.; Peeters, A. J.; Koornneef, M.; Lister, C.; Dean, C.; van den Bosch, N.; Pot, J.; Kuiper, M. T.; 1998 \emph{ Development of an AFLP based linkage map of Ler, Col; Cvi Arabidopsis thaliana ecotypes; construction of a Ler/Cvi recombinant inbred line population.} Plant J. 14, 259--271. \bibitem{breitling08} Breitling, R.; Li, Y.; Tesson, B. M.; Fu, J.; Wu, C.; Wiltshire, T.; Gerrits, A.; Bystrykh, L. V.; de Haan, G.; Su, A. I.; Jansen, R. C.; 2008 \emph{Genetical genomics: spotlight on QTL hotspots.} PLoS Genet. 4/10. \end{thebibliography} \end{document}