EVOLUTION-MANAGER

Edit File: new_summary_scantwo.Rnw

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Karl W. Broman
% The new summary.scantwo and plot.scantwo
%
% This is an "Sweave" document; see the corresponding PDF.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\documentclass[12pt]{article}

%\usepackage{times}
\usepackage{amsmath}
\usepackage{color}
\usepackage{times}

% revise margins
\setlength{\headheight}{0.0in}
\setlength{\topmargin}{-0.25in}
\setlength{\headsep}{0.0in}
\setlength{\textheight}{9.00in}
\setlength{\footskip}{0.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\textwidth}{6.5in}

\setlength{\parskip}{6pt}
\setlength{\parindent}{0pt}

\newcommand{\code}{\texttt}
\newcommand{\lod}{\text{LOD}}

\begin{document}

\SweaveOpts{prefix.string=Figs/scantwo,eps=TRUE}
\setkeys{Gin}{width=\textwidth} %% <- change width of figures
% Try to get the R code from running into the margin
<<echo=FALSE>>=
options(width=77)
@

% Change S input/output font size
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontsize=\footnotesize, baselinestretch=0.75, formatcom = {\color[rgb]{0, 0, 0.56}}}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontsize=\footnotesize, baselinestretch=0.75, formatcom = {\color[rgb]{0.56, 0, 0}}}

\textbf{The new \code{summary.scantwo} and \code{plot.scantwo}} \\
Karl W Broman, 27 Oct 2006 \\
(Added color 26 Oct 2010; slight change 21 Mar 2012: \verb|summary.scantwo.old| is now \verb|summaryScantwoOld|)

\bigskip

In R/qtl version 1.04, the functions \code{summary.scantwo} and
\code{plot.scantwo} have been changed quite substantially.  Also, the
permutations with \code{scantwo} have been changed to match the new
format for \code{summary.scantwo}.

In this document, I describe the revisions and how to use the new
functions.   We'll look at the \code{hyper} data as an example.

First we need to load the package and the data.
<<loaddata,eval=FALSE>>=
library(qtl)
data(hyper)
@

<<loadresults,echo=FALSE>>=
load("hyper_results.RData")
@

I'm going to use \code{scantwo} with \code{method="em"}.  First I run
\code{calc.genoprob}, and then \code{scantwo} as before.

<<scantwo,eval=FALSE>>=
hyper <- calc.genoprob(hyper, step=2.5)
out2 <- scantwo(hyper)
@

The output, in this new version of R/qtl, is slightly different.
The LOD scores for the full model (two QTL plus interaction) are there
as before, but in place of the epistasis LOD scores, I save the LOD
scores for the additive QTL model.  Also, we now always run the
single-QTL analysis with \code{scanone}, since the results are
necessary to make sense of the output of \code{scantwo}.

The big change is in \code{summary.scantwo}.  Consider a pair of
positions in the genome, $s$ and $t$.  We consider four models.
\begin{eqnarray*}
\text{Full: } && y = \mu + \beta_1 q_1 + \beta_2 q_2 + \beta_3 (q_1
\times q_2) + \epsilon \\
\text{Add: } && y = \mu + \beta_1 q_1 + \beta_2 q_2 + \epsilon \\
\text{One: } && y = \mu + \beta_1 q_1 + \epsilon \\
\text{Null: } && y = \mu + \epsilon
\end{eqnarray*}

Let $l_f(s,t)$ be the log$_{10}$ likelihood for the full model with QTL
at $s$ and $t$, $l_a(s,t)$ be the log$_{10}$ likelihood for the additive
model with QTL at $s$ and $t$, $l_1(s)$ be the log$_{10}$ likelihood
for the single-QTL model with the QTL at $s$, and $l_0$ be the
log$_{10}$ likelihood under the null (with no QTL).

Define the LOD scores as follows.
\begin{eqnarray*}
\lod_f(s,t) & = & l_f(s,t) - l_0 \\
\lod_a(s,t) & = & l_a(s,t) - l_0 \\
\lod_1(s) & = & l_1(s) - l_0 
\end{eqnarray*}

Now for the new part.  Following a suggestion from Gary Churchill, we
consider a pair of chromosomes $j$ and $k$.  (We include the case
$j=k$.)  Let $c(s)$ denote the chromosome for position $s$.  We now
consider the maximum LOD scores over that pair of chromosomes.
\begin{eqnarray*}
M_f(j,k) & = & \max_{c(s)=j, c(t)=k} \lod_f(s,t) \\
M_a(j,k) & = & \max_{c(s)=j, c(t)=k} \lod_a(s,t) \\
M_1(j,k) & = & \max_{c(s)=j \text{ or } k} \lod_1(s)
\end{eqnarray*}

So $M_f(j,k)$ is the log$_{10}$ likelihood ratio comparing the full
model with QTL on chromosomes $j$ and $k$ to the null model, and
$M_a(j,k)$ is the analogous thing for the additive model.  Note that
the pair of positions at which the full model is maximized may be
different from the pair of positions at which the additive model is
maximized.  $M_1(j,k)$ is the log$_{10}$ likelihood ratio comparing
the model with a single QTL on either chromosomes $j$ or $k$ to the
null model.

We derive three further LOD scores from the above.
\begin{eqnarray*}
M_i(j,k) & = & M_f(j,k) - M_a(j,k) \\
M_{fv1}(j,k) & = & M_f(j,k) - M_1(j,k) \\
M_{av1}(j,k) & = & M_a(j,k) - M_1(j,k) 
\end{eqnarray*}

$M_i(j,k)$ is the log$_{10}$ likelihood ratio comparing the full model
with QTL on chromosomes $j$ and $k$ to the additive model with QTL on
chromosomes $j$ and $k$, and so indicates evidence for an interaction
between QTL on chromosomes $j$ and $k$, assuming that there is precisely
one QTL on each chromosome (or, for $j=k$, that there are two QTL
on the chromosome).

$M_{fv1}(j,k)$ is the log$_{10}$ likelihood ratio comparing the full model
with QTL on chromosomes $j$ and $k$ to the single-QTL model, with a
single QTL on either chromosome $j$ or $k$.  Thus, it indicates
evidence for a second QTL, allowing for the possibility of epistasis.

$M_{av1}(j,k)$ is the log$_{10}$ likelihood ratio comparing the additive model
with QTL on chromosomes $j$ and $k$ to the single-QTL model, with a
single QTL on either chromosome $j$ or $k$.  Thus, it indicates
evidence for a second QTL, assuming no epistasis.

In \code{summary.scantwo}, we must provide thresholds for each of the
five LOD scores, $M_f(j,k)$, $M_{fv1}(j,k)$, $M_i(j,k)$, $M_a(j,k)$
and $M_{av1}(j,k)$.  A pair of chromosomes $(j,k)$ is reported as
interesting if either of the following holds:
\begin{itemize}
\item $M_f(j,k) \ge T_f$ and [$M_{fv1}(j,k) \ge
  T_{fv1}$ or $M_i(j,k) \ge T_i$]
\item $M_a(j,k) \ge T_a$ and $M_{av1}(j,k) \ge T_{av1}$
\end{itemize}

I'm inclined towards ignoring $M_i(j,k)$ in this rule (i.e. setting
$T_i = \infty$), and using a common significance level ($\alpha$ = 5 or
10\%) for the four remaining thresholds.

By default, \code{summary.scantwo} now calculates the five LOD scores
above, keeping track of the positions at which $M_f(j,k)$ and
$M_a(j,k)$ were maximized.  It either prints the best results on all
pairs of chromosomes, or we must provide five thresholds ($T_f$,
$T_{fv1}$, $T_i$, $T_a$ and $T_{av1}$, in that order).

The thresholds can be obtained by a permutation test (see below), but
this is extremely time-consuming.  For a mouse backcross, we suggest
the thresholds (6.0, 4.7, 4.4, 4.7, 2.6) for the full,
conditional-interactive, interaction, additive, and
conditional-additive LOD scores, respectively.  For a mouse
intercross, we suggest the thresholds (9.1, 7.1, 6.3, 6.3, 3.3) for the
full, conditional-interactive, interaction, additive, and
conditional-additive LOD scores, respectively.  These were obtained by
10,000 simulations of crosses with 250 individuals, markers at a 10 cM
spacing, and analysis by Haley-Knott regression.

<<summaryscantwoA>>=
summary(out2, thresholds=c(6.0, 4.7, 4.4, 4.7, 2.6))
@

$M_f$, $M_{fv1}$ and $M_i$ are \code{lod.full}, \code{lod.fv1} and
\code{lod.int}, respectively, and correspond to positions \code{pos1f}
and \code{pos2f}.

$M_a$ and $M_{av1}$ are \code{lod.add} and \code{lod.av1},
respectively, and correspond to positions \code{pos1a} and
\code{pos2a}.

The above is the default output, with \code{what="best"}.  The
argument \code{what} may also be given as \code{"full"}, \code{"add"},
or \code{"int"}, in which case, for each pair of chromosomes, we pull
out the pair of positions with maximum full, additive, or interactive
LOD score, respectively, and calculate, for example, the interaction
LOD score as the difference between the full and additive LOD scores
\emph{for that fixed pair of positions}, rather than allow the full
and additive models to be maximized at different positions.  (This is
more like what we did before, and is included just for completeness.)
The same set of five thresholds is required.

<<summaryscantwoB>>=
summary(out2, thresholds=c(6.0, 4.7, 4.4, 4.7, 2.6), what="full")
summary(out2, thresholds=c(6.0, 4.7, 4.4, 4.7, 2.6), what="add")
summary(out2, thresholds=c(6.0, 4.7, 4.4, 4.7, 2.6), what="int")
@

One may also restrict the summary to just the case of $j=k$, to look
at evidence for linked QTL on each chromosome, by using
\code{allpairs=FALSE}.

<<summaryscantwoC>>=
summary(out2, allpairs=FALSE)
@

Note also that the degrees of freedom associated with each LOD score
may be displayed, via \code{df=TRUE}:

<<summaryscantwoD>>=
summary(out2, thresholds=c(6.0, 4.7, 4.4, 4.7, 2.6), df=TRUE)
@

The old version of \code{summary.scantwo} is still available, though
it is now called \code{summaryScantwoOld}.

<<oldsummaryscantwo>>=
summaryScantwoOld(out2, thresholds=c(6, 4, 4))
@

The permutation test with \code{scantwo} has also changed.  At each
permutation replicate, we record the maxima for each of the
$M_f(j,k)$, $M_{fv1}(j,k)$, $M_i(j,k)$, $M_a(j,k)$ and
$M_{av1}(j,k)$.  The output is given class \code{"scantwoperm"}, and
there is a \code{summary.scantwoperm} function for getting LOD
thresholds.  These permutations can take a very long time, and so one
would generally use a multi-processor computer or cluster and do
multiple shorter runs in parallel.  And so we have added a function
\code{c.scantwoperm} for combining such runs together.

%In addition, we have added an argument \code{perm.strata} for doing
%stratified permutation tests.  In the \code{hyper} data, the extremes
%were genotyped at most markers, and the rest of the individuals were
%genotyped only in selected regions.  This suggests that one should do
%a stratified permutation test, permuting individuals within
%the more completely genotyped group and separately permuting those
%within the less completely genotyped group.

We could perform the \code{scantwo} permutations in five batches, as
follows.  
<<scantwoperm,eval=FALSE>>=
operm2A <- scantwo(hyper, n.perm=200)
operm2B <- scantwo(hyper, n.perm=200)
operm2C <- scantwo(hyper, n.perm=200)
operm2D <- scantwo(hyper, n.perm=200)
operm2E <- scantwo(hyper, n.perm=200)
operm2 <- c(operm2A, operm2B, operm2C, operm2D, operm2E)
@

The 5 and 20\% thresholds could be calculated as follows.
<<summaryscantwoperm>>=
summary(operm2, alpha=c(0.05,0.20))
@

The permutation results may also be used within the
\code{summary.scantwo} function, to automatically calculate the
thresholds for desired significance levels.  In this case, rather than
provide \code{thresholds}, one provides \code{alphas}, which again
should be a vector of length 5, giving the significance levels for 
$M_f(j,k)$, $M_{fv1}(j,k)$, $M_i(j,k)$, $M_a(j,k)$ and
$M_{av1}(j,k)$, in that order.
<<summaryscantwopermB>>=
summary(out2, perms=operm2, alphas=rep(0.05, 5))
@

My version of the decision rule, in which $M_{i}(j,k)$
is ignored, could be obtained as follows:
<<summaryscantwopermC>>=
summary(out2, perms=operm2, alphas=c(0.05, 0.05, 0, 0.05, 0.05))
@

In the case that permutation results are provided,
genome-scan-adjusted p-values may also be displayed, via 
\code{pvalues=TRUE}:

<<summaryscantwoD>>=
summary(out2, perms=operm2, alphas=c(0.05, 0.05, 0, 0.05, 0.05), 
        pvalues=TRUE)
@

I have also made an important change in \code{plot.scantwo}.  The
arguments \code{upper} and \code{lower} control what is plotted in the
upper-left and lower-right triangles, respectively.
The options are \code{"full"}, \code{"add"}, \code{"cond-int"}, 
\code{"cond-add"}, and \code{"int"}.  The case \code{"full"} is what
was previously called \code{"joint"}, but this and the case
\code{"add"} are not changed; the LOD scores for the full model (two
QTLs plus interaction) and the additive model are displayed.

The other two cases, \code{"cond-int"}, and
\code{"cond-add"}, are quite different.  
We now plot the following LOD scores:
\begin{eqnarray*}
\text{\code{"cond-int"}: } && \lod_{fv1}(s,t) = \lod_f(s,t) - M_1[c(s), c(t)] \\
\text{\code{"cond-add"}: } && \lod_{av1}(s,t) = \lod_a(s,t) - M_1[c(s), c(t)] 
\end{eqnarray*}
When these values are negative, they are replaced with 0.  
Before, we had subtracted off the maximum of the single-QTL LOD scores
at the points $s$ and $t$.  Now we subtract off the maximum of the
single-QTL LOD scores for the chromosomes containing $s$ and $t$.

Note that these LOD scores will be maximized at the same positions as
$\lod_f$ and $\lod_a$.  Indeed, except for the negative values being
changed to 0's, they will have the same shape as $\lod_f$ and
$\lod_a$.

In the following code, we plot $\lod_f$ in the lower triangle and
$\lod_i$ in the upper triangle, for
chromosomes 1, 4, 6, and 15.  The result appears in Figure~\ref{scantwofull}.
<<plotscantwoA,eval=FALSE>>=
plot(out2, chr=c(1,4,6,15))
@

\begin{figure}
\centering
<<plotscantwoAplot,fig=TRUE,echo=FALSE,height=5>>=
plot(out2, chr=c(1,4,6,15),layout=list(cbind(1,2),c(5,1)),
     mar1=c(4,4,0,0)+0.1, mar2=c(4,2,0,2)+0.1)
@ 
\caption{LOD scores for selected chromosomes for a two-dimensional
  scan with the \code{hyper} data.  Epistasis LOD scores are in the
  upper triangle and $\lod_f$ is in the lower triangle.\label{scantwofull}}
\end{figure}

The same plot, but with the $fv1$-type LOD scores in the upper
triangle, would be obtained as follows.  The result appears in
Figure~\ref{scantwocondint}. 
<<plotscantwoB,eval=FALSE>>=
plot(out2, chr=c(1,4,6,15), upper="cond-int")
@

\begin{figure}
\centering
<<plotscantwoBplot,fig=TRUE,echo=FALSE,height=5>>=
plot(out2, chr=c(1,4,6,15), upper="cond-int", 
     layout=list(cbind(1,2),c(5,1)),
     mar1=c(4,4,0,0)+0.1, mar2=c(4,2,0,2)+0.1)
@
\caption{LOD scores for selected chromosomes for a two-dimensional
  scan with the \code{hyper} data.  $\lod_{fv1}$ is in the upper
  triangle and $\lod_f$ is in the lower
  triangle.\label{scantwocondint}}  
\end{figure}

\end{document}