EVOLUTION-MANAGER
Edit File: stepAIC.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Choose a model by AIC in a Stepwise Algorithm</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for stepAIC {MASS}"><tr><td>stepAIC {MASS}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2> Choose a model by AIC in a Stepwise Algorithm </h2> <h3>Description</h3> <p>Performs stepwise model selection by AIC. </p> <h3>Usage</h3> <pre> stepAIC(object, scope, scale = 0, direction = c("both", "backward", "forward"), trace = 1, keep = NULL, steps = 1000, use.start = FALSE, k = 2, ...) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>object</code></td> <td> <p>an object representing a model of an appropriate class. This is used as the initial model in the stepwise search. </p> </td></tr> <tr valign="top"><td><code>scope</code></td> <td> <p>defines the range of models examined in the stepwise search. This should be either a single formula, or a list containing components <code>upper</code> and <code>lower</code>, both formulae. See the details for how to specify the formulae and how they are used. </p> </td></tr> <tr valign="top"><td><code>scale</code></td> <td> <p>used in the definition of the AIC statistic for selecting the models, currently only for <code><a href="../../stats/html/lm.html">lm</a></code> and <code><a href="../../stats/html/aov.html">aov</a></code> models (see <code><a href="../../stats/html/extractAIC.html">extractAIC</a></code> for details). </p> </td></tr> <tr valign="top"><td><code>direction</code></td> <td> <p>the mode of stepwise search, can be one of <code>"both"</code>, <code>"backward"</code>, or <code>"forward"</code>, with a default of <code>"both"</code>. If the <code>scope</code> argument is missing the default for <code>direction</code> is <code>"backward"</code>. </p> </td></tr> <tr valign="top"><td><code>trace</code></td> <td> <p>if positive, information is printed during the running of <code>stepAIC</code>. Larger values may give more information on the fitting process. </p> </td></tr> <tr valign="top"><td><code>keep</code></td> <td> <p>a filter function whose input is a fitted model object and the associated <code>AIC</code> statistic, and whose output is arbitrary. Typically <code>keep</code> will select a subset of the components of the object and return them. The default is not to keep anything. </p> </td></tr> <tr valign="top"><td><code>steps</code></td> <td> <p>the maximum number of steps to be considered. The default is 1000 (essentially as many as required). It is typically used to stop the process early. </p> </td></tr> <tr valign="top"><td><code>use.start</code></td> <td> <p>if true the updated fits are done starting at the linear predictor for the currently selected model. This may speed up the iterative calculations for <code>glm</code> (and other fits), but it can also slow them down. <b>Not used</b> in <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span>. </p> </td></tr> <tr valign="top"><td><code>k</code></td> <td> <p>the multiple of the number of degrees of freedom used for the penalty. Only <code>k = 2</code> gives the genuine AIC: <code>k = log(n)</code> is sometimes referred to as BIC or SBC. </p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>any additional arguments to <code>extractAIC</code>. (None are currently used.) </p> </td></tr></table> <h3>Details</h3> <p>The set of models searched is determined by the <code>scope</code> argument. The right-hand-side of its <code>lower</code> component is always included in the model, and right-hand-side of the model is included in the <code>upper</code> component. If <code>scope</code> is a single formula, it specifies the <code>upper</code> component, and the <code>lower</code> model is empty. If <code>scope</code> is missing, the initial model is used as the <code>upper</code> model. </p> <p>Models specified by <code>scope</code> can be templates to update <code>object</code> as used by <code><a href="../../stats/html/update.formula.html">update.formula</a></code>. </p> <p>There is a potential problem in using <code><a href="../../stats/html/glm.html">glm</a></code> fits with a variable <code>scale</code>, as in that case the deviance is not simply related to the maximized log-likelihood. The <code>glm</code> method for <code><a href="../../stats/html/extractAIC.html">extractAIC</a></code> makes the appropriate adjustment for a <code>gaussian</code> family, but may need to be amended for other cases. (The <code>binomial</code> and <code>poisson</code> families have fixed <code>scale</code> by default and do not correspond to a particular maximum-likelihood problem for variable <code>scale</code>.) </p> <p>Where a conventional deviance exists (e.g. for <code>lm</code>, <code>aov</code> and <code>glm</code> fits) this is quoted in the analysis of variance table: it is the <em>unscaled</em> deviance. </p> <h3>Value</h3> <p>the stepwise-selected model is returned, with up to two additional components. There is an <code>"anova"</code> component corresponding to the steps taken in the search, as well as a <code>"keep"</code> component if the <code>keep=</code> argument was supplied in the call. The <code>"Resid. Dev"</code> column of the analysis of deviance table refers to a constant minus twice the maximized log likelihood: it will be a deviance only in cases where a saturated model is well-defined (thus excluding <code>lm</code>, <code>aov</code> and <code>survreg</code> fits, for example). </p> <h3>Note</h3> <p>The model fitting must apply the models to the same dataset. This may be a problem if there are missing values and an <code>na.action</code> other than <code>na.fail</code> is used (as is the default in <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span>). We suggest you remove the missing values first. </p> <h3>References</h3> <p>Venables, W. N. and Ripley, B. D. (2002) <em>Modern Applied Statistics with S.</em> Fourth edition. Springer. </p> <h3>See Also</h3> <p><code><a href="addterm.html">addterm</a></code>, <code><a href="dropterm.html">dropterm</a></code>, <code><a href="../../stats/html/step.html">step</a></code> </p> <h3>Examples</h3> <pre> quine.hi <- aov(log(Days + 2.5) ~ .^4, quine) quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn) quine.stp <- stepAIC(quine.nxt, scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1), trace = FALSE) quine.stp$anova cpus1 <- cpus for(v in names(cpus)[2:7]) cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])), include.lowest = TRUE) cpus0 <- cpus1[, 2:8] # excludes names, authors' predictions cpus.samp <- sample(1:209, 100) cpus.lm <- lm(log10(perf) ~ ., data = cpus1[cpus.samp,2:8]) cpus.lm2 <- stepAIC(cpus.lm, trace = FALSE) cpus.lm2$anova example(birthwt) birthwt.glm <- glm(low ~ ., family = binomial, data = bwt) birthwt.step <- stepAIC(birthwt.glm, trace = FALSE) birthwt.step$anova birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2) + I(scale(lwt)^2), trace = FALSE) birthwt.step2$anova quine.nb <- glm.nb(Days ~ .^4, data = quine) quine.nb2 <- stepAIC(quine.nb) quine.nb2$anova </pre> <hr /><div style="text-align: center;">[Package <em>MASS</em> version 7.3-51.4 <a href="00Index.html">Index</a>]</div> </body></html>