EVOLUTION-MANAGER
Edit File: princomp.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Principal Components Analysis</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for princomp {stats}"><tr><td>princomp {stats}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Principal Components Analysis</h2> <h3>Description</h3> <p><code>princomp</code> performs a principal components analysis on the given numeric data matrix and returns the results as an object of class <code>princomp</code>. </p> <h3>Usage</h3> <pre> princomp(x, ...) ## S3 method for class 'formula' princomp(formula, data = NULL, subset, na.action, ...) ## Default S3 method: princomp(x, cor = FALSE, scores = TRUE, covmat = NULL, subset = rep_len(TRUE, nrow(as.matrix(x))), fix_sign = TRUE, ...) ## S3 method for class 'princomp' predict(object, newdata, ...) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>formula</code></td> <td> <p>a formula with no response variable, referring only to numeric variables.</p> </td></tr> <tr valign="top"><td><code>data</code></td> <td> <p>an optional data frame (or similar: see <code><a href="model.frame.html">model.frame</a></code>) containing the variables in the formula <code>formula</code>. By default the variables are taken from <code>environment(formula)</code>.</p> </td></tr> <tr valign="top"><td><code>subset</code></td> <td> <p>an optional vector used to select rows (observations) of the data matrix <code>x</code>.</p> </td></tr> <tr valign="top"><td><code>na.action</code></td> <td> <p>a function which indicates what should happen when the data contain <code>NA</code>s. The default is set by the <code>na.action</code> setting of <code><a href="../../base/html/options.html">options</a></code>, and is <code><a href="na.fail.html">na.fail</a></code> if that is unset. The ‘factory-fresh’ default is <code><a href="na.fail.html">na.omit</a></code>.</p> </td></tr> <tr valign="top"><td><code>x</code></td> <td> <p>a numeric matrix or data frame which provides the data for the principal components analysis.</p> </td></tr> <tr valign="top"><td><code>cor</code></td> <td> <p>a logical value indicating whether the calculation should use the correlation matrix or the covariance matrix. (The correlation matrix can only be used if there are no constant variables.)</p> </td></tr> <tr valign="top"><td><code>scores</code></td> <td> <p>a logical value indicating whether the score on each principal component should be calculated.</p> </td></tr> <tr valign="top"><td><code>covmat</code></td> <td> <p>a covariance matrix, or a covariance list as returned by <code><a href="cov.wt.html">cov.wt</a></code> (and <code><a href="../../MASS/html/cov.rob.html">cov.mve</a></code> or <code><a href="../../MASS/html/cov.rob.html">cov.mcd</a></code> from package <a href="https://CRAN.R-project.org/package=MASS"><span class="pkg">MASS</span></a>). If supplied, this is used rather than the covariance matrix of <code>x</code>.</p> </td></tr> <tr valign="top"><td><code>fix_sign</code></td> <td> <p>Should the signs of the loadings and scores be chosen so that the first element of each loading is non-negative?</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>arguments passed to or from other methods. If <code>x</code> is a formula one might specify <code>cor</code> or <code>scores</code>.</p> </td></tr> <tr valign="top"><td><code>object</code></td> <td> <p>Object of class inheriting from <code>"princomp"</code>.</p> </td></tr> <tr valign="top"><td><code>newdata</code></td> <td> <p>An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, <code>newdata</code> must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order. </p> </td></tr> </table> <h3>Details</h3> <p><code>princomp</code> is a generic function with <code>"formula"</code> and <code>"default"</code> methods. </p> <p>The calculation is done using <code><a href="../../base/html/eigen.html">eigen</a></code> on the correlation or covariance matrix, as determined by <code><a href="cor.html">cor</a></code>. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use <code><a href="../../base/html/svd.html">svd</a></code> on <code>x</code>, as is done in <code>prcomp</code>. </p> <p>Note that the default calculation uses divisor <code>N</code> for the covariance matrix. </p> <p>The <code><a href="../../base/html/print.html">print</a></code> method for these objects prints the results in a nice format and the <code><a href="../../graphics/html/plot.html">plot</a></code> method produces a scree plot (<code><a href="screeplot.html">screeplot</a></code>). There is also a <code><a href="biplot.html">biplot</a></code> method. </p> <p>If <code>x</code> is a formula then the standard NA-handling is applied to the scores (if requested): see <code><a href="nafns.html">napredict</a></code>. </p> <p><code>princomp</code> only handles so-called R-mode PCA, that is feature extraction of variables. If a data matrix is supplied (possibly via a formula) it is required that there are at least as many units as variables. For Q-mode PCA use <code><a href="prcomp.html">prcomp</a></code>. </p> <h3>Value</h3> <p><code>princomp</code> returns a list with class <code>"princomp"</code> containing the following components: </p> <table summary="R valueblock"> <tr valign="top"><td><code>sdev</code></td> <td> <p>the standard deviations of the principal components.</p> </td></tr> <tr valign="top"><td><code>loadings</code></td> <td> <p>the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class <code>"loadings"</code>: see <code><a href="loadings.html">loadings</a></code> for its <code>print</code> method.</p> </td></tr> <tr valign="top"><td><code>center</code></td> <td> <p>the means that were subtracted.</p> </td></tr> <tr valign="top"><td><code>scale</code></td> <td> <p>the scalings applied to each variable.</p> </td></tr> <tr valign="top"><td><code>n.obs</code></td> <td> <p>the number of observations.</p> </td></tr> <tr valign="top"><td><code>scores</code></td> <td> <p>if <code>scores = TRUE</code>, the scores of the supplied data on the principal components. These are non-null only if <code>x</code> was supplied, and if <code>covmat</code> was also supplied if it was a covariance list. For the formula method, <code><a href="nafns.html">napredict</a>()</code> is applied to handle the treatment of values omitted by the <code>na.action</code>.</p> </td></tr> <tr valign="top"><td><code>call</code></td> <td> <p>the matched call.</p> </td></tr> <tr valign="top"><td><code>na.action</code></td> <td> <p>If relevant.</p> </td></tr> </table> <h3>Note</h3> <p>The signs of the columns of the loadings and scores are arbitrary, and so may differ between different programs for PCA, and even between different builds of <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span>: <code>fix_sign = TRUE</code> alleviates that. </p> <h3>References</h3> <p>Mardia, K. V., J. T. Kent and J. M. Bibby (1979). <em>Multivariate Analysis</em>, London: Academic Press. </p> <p>Venables, W. N. and B. D. Ripley (2002). <em>Modern Applied Statistics with S</em>, Springer-Verlag. </p> <h3>See Also</h3> <p><code><a href="summary.princomp.html">summary.princomp</a></code>, <code><a href="screeplot.html">screeplot</a></code>, <code><a href="biplot.princomp.html">biplot.princomp</a></code>, <code><a href="prcomp.html">prcomp</a></code>, <code><a href="cor.html">cor</a></code>, <code><a href="cor.html">cov</a></code>, <code><a href="../../base/html/eigen.html">eigen</a></code>. </p> <h3>Examples</h3> <pre> require(graphics) ## The variances of the variables in the ## USArrests data vary by orders of magnitude, so scaling is appropriate (pc.cr <- princomp(USArrests)) # inappropriate princomp(USArrests, cor = TRUE) # =^= prcomp(USArrests, scale=TRUE) ## Similar, but different: ## The standard deviations differ by a factor of sqrt(49/50) summary(pc.cr <- princomp(USArrests, cor = TRUE)) loadings(pc.cr) # note that blank entries are small but not zero ## The signs of the columns of the loadings are arbitrary plot(pc.cr) # shows a screeplot. biplot(pc.cr) ## Formula interface princomp(~ ., data = USArrests, cor = TRUE) ## NA-handling USArrests[1, 2] <- NA pc.cr <- princomp(~ Murder + Assault + UrbanPop, data = USArrests, na.action = na.exclude, cor = TRUE) pc.cr$scores[1:5, ] ## (Simple) Robust PCA: ## Classical: (pc.cl <- princomp(stackloss)) ## Robust: (pc.rob <- princomp(stackloss, covmat = MASS::cov.rob(stackloss))) </pre> <hr /><div style="text-align: center;">[Package <em>stats</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>