<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Principal Components Analysis</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for prcomp {stats}"><tr><td>prcomp {stats}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Principal Components Analysis</h2> <h3>Description</h3> <p>Performs a principal components analysis on the given data matrix and returns the results as an object of class <code>prcomp</code>.</p> <h3>Usage</h3> <pre>
prcomp(x, ...)

## S3 method for class 'formula'
prcomp(formula, data = NULL, subset, na.action, ...)

## Default S3 method:
prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE,
       tol = NULL, rank. = NULL, ...)

## S3 method for class 'prcomp'
predict(object, newdata, ...)
</pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>formula</code></td> <td> <p>a formula with no response variable, referring only to numeric variables.</p> </td></tr> <tr valign="top"><td><code>data</code></td> <td> <p>an optional data frame (or similar: see <code><a href="model.frame.html">model.frame</a></code>) containing the variables in the formula <code>formula</code>. By default the variables are taken from <code>environment(formula)</code>.</p> </td></tr> <tr valign="top"><td><code>subset</code></td> <td> <p>an optional vector used to select rows (observations) of the data matrix <code>x</code>.</p> </td></tr> <tr valign="top"><td><code>na.action</code></td> <td> <p>a function which indicates what should happen when the data contain <code>NA</code>s. The default is set by the <code>na.action</code> setting of <code><a href="../../base/html/options.html">options</a></code>, and is <code><a href="na.fail.html">na.fail</a></code> if that is unset. 
The ‘factory-fresh’ default is <code><a href="na.fail.html">na.omit</a></code>.</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>arguments passed to or from other methods. If <code>x</code> is a formula, one might specify <code>scale.</code> or <code>tol</code>.</p> </td></tr> <tr valign="top"><td><code>x</code></td> <td> <p>a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.</p> </td></tr> <tr valign="top"><td><code>retx</code></td> <td> <p>a logical value indicating whether the rotated variables should be returned.</p> </td></tr> <tr valign="top"><td><code>center</code></td> <td> <p>a logical value indicating whether the variables should be shifted to be zero-centered. Alternatively, a vector of length equal to the number of columns of <code>x</code> can be supplied. The value is passed to <code>scale</code>.</p> </td></tr> <tr valign="top"><td><code>scale.</code></td> <td> <p>a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is <code>FALSE</code> for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of <code>x</code> can be supplied. The value is passed to <code><a href="../../base/html/scale.html">scale</a></code>.</p> </td></tr> <tr valign="top"><td><code>tol</code></td> <td> <p>a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to <code>tol</code> times the standard deviation of the first component.) With the default null setting, no components are omitted (unless <code>rank.</code> is specified as less than <code>min(dim(x))</code>). 
Other settings for tol could be <code>tol = 0</code> or <code>tol = sqrt(.Machine$double.eps)</code>, which would omit essentially constant components.</p> </td></tr> <tr valign="top"><td><code>rank.</code></td> <td> <p>optionally, a number specifying the maximal rank, i.e., maximal number of principal components to be used. Can be set as alternative or in addition to <code>tol</code>, useful notably when the desired rank is considerably smaller than the dimensions of the matrix.</p> </td></tr> <tr valign="top"><td><code>object</code></td> <td> <p>object of class inheriting from <code>"prcomp"</code></p> </td></tr> <tr valign="top"><td><code>newdata</code></td> <td> <p>An optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. If the original fit used a formula or a data frame or a matrix with column names, <code>newdata</code> must contain columns with the same names. Otherwise it must contain the same number of columns, to be used in the same order. </p> </td></tr> </table> <h3>Details</h3> <p>The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using <code>eigen</code> on the covariance matrix. This is generally the preferred method for numerical accuracy. The <code>print</code> method for these objects prints the results in a nice format and the <code>plot</code> method produces a scree plot. </p> <p>Unlike <code><a href="princomp.html">princomp</a></code>, variances are computed with the usual divisor <i>N - 1</i>. </p> <p>Note that <code>scale = TRUE</code> cannot be used if there are zero or constant (for <code>center = TRUE</code>) variables. 
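</p> <p>The relationship described above can be checked directly. The following is a minimal sketch, not the actual implementation of <code>prcomp</code>: it assumes the built-in <code>USArrests</code> data, and the names <code>X</code>, <code>Xc</code>, <code>s</code> and <code>p</code> are illustrative only. It centers the data, takes the singular value decomposition with <code>svd</code>, and compares the result with the output of <code>prcomp</code>, using the divisor <i>N - 1</i>.</p>
<pre>
X  &lt;- as.matrix(USArrests)                     # example data (an assumption of this sketch)
Xc &lt;- scale(X, center = TRUE, scale = FALSE)   # center the columns, no scaling
s  &lt;- svd(Xc)                                  # singular value decomposition
p  &lt;- prcomp(X)                                # defaults: center = TRUE, scale. = FALSE

## standard deviations: singular values divided by sqrt(N - 1)
all.equal(p$sdev, s$d / sqrt(nrow(X) - 1))

## loadings: right singular vectors, compared up to the arbitrary sign
all.equal(abs(unname(p$rotation)), abs(s$v))

## the same variances obtained via eigen() on the covariance matrix
all.equal(p$sdev^2, eigen(cov(X))$values)
</pre>
<p>Each comparison is expected to return <code>TRUE</code>; the loadings are compared in absolute value because the signs of the columns are arbitrary.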
</p> <h3>Value</h3> <p><code>prcomp</code> returns a list with class <code>"prcomp"</code> containing the following components: </p> <table summary="R valueblock"> <tr valign="top"><td><code>sdev</code></td> <td> <p>the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).</p> </td></tr> <tr valign="top"><td><code>rotation</code></td> <td> <p>the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). The function <code>princomp</code> returns this in the element <code>loadings</code>.</p> </td></tr> <tr valign="top"><td><code>x</code></td> <td> <p>if <code>retx</code> is true, the rotated data (the centered, and scaled if requested, data multiplied by the <code>rotation</code> matrix) is returned. Hence, <code>cov(x)</code> is the diagonal matrix <code>diag(sdev^2)</code>. For the formula method, <code><a href="nafns.html">napredict</a>()</code> is applied to handle the treatment of values omitted by the <code>na.action</code>.</p> </td></tr> <tr valign="top"><td><code>center, scale</code></td> <td> <p>the centering and scaling used, or <code>FALSE</code>.</p> </td></tr> </table> <h3>Note</h3> <p>The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span>. </p> <h3>References</h3> <p>Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) <em>The New S Language</em>. Wadsworth &amp; Brooks/Cole. </p> <p>Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) <em>Multivariate Analysis</em>. London: Academic Press. </p> <p>Venables, W. N. and Ripley, B. D. (2002) <em>Modern Applied Statistics with S</em>. Springer-Verlag. 
</p> <h3>See Also</h3> <p><code><a href="biplot.princomp.html">biplot.prcomp</a></code>, <code><a href="screeplot.html">screeplot</a></code>, <code><a href="princomp.html">princomp</a></code>, <code><a href="cor.html">cor</a></code>, <code><a href="cor.html">cov</a></code>, <code><a href="../../base/html/svd.html">svd</a></code>, <code><a href="../../base/html/eigen.html">eigen</a></code>. </p> <h3>Examples</h3> <pre>
C &lt;- chol(S &lt;- toeplitz(.9 ^ (0:31))) # Cov.matrix and its root
all.equal(S, crossprod(C))
set.seed(17)
X &lt;- matrix(rnorm(32000), 1000, 32)
Z &lt;- X %*% C  ## ==&gt;  cov(Z) ~=  C'C = S
all.equal(cov(Z), S, tol = 0.08)
pZ &lt;- prcomp(Z, tol = 0.1)
summary(pZ) # only ~14 PCs (out of 32)
## or choose only 3 PCs more directly:
pz3 &lt;- prcomp(Z, rank. = 3)
summary(pz3) # same numbers as the first 3 above
stopifnot(ncol(pZ$rotation) == 14, ncol(pz3$rotation) == 3,
          all.equal(pz3$sdev, pZ$sdev, tol = 1e-15)) # exactly equal typically

## signs are random
require(graphics)
## the variances of the variables in the
## USArrests data vary by orders of magnitude, so scaling is appropriate
prcomp(USArrests)  # inappropriate
prcomp(USArrests, scale = TRUE)
prcomp(~ Murder + Assault + Rape, data = USArrests, scale = TRUE)
plot(prcomp(USArrests))
summary(prcomp(USArrests, scale = TRUE))
biplot(prcomp(USArrests, scale = TRUE))
</pre> <hr /><div style="text-align: center;">[Package <em>stats</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>