EVOLUTION-MANAGER
Edit File: factor.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Factors</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for factor {base}"><tr><td>factor {base}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Factors</h2> <h3>Description</h3> <p>The function <code>factor</code> is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If argument <code>ordered</code> is <code>TRUE</code>, the factor levels are assumed to be ordered. For compatibility with S there is also a function <code>ordered</code>. </p> <p><code>is.factor</code>, <code>is.ordered</code>, <code>as.factor</code> and <code>as.ordered</code> are the membership and coercion functions for these classes. </p> <h3>Usage</h3> <pre> factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA) ordered(x, ...) is.factor(x) is.ordered(x) as.factor(x) as.ordered(x) addNA(x, ifany = FALSE) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>a vector of data, usually taking a small number of distinct values.</p> </td></tr> <tr valign="top"><td><code>levels</code></td> <td> <p>an optional vector of the unique values (as character strings) that <code>x</code> might have taken. The default is the unique set of values taken by <code><a href="character.html">as.character</a>(x)</code>, sorted into increasing order <em>of <code>x</code></em>. Note that this set can be specified as smaller than <code>sort(unique(x))</code>.</p> </td></tr> <tr valign="top"><td><code>labels</code></td> <td> <p><em>either</em> an optional character vector of labels for the levels (in the same order as <code>levels</code> after removing those in <code>exclude</code>), <em>or</em> a character string of length 1. Duplicated values in <code>labels</code> can be used to map different values of <code>x</code> to the same factor level.</p> </td></tr> <tr valign="top"><td><code>exclude</code></td> <td> <p>a vector of values to be excluded when forming the set of levels. This may be factor with the same level set as <code>x</code> or should be a <code>character</code>.</p> </td></tr> <tr valign="top"><td><code>ordered</code></td> <td> <p>logical flag to determine if the levels should be regarded as ordered (in the order given).</p> </td></tr> <tr valign="top"><td><code>nmax</code></td> <td> <p>an upper bound on the number of levels; see ‘Details’.</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>(in <code>ordered(.)</code>): any of the above, apart from <code>ordered</code> itself.</p> </td></tr> <tr valign="top"><td><code>ifany</code></td> <td> <p>only add an <code>NA</code> level if it is used, i.e. if <code>any(is.na(x))</code>.</p> </td></tr> </table> <h3>Details</h3> <p>The type of the vector <code>x</code> is not restricted; it only must have an <code><a href="character.html">as.character</a></code> method and be sortable (by <code><a href="order.html">order</a></code>). </p> <p>Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently. </p> <p>The encoding of the vector happens as follows. First all the values in <code>exclude</code> are removed from <code>levels</code>. If <code>x[i]</code> equals <code>levels[j]</code>, then the <code>i</code>-th element of the result is <code>j</code>. If no match is found for <code>x[i]</code> in <code>levels</code> (which will happen for excluded values) then the <code>i</code>-th element of the result is set to <code><a href="NA.html">NA</a></code>. </p> <p>Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in <code>exclude</code>, but this can be altered by supplying <code>labels</code>. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended. </p> <p><code>factor(x, exclude = NULL)</code> applied to a factor without <code><a href="NA.html">NA</a></code>s is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If <code>exclude</code> is used, since <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> version 3.4.0, excluding non-existing character levels is equivalent to excluding nothing, and when <code>exclude</code> is a <code><a href="character.html">character</a></code> vector, that <em>is</em> applied to the levels of <code>x</code>. Alternatively, <code>exclude</code> can be factor with the same level set as <code>x</code> and will exclude the levels present in <code>exclude</code>. </p> <p>The codes of a factor may contain <code><a href="NA.html">NA</a></code>. For a numeric <code>x</code>, set <code>exclude = NULL</code> to make <code><a href="NA.html">NA</a></code> an extra level (prints as <code><NA></code>); by default, this is the last level. </p> <p>If <code>NA</code> is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use <code><a href="NA.html">is.na</a></code> on the left-hand-side of an assignment (as in <code>is.na(f)[i] <- TRUE</code>; indexing inside <code>is.na</code> does not work). Under those circumstances missing values are currently printed as <code><NA></code>, i.e., identical to entries of level <code>NA</code>. </p> <p><code>is.factor</code> is generic: you can write methods to handle specific classes of objects, see <a href="InternalMethods.html">InternalMethods</a>. </p> <p>Where <code>levels</code> is not supplied, <code><a href="unique.html">unique</a></code> is called. Since factors typically have quite a small number of levels, for large vectors <code>x</code> it is helpful to supply <code>nmax</code> as an upper bound on the number of unique values. </p> <h3>Value</h3> <p><code>factor</code> returns an object of class <code>"factor"</code> which has a set of integer codes the length of <code>x</code> with a <code>"levels"</code> attribute of mode <code><a href="character.html">character</a></code> and unique (<code>!<a href="duplicated.html">anyDuplicated</a>(.)</code>) entries. If argument <code>ordered</code> is true (or <code>ordered()</code> is used) the result has class <code>c("ordered", "factor")</code>. Undocumentedly for a long time, <code>factor(x)</code> loses all <code><a href="attributes.html">attributes</a>(x)</code> but <code>"names"</code>, and resets <code>"levels"</code> and <code>"class"</code>. </p> <p>Applying <code>factor</code> to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also <code><a href="Extract.factor.html">[.factor</a></code> for a more transparent way to achieve this. </p> <p><code>is.factor</code> returns <code>TRUE</code> or <code>FALSE</code> depending on whether its argument is of type factor or not. Correspondingly, <code>is.ordered</code> returns <code>TRUE</code> when its argument is an ordered factor and <code>FALSE</code> otherwise. </p> <p><code>as.factor</code> coerces its argument to a factor. It is an abbreviated (sometimes faster) form of <code>factor</code>. </p> <p><code>as.ordered(x)</code> returns <code>x</code> if this is ordered, and <code>ordered(x)</code> otherwise. </p> <p><code>addNA</code> modifies a factor by turning <code>NA</code> into an extra level (so that <code>NA</code> values are counted in tables, for instance). </p> <p><code>.valid.factor(object)</code> checks the validity of a factor, currently only <code>levels(object)</code>, and returns <code>TRUE</code> if it is valid, otherwise a string describing the validity problem. This function is used for <code><a href="../../methods/html/validObject.html">validObject</a>(<factor>)</code>. </p> <h3>Warning</h3> <p>The interpretation of a factor depends on both the codes and the <code>"levels"</code> attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, <code>as.numeric</code> applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor <code>f</code> to approximately its original numeric values, <code>as.numeric(levels(f))[f]</code> is recommended and slightly more efficient than <code>as.numeric(as.character(f))</code>. </p> <p>The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII. </p> <p>There are some anomalies associated with factors that have <code>NA</code> as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes. </p> <h3>Comparison operators and group generic methods</h3> <p>There are <code>"factor"</code> and <code>"ordered"</code> methods for the <a href="groupGeneric.html">group generic</a> <code><a href="groupGeneric.html">Ops</a></code> which provide methods for the <a href="Comparison.html">Comparison</a> operators, and for the <code><a href="Extremes.html">min</a></code>, <code><a href="Extremes.html">max</a></code>, and <code><a href="range.html">range</a></code> generics in <code><a href="groupGeneric.html">Summary</a></code> of <code>"ordered"</code>. (The rest of the groups and the <code><a href="groupGeneric.html">Math</a></code> group generate an error as they are not meaningful for factors.) </p> <p>Only <code>==</code> and <code>!=</code> can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors. </p> <p>All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set. </p> <h3>Note</h3> <p>In earlier versions of <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span>, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.) </p> <h3>References</h3> <p>Chambers, J. M. and Hastie, T. J. (1992) <em>Statistical Models in S</em>. Wadsworth & Brooks/Cole. </p> <h3>See Also</h3> <p><code><a href="Extract.factor.html">[.factor</a></code> for subsetting of factors. </p> <p><code><a href="gl.html">gl</a></code> for construction of balanced factors and <code><a href="../../stats/html/zC.html">C</a></code> for factors with specified contrasts. <code><a href="levels.html">levels</a></code> and <code><a href="nlevels.html">nlevels</a></code> for accessing the levels, and <code><a href="class.html">unclass</a></code> to get integer codes. </p> <h3>Examples</h3> <pre> (ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)) as.integer(ff) # the internal codes (f. <- factor(ff)) # drops the levels that do not occur ff[, drop = TRUE] # the same, more transparently factor(letters[1:20], labels = "letter") class(ordered(4:1)) # "ordered", inheriting from "factor" z <- factor(LETTERS[3:1], ordered = TRUE) ## and "relational" methods work: stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z)) ## suppose you want "NA" as a level, and to allow missing values. (x <- factor(c(1, 2, NA), exclude = NULL)) is.na(x)[2] <- TRUE x # [1] 1 <NA> <NA> is.na(x) # [1] FALSE TRUE FALSE ## More rational, since R 3.4.0 : factor(c(1:2, NA), exclude = "" ) # keeps <NA> , as factor(c(1:2, NA), exclude = NULL) # always did ## exclude = <character> z # ordered levels 'A < B < C' factor(z, exclude = "C") # does exclude factor(z, exclude = "B") # ditto ## Now, labels maybe duplicated: ## factor() with duplicated labels allowing to "merge levels" x <- c("Man", "Male", "Man", "Lady", "Female") ## Map from 4 different values to only two levels: (xf <- factor(x, levels = c("Male", "Man" , "Lady", "Female"), labels = c("Male", "Male", "Female", "Female"))) #> [1] Male Male Male Female Female #> Levels: Male Female ## Using addNA() Month <- airquality$Month table(addNA(Month)) table(addNA(Month, ifany = TRUE)) </pre> <hr /><div style="text-align: center;">[Package <em>base</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>