EVOLUTION-MANAGER
Edit File: cut.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Convert Numeric to Factor</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for cut {base}"><tr><td>cut {base}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Convert Numeric to Factor</h2> <h3>Description</h3> <p><code>cut</code> divides the range of <code>x</code> into intervals and codes the values in <code>x</code> according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on. </p> <h3>Usage</h3> <pre> cut(x, ...) ## Default S3 method: cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>a numeric vector which is to be converted to a factor by cutting.</p> </td></tr> <tr valign="top"><td><code>breaks</code></td> <td> <p>either a numeric vector of two or more unique cut points or a single number (greater than or equal to 2) giving the number of intervals into which <code>x</code> is to be cut.</p> </td></tr> <tr valign="top"><td><code>labels</code></td> <td> <p>labels for the levels of the resulting category. By default, labels are constructed using <code>"(a,b]"</code> interval notation. If <code>labels = FALSE</code>, simple integer codes are returned instead of a factor.</p> </td></tr> <tr valign="top"><td><code>include.lowest</code></td> <td> <p>logical, indicating if an ‘x[i]’ equal to the lowest (or highest, for <code>right = FALSE</code>) ‘breaks’ value should be included.</p> </td></tr> <tr valign="top"><td><code>right</code></td> <td> <p>logical, indicating if the intervals should be closed on the right (and open on the left) or vice versa.</p> </td></tr> <tr valign="top"><td><code>dig.lab</code></td> <td> <p>integer which is used when labels are not given. It determines the number of digits used in formatting the break numbers.</p> </td></tr> <tr valign="top"><td><code>ordered_result</code></td> <td> <p>logical: should the result be an ordered factor?</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>further arguments passed to or from other methods.</p> </td></tr> </table> <h3>Details</h3> <p>When <code>breaks</code> is specified as a single number, the range of the data is divided into <code>breaks</code> pieces of equal length, and then the outer limits are moved away by 0.1% of the range to ensure that the extreme values both fall within the break intervals. (If <code>x</code> is a constant vector, equal-length intervals are created, one of which includes the single value.) </p> <p>If a <code>labels</code> parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed as <code>"(b1, b2]"</code>, <code>"(b2, b3]"</code> etc. for <code>right = TRUE</code> and as <code>"[b1, b2)"</code>, ... if <code>right = FALSE</code>. In this case, <code>dig.lab</code> indicates the minimum number of digits should be used in formatting the numbers <code>b1</code>, <code>b2</code>, .... A larger value (up to 12) will be used if needed to distinguish between any pair of endpoints: if this fails labels such as <code>"Range3"</code> will be used. Formatting is done by <code><a href="formatc.html">formatC</a></code>. </p> <p>The default method will sort a numeric vector of <code>breaks</code>, but other methods are not required to and <code>labels</code> will correspond to the intervals after sorting. </p> <p>As from <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> 3.2.0, <code>getOption("OutDec")</code> is consulted when labels are constructed for <code>labels = NULL</code>. </p> <h3>Value</h3> <p>A <code><a href="factor.html">factor</a></code> is returned, unless <code>labels = FALSE</code> which results in an integer vector of level codes. </p> <p>Values which fall outside the range of <code>breaks</code> are coded as <code>NA</code>, as are <code>NaN</code> and <code>NA</code> values. </p> <h3>Note</h3> <p>Instead of <code>table(cut(x, br))</code>, <code>hist(x, br, plot = FALSE)</code> is more efficient and less memory hungry. Instead of <code>cut(*, labels = FALSE)</code>, <code><a href="findInterval.html">findInterval</a>()</code> is more efficient. </p> <h3>References</h3> <p>Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) <em>The New S Language</em>. Wadsworth & Brooks/Cole. </p> <h3>See Also</h3> <p><code><a href="split.html">split</a></code> for splitting a variable according to a group factor; <code><a href="factor.html">factor</a></code>, <code><a href="tabulate.html">tabulate</a></code>, <code><a href="table.html">table</a></code>, <code><a href="findInterval.html">findInterval</a></code>. </p> <p><code><a href="../../stats/html/quantile.html">quantile</a></code> for ways of choosing breaks of roughly equal content (rather than length). </p> <p><code><a href="bincode.html">.bincode</a></code> for a bare-bones version. </p> <h3>Examples</h3> <pre> Z <- stats::rnorm(10000) table(cut(Z, breaks = -6:6)) sum(table(cut(Z, breaks = -6:6, labels = FALSE))) sum(graphics::hist(Z, breaks = -6:6, plot = FALSE)$counts) cut(rep(1,5), 4) #-- dummy tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5) x <- rep(0:8, tx0) stopifnot(table(x) == tx0) table( cut(x, b = 8)) table( cut(x, breaks = 3*(-2:5))) table( cut(x, breaks = 3*(-2:5), right = FALSE)) ##--- some values OUTSIDE the breaks : table(cx <- cut(x, breaks = 2*(0:4))) table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE)) which(is.na(cx)); x[is.na(cx)] #-- the first 9 values 0 which(is.na(cxl)); x[is.na(cxl)] #-- the last 5 values 8 ## Label construction: y <- stats::rnorm(100) table(cut(y, breaks = pi/3*(-3:3))) table(cut(y, breaks = pi/3*(-3:3), dig.lab = 4)) table(cut(y, breaks = 1*(-3:3), dig.lab = 4)) # extra digits don't "harm" here table(cut(y, breaks = 1*(-3:3), right = FALSE)) #- the same, since no exact INT! ## sometimes the default dig.lab is not enough to be avoid confusion: aaa <- c(1,2,3,4,5,2,3,4,5,6,7) cut(aaa, 3) cut(aaa, 3, dig.lab = 4, ordered = TRUE) ## one way to extract the breakpoints labs <- levels(cut(aaa, 3)) cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ), upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) )) </pre> <hr /><div style="text-align: center;">[Package <em>base</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>