EVOLUTION-MANAGER
Edit File: fct_lump.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Lump together factor levels into "other"</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for fct_lump {forcats}"><tr><td>fct_lump {forcats}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Lump together factor levels into "other"</h2> <h3>Description</h3> <p>A family for lumping together levels that meet some criteria. </p> <ul> <li> <p><code>fct_lump_min()</code>: lumps levels that appear fewer than <code>min</code> times. </p> </li> <li> <p><code>fct_lump_prop()</code>: lumps levels that appear in fewer <code>prop * n</code> times. </p> </li> <li> <p><code>fct_lump_n()</code> lumps all levels except for the <code>n</code> most frequent (or least frequent if <code>n < 0</code>) </p> </li> <li> <p><code>fct_lump_lowfreq()</code> lumps together the least frequent levels, ensuring that "other" is still the smallest level. </p> </li></ul> <p><code>fct_lump()</code> exists primarily for historical reasons, as it automatically picks between these different methods depending on its arguments. We no longer recommend that you use it. </p> <h3>Usage</h3> <pre> fct_lump( f, n, prop, w = NULL, other_level = "Other", ties.method = c("min", "average", "first", "last", "random", "max") ) fct_lump_min(f, min, w = NULL, other_level = "Other") fct_lump_prop(f, prop, w = NULL, other_level = "Other") fct_lump_n( f, n, w = NULL, other_level = "Other", ties.method = c("min", "average", "first", "last", "random", "max") ) fct_lump_lowfreq(f, other_level = "Other") </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>f</code></td> <td> <p>A factor (or character vector).</p> </td></tr> <tr valign="top"><td><code>n</code></td> <td> <p>Positive <code>n</code> preserves the most common <code>n</code> values. Negative <code>n</code> preserves the least common <code>-n</code> values. It there are ties, you will get at least <code>abs(n)</code> values.</p> </td></tr> <tr valign="top"><td><code>prop</code></td> <td> <p>Positive <code>prop</code> lumps values which do not appear at least <code>prop</code> of the time. Negative <code>prop</code> lumps values that do not appear at most <code>-prop</code> of the time.</p> </td></tr> <tr valign="top"><td><code>w</code></td> <td> <p>An optional numeric vector giving weights for frequency of each value (not level) in f.</p> </td></tr> <tr valign="top"><td><code>other_level</code></td> <td> <p>Value of level used for "other" values. Always placed at end of levels.</p> </td></tr> <tr valign="top"><td><code>ties.method</code></td> <td> <p>A character string specifying how ties are treated. See <code><a href="../../base/html/rank.html">rank()</a></code> for details.</p> </td></tr> <tr valign="top"><td><code>min</code></td> <td> <p>Preserve levels that appear at least <code>min</code> number of times.</p> </td></tr> </table> <h3>See Also</h3> <p><code><a href="fct_other.html">fct_other()</a></code> to convert specified levels to other. </p> <h3>Examples</h3> <pre> x <- factor(rep(LETTERS[1:9], times = c(40, 10, 5, 27, 1, 1, 1, 1, 1))) x %>% table() x %>% fct_lump_n(3) %>% table() x %>% fct_lump_prop(0.10) %>% table() x %>% fct_lump_min(5) %>% table() x %>% fct_lump_lowfreq() %>% table() x <- factor(letters[rpois(100, 5)]) x table(x) table(fct_lump_lowfreq(x)) # Use positive values to collapse the rarest fct_lump_n(x, n = 3) fct_lump_prop(x, prop = 0.1) # Use negative values to collapse the most common fct_lump_n(x, n = -3) fct_lump_prop(x, prop = -0.1) # Use weighted frequencies w <- c(rep(2, 50), rep(1, 50)) fct_lump_n(x, n = 5, w = w) # Use ties.method to control how tied factors are collapsed fct_lump_n(x, n = 6) fct_lump_n(x, n = 6, ties.method = "max") # Use fct_lump_min() to lump together all levels with fewer than `n` values table(fct_lump_min(x, min = 10)) table(fct_lump_min(x, min = 15)) </pre> <hr /><div style="text-align: center;">[Package <em>forcats</em> version 0.5.0 <a href="00Index.html">Index</a>]</div> </body></html>