<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset="utf-8" /> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta name="generator" content="pandoc" /> <meta http-equiv="X-UA-Compatible" content="IE=EDGE" /> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="author" content="Emily Robinson" /> <title>Introduction to forcats</title> <style type="text/css">code{white-space: pre;}</style> <style type="text/css" data-origin="pandoc"> code.sourceCode > span { display: inline-block; line-height: 1.25; } code.sourceCode > span { color: inherit; text-decoration: inherit; } code.sourceCode > span:empty { height: 1.2em; } .sourceCode { overflow: visible; } code.sourceCode { white-space: pre; position: relative; } div.sourceCode { margin: 1em 0; } pre.sourceCode { margin: 0; } @media screen { div.sourceCode { overflow: auto; } } @media print { code.sourceCode { white-space: pre-wrap; } code.sourceCode > span { text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } pre.numberSource code > span { position: relative; left: -4em; counter-increment: source-line; } pre.numberSource code > span > a:first-child::before { content: counter(source-line); position: relative; left: -1em; text-align: right; vertical-align: baseline; border: none; display: inline-block; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none; padding: 0 4px; width: 4em; color: #aaaaaa; } pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; } div.sourceCode { } @media screen { code.sourceCode > span > a:first-child::before { text-decoration: underline; } } code span.al { color: #ff0000; font-weight: bold; } /* Alert */ code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ code span.at { color: #7d9029; } /* Attribute */ code 
span.bn { color: #40a070; } /* BaseN */ code span.bu { } /* BuiltIn */ code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */ code span.ch { color: #4070a0; } /* Char */ code span.cn { color: #880000; } /* Constant */ code span.co { color: #60a0b0; font-style: italic; } /* Comment */ code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ code span.do { color: #ba2121; font-style: italic; } /* Documentation */ code span.dt { color: #902000; } /* DataType */ code span.dv { color: #40a070; } /* DecVal */ code span.er { color: #ff0000; font-weight: bold; } /* Error */ code span.ex { } /* Extension */ code span.fl { color: #40a070; } /* Float */ code span.fu { color: #06287e; } /* Function */ code span.im { } /* Import */ code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ code span.kw { color: #007020; font-weight: bold; } /* Keyword */ code span.op { color: #666666; } /* Operator */ code span.ot { color: #007020; } /* Other */ code span.pp { color: #bc7a00; } /* Preprocessor */ code span.sc { color: #4070a0; } /* SpecialChar */ code span.ss { color: #bb6688; } /* SpecialString */ code span.st { color: #4070a0; } /* String */ code span.va { color: #19177c; } /* Variable */ code span.vs { color: #4070a0; } /* VerbatimString */ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ </style> <script> // apply pandoc div.sourceCode style to pre.sourceCode instead (function() { var sheets = document.styleSheets; for (var i = 0; i < sheets.length; i++) { if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue; try { var rules = sheets[i].cssRules; } catch (e) { continue; } for (var j = 0; j < rules.length; j++) { var rule = rules[j]; // check if there is a div.sourceCode rule if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") continue; var style = rule.style.cssText; // check if color or background-color is set if 
(rule.style.color === '' && rule.style.backgroundColor === '') continue; // replace div.sourceCode by a pre.sourceCode rule sheets[i].deleteRule(j); sheets[i].insertRule('pre.sourceCode{' + style + '}', j); } } })(); </script> <style type="text/css">body { background-color: #fff; margin: 1em auto; max-width: 700px; overflow: visible; padding-left: 2em; padding-right: 2em; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 14px; line-height: 1.35; } #header { text-align: center; } #TOC { clear: both; margin: 0 0 10px 10px; padding: 4px; width: 400px; border: 1px solid #CCCCCC; border-radius: 5px; background-color: #f6f6f6; font-size: 13px; line-height: 1.3; } #TOC .toctitle { font-weight: bold; font-size: 15px; margin-left: 5px; } #TOC ul { padding-left: 40px; margin-left: -1.5em; margin-top: 5px; margin-bottom: 5px; } #TOC ul ul { margin-left: -2em; } #TOC li { line-height: 16px; } table { margin: 1em auto; border-width: 1px; border-color: #DDDDDD; border-style: outset; border-collapse: collapse; } table th { border-width: 2px; padding: 5px; border-style: inset; } table td { border-width: 1px; border-style: inset; line-height: 18px; padding: 5px 5px; } table, table th, table td { border-left-style: none; border-right-style: none; } table thead, table tr.even { background-color: #f7f7f7; } p { margin: 0.5em 0; } blockquote { background-color: #f6f6f6; padding: 0.25em 0.75em; } hr { border-style: solid; border: none; border-top: 1px solid #777; margin: 28px 0; } dl { margin-left: 0; } dl dd { margin-bottom: 13px; margin-left: 13px; } dl dt { font-weight: bold; } ul { margin-top: 0; } ul li { list-style: circle outside; } ul ul { margin-bottom: 0; } pre, code { background-color: #f7f7f7; border-radius: 3px; color: #333; white-space: pre-wrap; } pre { border-radius: 3px; margin: 5px 0px 10px 0px; padding: 10px; } pre:not([class]) { background-color: #f7f7f7; } code { font-family: Consolas, Monaco, 'Courier New', monospace; font-size: 
85%; } p > code, li > code { padding: 2px 0px; } div.figure { text-align: center; } img { background-color: #FFFFFF; padding: 2px; border: 1px solid #DDDDDD; border-radius: 3px; border: 1px solid #CCCCCC; margin: 0 5px; } h1 { margin-top: 0; font-size: 35px; line-height: 40px; } h2 { border-bottom: 4px solid #f7f7f7; padding-top: 10px; padding-bottom: 2px; font-size: 145%; } h3 { border-bottom: 2px solid #f7f7f7; padding-top: 10px; font-size: 120%; } h4 { border-bottom: 1px solid #f7f7f7; margin-left: 8px; font-size: 105%; } h5, h6 { border-bottom: 1px solid #ccc; font-size: 105%; } a { color: #0033dd; text-decoration: none; } a:hover { color: #6666ff; } a:visited { color: #800080; } a:visited:hover { color: #BB00BB; } a[href^="http:"] { text-decoration: underline; } a[href^="https:"] { text-decoration: underline; } code > span.kw { color: #555; font-weight: bold; } code > span.dt { color: #902000; } code > span.dv { color: #40a070; } code > span.bn { color: #d14; } code > span.fl { color: #d14; } code > span.ch { color: #d14; } code > span.st { color: #d14; } code > span.co { color: #888888; font-style: italic; } code > span.ot { color: #007020; } code > span.al { color: #ff0000; font-weight: bold; } code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61717; background-color: #e3d2d2; } </style> </head> <body> <h1 class="title toc-ignore">Introduction to forcats</h1> <h4 class="author">Emily Robinson</h4> <p>The goal of the <strong>forcats</strong> package is to provide a suite of useful tools that solve common problems with factors. Factors are useful when you have categorical data, variables that have a fixed and known set of values, and when you want to display character vectors in non-alphabetical order. 
If you want to learn more, the best place to start is the <a href="http://r4ds.had.co.nz/factors.html">chapter on factors</a> in R for Data Science.</p> <div id="ordering-by-frequency" class="section level2"> <h2>Ordering by frequency</h2> <div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1"></a><span class="kw">library</span>(dplyr)</span> <span id="cb1-2"><a href="#cb1-2"></a><span class="kw">library</span>(ggplot2)</span> <span id="cb1-3"><a href="#cb1-3"></a><span class="kw">library</span>(forcats)</span></code></pre></div> <p>Let’s try answering the question, “What are the most common hair colors of Star Wars characters?” Let’s start off by making a bar plot:</p> <div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1"></a><span class="kw">ggplot</span>(starwars, <span class="kw">aes</span>(<span class="dt">x =</span> hair_color)) <span class="op">+</span><span class="st"> </span></span> <span id="cb2-2"><a href="#cb2-2"></a><span class="st"> </span><span class="kw">geom_bar</span>() <span class="op">+</span><span class="st"> </span></span> <span id="cb2-3"><a href="#cb2-3"></a><span class="st"> </span><span class="kw">coord_flip</span>()</span></code></pre></div> <p><img src="" /><!-- --></p> <p>That’s okay, but it would be more helpful if the graph were ordered by count. This is a case of an <strong>unordered</strong> categorical variable where we want it ordered by its frequency. 
To do so, we can use the function <code>fct_infreq()</code>:</p> <div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1"></a><span class="kw">ggplot</span>(starwars, <span class="kw">aes</span>(<span class="dt">x =</span> <span class="kw">fct_infreq</span>(hair_color))) <span class="op">+</span><span class="st"> </span></span> <span id="cb3-2"><a href="#cb3-2"></a><span class="st"> </span><span class="kw">geom_bar</span>() <span class="op">+</span><span class="st"> </span></span> <span id="cb3-3"><a href="#cb3-3"></a><span class="st"> </span><span class="kw">coord_flip</span>()</span></code></pre></div> <p><img src="" /><!-- --></p> <p>Note that <code>fct_infreq()</code> automatically puts <code>NA</code> at the top, even though it doesn’t have the smallest number of entries.</p> </div> <div id="combining-levels" class="section level2"> <h2>Combining levels</h2> <p>Let’s take a look at skin color now:</p> <div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1"></a>starwars <span class="op">%>%</span></span> <span id="cb4-2"><a href="#cb4-2"></a><span class="st"> </span><span class="kw">count</span>(skin_color, <span class="dt">sort =</span> <span class="ot">TRUE</span>)</span> <span id="cb4-3"><a href="#cb4-3"></a><span class="co">#> # A tibble: 31 x 2</span></span> <span id="cb4-4"><a href="#cb4-4"></a><span class="co">#> skin_color n</span></span> <span id="cb4-5"><a href="#cb4-5"></a><span class="co">#> &lt;chr&gt; &lt;int&gt;</span></span> <span id="cb4-6"><a href="#cb4-6"></a><span class="co">#> 1 fair 17</span></span> <span id="cb4-7"><a href="#cb4-7"></a><span class="co">#> 2 light 11</span></span> <span id="cb4-8"><a href="#cb4-8"></a><span class="co">#> 3 dark 6</span></span> <span id="cb4-9"><a href="#cb4-9"></a><span class="co">#> 4 green 6</span></span> <span id="cb4-10"><a href="#cb4-10"></a><span class="co">#> 5 grey 6</span></span> <span 
id="cb4-11"><a href="#cb4-11"></a><span class="co">#> 6 pale 5</span></span> <span id="cb4-12"><a href="#cb4-12"></a><span class="co">#> 7 brown 4</span></span> <span id="cb4-13"><a href="#cb4-13"></a><span class="co">#> 8 blue 2</span></span> <span id="cb4-14"><a href="#cb4-14"></a><span class="co">#> 9 blue, grey 2</span></span> <span id="cb4-15"><a href="#cb4-15"></a><span class="co">#> 10 orange 2</span></span> <span id="cb4-16"><a href="#cb4-16"></a><span class="co">#> # … with 21 more rows</span></span></code></pre></div> <p>We see that there are 31 different skin colors. If we wanted to make a plot, this would be far too many to display! Let’s reduce it to the top 5. We can use <code>fct_lump()</code> to “lump” all the infrequent colors into one level, “Other”. The argument <code>n</code> is the number of levels we want to keep.</p> <div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1"></a>starwars <span class="op">%>%</span></span> <span id="cb5-2"><a href="#cb5-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">skin_color =</span> <span class="kw">fct_lump</span>(skin_color, <span class="dt">n =</span> <span class="dv">5</span>)) <span class="op">%>%</span></span> <span id="cb5-3"><a href="#cb5-3"></a><span class="st"> </span><span class="kw">count</span>(skin_color, <span class="dt">sort =</span> <span class="ot">TRUE</span>)</span> <span id="cb5-4"><a href="#cb5-4"></a><span class="co">#> # A tibble: 6 x 2</span></span> <span id="cb5-5"><a href="#cb5-5"></a><span class="co">#> skin_color n</span></span> <span id="cb5-6"><a href="#cb5-6"></a><span class="co">#> &lt;fct&gt; &lt;int&gt;</span></span> <span id="cb5-7"><a href="#cb5-7"></a><span class="co">#> 1 Other 41</span></span> <span id="cb5-8"><a href="#cb5-8"></a><span class="co">#> 2 fair 17</span></span> <span id="cb5-9"><a href="#cb5-9"></a><span class="co">#> 3 light 11</span></span> <span id="cb5-10"><a 
href="#cb5-10"></a><span class="co">#> 4 dark 6</span></span> <span id="cb5-11"><a href="#cb5-11"></a><span class="co">#> 5 green 6</span></span> <span id="cb5-12"><a href="#cb5-12"></a><span class="co">#> 6 grey 6</span></span></code></pre></div> <p>We could also have used <code>prop</code> instead, which keeps all the levels that make up at least a proportion <code>prop</code> of the observations. For example, let’s keep skin colors that at least 10% of the characters have:</p> <div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb6-1"><a href="#cb6-1"></a>starwars <span class="op">%>%</span></span> <span id="cb6-2"><a href="#cb6-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">skin_color =</span> <span class="kw">fct_lump</span>(skin_color, <span class="dt">prop =</span> <span class="fl">.1</span>)) <span class="op">%>%</span></span> <span id="cb6-3"><a href="#cb6-3"></a><span class="st"> </span><span class="kw">count</span>(skin_color, <span class="dt">sort =</span> <span class="ot">TRUE</span>)</span> <span id="cb6-4"><a href="#cb6-4"></a><span class="co">#> # A tibble: 3 x 2</span></span> <span id="cb6-5"><a href="#cb6-5"></a><span class="co">#> skin_color n</span></span> <span id="cb6-6"><a href="#cb6-6"></a><span class="co">#> &lt;fct&gt; &lt;int&gt;</span></span> <span id="cb6-7"><a href="#cb6-7"></a><span class="co">#> 1 Other 59</span></span> <span id="cb6-8"><a href="#cb6-8"></a><span class="co">#> 2 fair 17</span></span> <span id="cb6-9"><a href="#cb6-9"></a><span class="co">#> 3 light 11</span></span></code></pre></div> <p>Only light and fair remain; everything else is lumped into “Other”.</p> <p>If you want to call it something other than “Other”, you can change it with the argument <code>other_level</code>:</p> <div class="sourceCode" id="cb7"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1"></a>starwars <span class="op">%>%</span></span> <span id="cb7-2"><a href="#cb7-2"></a><span 
class="st"> </span><span class="kw">mutate</span>(<span class="dt">skin_color =</span> <span class="kw">fct_lump</span>(skin_color, <span class="dt">prop =</span> <span class="fl">.1</span>, <span class="dt">other_level =</span> <span class="st">"extra"</span>)) <span class="op">%>%</span></span> <span id="cb7-3"><a href="#cb7-3"></a><span class="st"> </span><span class="kw">count</span>(skin_color, <span class="dt">sort =</span> <span class="ot">TRUE</span>)</span> <span id="cb7-4"><a href="#cb7-4"></a><span class="co">#> # A tibble: 3 x 2</span></span> <span id="cb7-5"><a href="#cb7-5"></a><span class="co">#> skin_color n</span></span> <span id="cb7-6"><a href="#cb7-6"></a><span class="co">#> &lt;fct&gt; &lt;int&gt;</span></span> <span id="cb7-7"><a href="#cb7-7"></a><span class="co">#> 1 extra 59</span></span> <span id="cb7-8"><a href="#cb7-8"></a><span class="co">#> 2 fair 17</span></span> <span id="cb7-9"><a href="#cb7-9"></a><span class="co">#> 3 light 11</span></span></code></pre></div> <p>What if we wanted to see if the average mass differed by eye color? 
We’ll only look at the 6 most popular eye colors and ignore <code>NA</code> masses when averaging.</p> <div class="sourceCode" id="cb8"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb8-1"><a href="#cb8-1"></a>avg_mass_eye_color <-<span class="st"> </span>starwars <span class="op">%>%</span></span> <span id="cb8-2"><a href="#cb8-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">eye_color =</span> <span class="kw">fct_lump</span>(eye_color, <span class="dt">n =</span> <span class="dv">6</span>)) <span class="op">%>%</span></span> <span id="cb8-3"><a href="#cb8-3"></a><span class="st"> </span><span class="kw">group_by</span>(eye_color) <span class="op">%>%</span></span> <span id="cb8-4"><a href="#cb8-4"></a><span class="st"> </span><span class="kw">summarise</span>(<span class="dt">mean_mass =</span> <span class="kw">mean</span>(mass, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>))</span> <span id="cb8-5"><a href="#cb8-5"></a></span> <span id="cb8-6"><a href="#cb8-6"></a>avg_mass_eye_color</span> <span id="cb8-7"><a href="#cb8-7"></a><span class="co">#> # A tibble: 7 x 2</span></span> <span id="cb8-8"><a href="#cb8-8"></a><span class="co">#> eye_color mean_mass</span></span> <span id="cb8-9"><a href="#cb8-9"></a><span class="co">#> &lt;fct&gt; &lt;dbl&gt;</span></span> <span id="cb8-10"><a href="#cb8-10"></a><span class="co">#> 1 black 76.3</span></span> <span id="cb8-11"><a href="#cb8-11"></a><span class="co">#> 2 blue 86.5</span></span> <span id="cb8-12"><a href="#cb8-12"></a><span class="co">#> 3 brown 66.1</span></span> <span id="cb8-13"><a href="#cb8-13"></a><span class="co">#> 4 orange 282. 
</span></span> <span id="cb8-14"><a href="#cb8-14"></a><span class="co">#> 5 red 81.4</span></span> <span id="cb8-15"><a href="#cb8-15"></a><span class="co">#> 6 yellow 81.1</span></span> <span id="cb8-16"><a href="#cb8-16"></a><span class="co">#> 7 Other 68.4</span></span></code></pre></div> </div> <div id="ordering-by-another-variable" class="section level2"> <h2>Ordering by another variable</h2> <p>It looks like people (or at least one person) with orange eyes are definitely heavier! If we wanted to make a graph, it would be nice if it was ordered by <code>mean_mass</code>. We can do this with <code>fct_reorder()</code>, which reorders one variable by another.</p> <div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1"></a>avg_mass_eye_color <span class="op">%>%</span></span> <span id="cb9-2"><a href="#cb9-2"></a><span class="st"> </span><span class="kw">mutate</span>(<span class="dt">eye_color =</span> <span class="kw">fct_reorder</span>(eye_color, mean_mass)) <span class="op">%>%</span></span> <span id="cb9-3"><a href="#cb9-3"></a><span class="st"> </span><span class="kw">ggplot</span>(<span class="kw">aes</span>(<span class="dt">x =</span> eye_color, <span class="dt">y =</span> mean_mass)) <span class="op">+</span><span class="st"> </span></span> <span id="cb9-4"><a href="#cb9-4"></a><span class="st"> </span><span class="kw">geom_col</span>()</span></code></pre></div> <p><img src="" /><!-- --></p> </div> <div id="manually-reordering" class="section level2"> <h2>Manually reordering</h2> <p>Let’s switch to using another dataset, <code>gss_cat</code>, the general social survey. 
What is the income distribution among the respondents?</p> <div class="sourceCode" id="cb10"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb10-1"><a href="#cb10-1"></a>gss_cat <span class="op">%>%</span></span> <span id="cb10-2"><a href="#cb10-2"></a><span class="st"> </span><span class="kw">count</span>(rincome)</span> <span id="cb10-3"><a href="#cb10-3"></a><span class="co">#> # A tibble: 16 x 2</span></span> <span id="cb10-4"><a href="#cb10-4"></a><span class="co">#> rincome n</span></span> <span id="cb10-5"><a href="#cb10-5"></a><span class="co">#> &lt;fct&gt; &lt;int&gt;</span></span> <span id="cb10-6"><a href="#cb10-6"></a><span class="co">#> 1 No answer 183</span></span> <span id="cb10-7"><a href="#cb10-7"></a><span class="co">#> 2 Don't know 267</span></span> <span id="cb10-8"><a href="#cb10-8"></a><span class="co">#> 3 Refused 975</span></span> <span id="cb10-9"><a href="#cb10-9"></a><span class="co">#> 4 $25000 or more 7363</span></span> <span id="cb10-10"><a href="#cb10-10"></a><span class="co">#> 5 $20000 - 24999 1283</span></span> <span id="cb10-11"><a href="#cb10-11"></a><span class="co">#> 6 $15000 - 19999 1048</span></span> <span id="cb10-12"><a href="#cb10-12"></a><span class="co">#> 7 $10000 - 14999 1168</span></span> <span id="cb10-13"><a href="#cb10-13"></a><span class="co">#> 8 $8000 to 9999 340</span></span> <span id="cb10-14"><a href="#cb10-14"></a><span class="co">#> 9 $7000 to 7999 188</span></span> <span id="cb10-15"><a href="#cb10-15"></a><span class="co">#> 10 $6000 to 6999 215</span></span> <span id="cb10-16"><a href="#cb10-16"></a><span class="co">#> 11 $5000 to 5999 227</span></span> <span id="cb10-17"><a href="#cb10-17"></a><span class="co">#> 12 $4000 to 4999 226</span></span> <span id="cb10-18"><a href="#cb10-18"></a><span class="co">#> 13 $3000 to 3999 276</span></span> <span id="cb10-19"><a href="#cb10-19"></a><span class="co">#> 14 $1000 to 2999 395</span></span> <span id="cb10-20"><a href="#cb10-20"></a><span 
class="co">#> 15 Lt $1000 286</span></span> <span id="cb10-21"><a href="#cb10-21"></a><span class="co">#> 16 Not applicable 7043</span></span></code></pre></div> <p>Notice that the income levels are in the correct order - they start with the non-answers and then go from highest to lowest. This is the same order you’d see if you plotted it as a bar chart. This is not a coincidence. When you’re working with ordinal data, where there is an order, you can have an ordered factor. You can examine them with the base function <code>levels()</code>, which prints them in order:</p> <div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1"></a><span class="kw">levels</span>(gss_cat<span class="op">$</span>rincome)</span> <span id="cb11-2"><a href="#cb11-2"></a><span class="co">#> [1] "No answer" "Don't know" "Refused" "$25000 or more"</span></span> <span id="cb11-3"><a href="#cb11-3"></a><span class="co">#> [5] "$20000 - 24999" "$15000 - 19999" "$10000 - 14999" "$8000 to 9999" </span></span> <span id="cb11-4"><a href="#cb11-4"></a><span class="co">#> [9] "$7000 to 7999" "$6000 to 6999" "$5000 to 5999" "$4000 to 4999" </span></span> <span id="cb11-5"><a href="#cb11-5"></a><span class="co">#> [13] "$3000 to 3999" "$1000 to 2999" "Lt $1000" "Not applicable"</span></span></code></pre></div> <p>But what if your factor came in the wrong order? 
Let’s simulate that by reordering the levels of <code>rincome</code> randomly with <code>fct_shuffle()</code>:</p> <div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb12-1"><a href="#cb12-1"></a>reshuffled_income <-<span class="st"> </span>gss_cat<span class="op">$</span>rincome <span class="op">%>%</span></span> <span id="cb12-2"><a href="#cb12-2"></a><span class="st"> </span><span class="kw">fct_shuffle</span>()</span> <span id="cb12-3"><a href="#cb12-3"></a></span> <span id="cb12-4"><a href="#cb12-4"></a><span class="kw">levels</span>(reshuffled_income)</span> <span id="cb12-5"><a href="#cb12-5"></a><span class="co">#> [1] "$10000 - 14999" "$5000 to 5999" "Not applicable" "$15000 - 19999"</span></span> <span id="cb12-6"><a href="#cb12-6"></a><span class="co">#> [5] "No answer" "$20000 - 24999" "$4000 to 4999" "Lt $1000" </span></span> <span id="cb12-7"><a href="#cb12-7"></a><span class="co">#> [9] "$25000 or more" "$7000 to 7999" "$1000 to 2999" "Refused" </span></span> <span id="cb12-8"><a href="#cb12-8"></a><span class="co">#> [13] "$8000 to 9999" "Don't know" "$3000 to 3999" "$6000 to 6999"</span></span></code></pre></div> <p>Now if we plotted it, it would show in this order, which is all over the place! How can we fix this and put it in the right order?</p> <p>We can use the function <code>fct_relevel()</code> when we need to manually reorder our factor levels. In addition to the factor, you give it a character vector of level names, and specify where you want to move them. It defaults to moving them to the front, but you can move them after another level with the argument <code>after</code>. If you want to move it to the end, you set <code>after</code> equal to <code>Inf</code>.</p> <p>For example, let’s say we wanted to move <code>Lt $1000</code> and <code>$1000 to 2999</code> to the front. 
We would write:</p> <div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1"></a><span class="kw">fct_relevel</span>(reshuffled_income, <span class="kw">c</span>(<span class="st">"Lt $1000"</span>, <span class="st">"$1000 to 2999"</span>)) <span class="op">%>%</span></span> <span id="cb13-2"><a href="#cb13-2"></a><span class="st"> </span><span class="kw">levels</span>()</span> <span id="cb13-3"><a href="#cb13-3"></a><span class="co">#> [1] "Lt $1000" "$1000 to 2999" "$10000 - 14999" "$5000 to 5999" </span></span> <span id="cb13-4"><a href="#cb13-4"></a><span class="co">#> [5] "Not applicable" "$15000 - 19999" "No answer" "$20000 - 24999"</span></span> <span id="cb13-5"><a href="#cb13-5"></a><span class="co">#> [9] "$4000 to 4999" "$25000 or more" "$7000 to 7999" "Refused" </span></span> <span id="cb13-6"><a href="#cb13-6"></a><span class="co">#> [13] "$8000 to 9999" "Don't know" "$3000 to 3999" "$6000 to 6999"</span></span></code></pre></div> <p>What if we want to move them to the second and third place?</p> <div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb14-1"><a href="#cb14-1"></a><span class="kw">fct_relevel</span>(reshuffled_income, <span class="kw">c</span>(<span class="st">"Lt $1000"</span>, <span class="st">"$1000 to 2999"</span>), <span class="dt">after =</span> <span class="dv">1</span>) <span class="op">%>%</span></span> <span id="cb14-2"><a href="#cb14-2"></a><span class="st"> </span><span class="kw">levels</span>()</span> <span id="cb14-3"><a href="#cb14-3"></a><span class="co">#> [1] "$10000 - 14999" "Lt $1000" "$1000 to 2999" "$5000 to 5999" </span></span> <span id="cb14-4"><a href="#cb14-4"></a><span class="co">#> [5] "Not applicable" "$15000 - 19999" "No answer" "$20000 - 24999"</span></span> <span id="cb14-5"><a href="#cb14-5"></a><span class="co">#> [9] "$4000 to 4999" "$25000 or more" "$7000 to 7999" "Refused" </span></span> <span 
id="cb14-6"><a href="#cb14-6"></a><span class="co">#> [13] "$8000 to 9999" "Don't know" "$3000 to 3999" "$6000 to 6999"</span></span></code></pre></div> </div> <!-- code folding --> <!-- dynamically load mathjax for compatibility with self-contained --> <script> (function () { var script = document.createElement("script"); script.type = "text/javascript"; script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"; document.getElementsByTagName("head")[0].appendChild(script); })(); </script> </body> </html>