EVOLUTION-MANAGER
Edit File: setorder.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Fast row reordering of a data.table by reference</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for setorder {data.table}"><tr><td>setorder {data.table}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Fast row reordering of a data.table by reference</h2> <h3>Description</h3> <p>In <code>data.table</code> parlance, all <code>set*</code> functions change their input <em>by reference</em>. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other <code>data.table</code> operator that modifies input by reference is <code><a href="assign.html">:=</a></code>. Check out the <code>See Also</code> section below for other <code>set*</code> function <code>data.table</code> provides. </p> <p><code>setorder</code> (and <code>setorderv</code>) reorders the rows of a <code>data.table</code> based on the columns (and column order) provided. It reorders the table <em>by reference</em> and is therefore very memory efficient. </p> <p>Note that queries like <code>x[order(.)]</code> are optimised internally to use <code>data.table</code>'s fast order. </p> <p>Also note that <code>data.table</code> always reorders in "C-locale" (see Details). To sort by session locale, use <code>x[base::order(.)]</code>. </p> <p><code>bit64::integer64</code> type is also supported for reordering rows of a <code>data.table</code>. </p> <h3>Usage</h3> <pre> setorder(x, ..., na.last=FALSE) setorderv(x, cols = colnames(x), order=1L, na.last=FALSE) # optimised to use data.table's internal fast order # x[order(., na.last=TRUE)] </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p> A <code>data.table</code>. </p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p> The columns to sort by. Do not quote column names. If <code>...</code> is missing (ex: <code>setorder(x)</code>), <code>x</code> is rearranged based on all columns in ascending order by default. To sort by a column in descending order prefix the symbol <code>"-"</code> which means "descending" (<em>not</em> "negative", in this context), i.e., <code>setorder(x, a, -b, c)</code>. The <code>-b</code> works when <code>b</code> is of type <code>character</code> as well. </p> </td></tr> <tr valign="top"><td><code>cols</code></td> <td> <p> A character vector of column names of <code>x</code> by which to order. By default, sorts over all columns; <code>cols = NULL</code> will return <code>x</code> untouched. Do not add <code>"-"</code> here. Use <code>order</code> argument instead. </p> </td></tr> <tr valign="top"><td><code>order</code></td> <td> <p> An integer vector with only possible values of <code>1</code> and <code>-1</code>, corresponding to ascending and descending order. The length of <code>order</code> must be either <code>1</code> or equal to that of <code>cols</code>. If <code>length(order) == 1</code>, it is recycled to <code>length(cols)</code>. </p> </td></tr> <tr valign="top"><td><code>na.last</code></td> <td> <p><code>logical</code>. If <code>TRUE</code>, missing values in the data are placed last; if <code>FALSE</code>, they are placed first; if <code>NA</code> they are removed. <code>na.last=NA</code> is valid only for <code>x[order(., na.last)]</code> and its default is <code>TRUE</code>. <code>setorder</code> and <code>setorderv</code> only accept <code>TRUE</code>/<code>FALSE</code> with default <code>FALSE</code>. </p> </td></tr> </table> <h3>Details</h3> <p><code>data.table</code> implements its own fast radix-based ordering. See the references for some exposition on the concept of radix sort. </p> <p><code>setorder</code> accepts unquoted column names (with names preceded with a <code>-</code> sign for descending order) and reorders <code>data.table</code> rows <em>by reference</em>, for e.g., <code>setorder(x, a, -b, c)</code>. We emphasize that this means "descending" and not "negative" because the implementation simply reverses the sort order, as opposed to sorting the opposite of the input (which would be inefficient). </p> <p>Note that <code>-b</code> also works with columns of type <code>character</code> unlike <code><a href="../../base/html/order.html">order</a></code>, which requires <code>-xtfrm(y)</code> instead (which is slow). <code>setorderv</code> in turn accepts a character vector of column names and an integer vector of column order separately. </p> <p>Note that <code><a href="setkey.html">setkey</a></code> still requires and will always sort only in ascending order, and is different from <code>setorder</code> in that it additionally sets the <code>sorted</code> attribute. </p> <p><code>na.last</code> argument, by default, is <code>FALSE</code> for <code>setorder</code> and <code>setorderv</code> to be consistent with <code>data.table</code>'s <code>setkey</code> and is <code>TRUE</code> for <code>x[order(.)]</code> to be consistent with <code>base::order</code>. Only <code>x[order(.)]</code> can have <code>na.last = NA</code> as it is a subset operation as opposed to <code>setorder</code> or <code>setorderv</code> which reorders the data.table by reference. </p> <p><code>data.table</code> always reorders in "C-locale". As a consequence, the ordering may be different to that obtained by <code>base::order</code>. In English locales, for example, sorting is case-sensitive in C-locale. Thus, sorting <code>c("c", "a", "B")</code> returns <code>c("B", "a", "c")</code> in <code>data.table</code> but <code>c("a", "B", "c")</code> in <code>base::order</code>. Note this makes no difference in most cases of data; both return identical results on ids where only upper-case or lower-case letters are present (<code>"AB123" < "AC234"</code> is true in both), or on country names and other proper nouns which are consistently capitalized. For example, neither <code>"America" < "Brazil"</code> nor <code>"america" < "brazil"</code> are affected since the first letter is consistently capitalized. </p> <p>Using C-locale makes the behaviour of sorting in <code>data.table</code> more consistent across sessions and locales. The behaviour of <code>base::order</code> depends on assumptions about the locale of the R session. In English locales, <code>"america" < "BRAZIL"</code> is true by default but false if you either type <code>Sys.setlocale(locale="C")</code> or the R session has been started in a C locale for you – which can happen on servers/services since the locale comes from the environment the R session was started in. By contrast, <code>"america" < "BRAZIL"</code> is always <code>FALSE</code> in <code>data.table</code> regardless of the way your R session was started. </p> <p>If <code>setorder</code> results in reordering of the rows of a keyed <code>data.table</code>, then its key will be set to <code>NULL</code>. </p> <h3>Value</h3> <p>The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., <code>setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]</code>. If you require a copy, take a copy first (using <code>DT2 = copy(DT)</code>). See <code><a href="copy.html">copy</a></code>. </p> <h3>References</h3> <p><a href="https://en.wikipedia.org/wiki/Radix_sort">https://en.wikipedia.org/wiki/Radix_sort</a><br /> <a href="https://en.wikipedia.org/wiki/Counting_sort">https://en.wikipedia.org/wiki/Counting_sort</a><br /> <a href="http://stereopsis.com/radix.html">http://stereopsis.com/radix.html</a><br /> <a href="https://codercorner.com/RadixSortRevisited.htm">https://codercorner.com/RadixSortRevisited.htm</a><br /> <a href="https://medium.com/basecs/getting-to-the-root-of-sorting-with-radix-sort-f8e9240d4224">https://medium.com/basecs/getting-to-the-root-of-sorting-with-radix-sort-f8e9240d4224</a> </p> <h3>See Also</h3> <p><code><a href="setkey.html">setkey</a></code>, <code><a href="setcolorder.html">setcolorder</a></code>, <code><a href="setattr.html">setattr</a></code>, <code><a href="setattr.html">setnames</a></code>, <code><a href="assign.html">set</a></code>, <code><a href="assign.html">:=</a></code>, <code><a href="setDT.html">setDT</a></code>, <code><a href="setDF.html">setDF</a></code>, <code><a href="copy.html">copy</a></code>, <code><a href="setNumericRounding.html">setNumericRounding</a></code> </p> <h3>Examples</h3> <pre> set.seed(45L) DT = data.table(A=sample(3, 10, TRUE), B=sample(letters[1:3], 10, TRUE), C=sample(10)) # setorder setorder(DT, A, -B) # same as above, but using setorderv setorderv(DT, c("A", "B"), c(1, -1)) </pre> <hr /><div style="text-align: center;">[Package <em>data.table</em> version 1.14.4 <a href="00Index.html">Index</a>]</div> </body></html>