EVOLUTION-MANAGER
Edit File: split.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Split data.table into chunks in a list</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for split {data.table}"><tr><td>split {data.table}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2> Split data.table into chunks in a list </h2> <h3>Description</h3> <p>Split method for data.table. Faster and more flexible. Be aware that processing list of data.tables will be generally much slower than manipulation in single data.table by group using <code>by</code> argument, read more on <code><a href="data.table.html">data.table</a></code>. </p> <h3>Usage</h3> <pre> ## S3 method for class 'data.table' split(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, ..., verbose = getOption("datatable.verbose")) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>data.table </p> </td></tr> <tr valign="top"><td><code>f</code></td> <td> <p>factor or list of factors. Same as <code><a href="../../base/html/split.html">split.data.frame</a></code>. Use <code>by</code> argument instead, this is just for consistency with data.frame method.</p> </td></tr> <tr valign="top"><td><code>drop</code></td> <td> <p>logical. Default <code>FALSE</code> will not drop empty list elements caused by factor levels not referred by that factors. Works also with new arguments of split data.table method.</p> </td></tr> <tr valign="top"><td><code>by</code></td> <td> <p>character vector. Column names on which split should be made. For <code>length(by) > 1L</code> and <code>flatten</code> FALSE it will result nested lists with data.tables on leafs.</p> </td></tr> <tr valign="top"><td><code>sorted</code></td> <td> <p>When default <code>FALSE</code> it will retain the order of groups we are splitting on. When <code>TRUE</code> then sorted list(s) are returned. Does not have effect for <code>f</code> argument.</p> </td></tr> <tr valign="top"><td><code>keep.by</code></td> <td> <p>logical default <code>TRUE</code>. Keep column provided to <code>by</code> argument.</p> </td></tr> <tr valign="top"><td><code>flatten</code></td> <td> <p>logical default <code>TRUE</code> will unlist nested lists of data.tables. When using <code>f</code> results are always flattened to list of data.tables.</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>passed to data.frame way of processing when using <code>f</code> argument.</p> </td></tr> <tr valign="top"><td><code>verbose</code></td> <td> <p>logical default <code>FALSE</code>. When <code>TRUE</code> it will print to console data.table split query used to split data.</p> </td></tr> </table> <h3>Details</h3> <p>Argument <code>f</code> is just for consistency in usage to data.frame method. Recommended is to use <code>by</code> argument instead, it will be faster, more flexible, and by default will preserve order according to order in data. </p> <h3>Value</h3> <p>List of <code>data.table</code>s. If using <code>flatten</code> FALSE and <code>length(by) > 1L</code> then recursively nested lists having <code>data.table</code>s as leafs of grouping according to <code>by</code> argument. </p> <h3>See Also</h3> <p><code><a href="data.table.html">data.table</a></code>, <code><a href="rbindlist.html">rbindlist</a></code> </p> <h3>Examples</h3> <pre> set.seed(123) DT = data.table(x1 = rep(letters[1:2], 6), x2 = rep(letters[3:5], 4), x3 = rep(letters[5:8], 3), y = rnorm(12)) DT = DT[sample(.N)] DF = as.data.frame(DT) # split consistency with data.frame: `x, f, drop` all.equal( split(DT, list(DT$x1, DT$x2)), lapply(split(DF, list(DF$x1, DF$x2)), setDT) ) # nested list using `flatten` arguments split(DT, by=c("x1", "x2")) split(DT, by=c("x1", "x2"), flatten=FALSE) # dealing with factors fdt = DT[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3] fdf = as.data.frame(fdt) sdf = split(fdf, list(fdf$x1, fdf$x2)) all.equal( split(fdt, by=c("x1", "x2"), sorted=TRUE), lapply(sdf[sort(names(sdf))], setDT) ) # factors having unused levels, drop FALSE, TRUE fdt = DT[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L], x2 = as.factor(c("a", as.character(x2)))[-1L], x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)], y = y)] fdf = as.data.frame(fdt) sdf = split(fdf, list(fdf$x1, fdf$x2)) all.equal( split(fdt, by=c("x1", "x2"), sorted=TRUE), lapply(sdf[sort(names(sdf))], setDT) ) sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE) all.equal( split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT) ) </pre> <hr /><div style="text-align: center;">[Package <em>data.table</em> version 1.14.4 <a href="00Index.html">Index</a>]</div> </body></html>