<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Rolling functions</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>
<table width="100%" summary="page for roll {data.table}"><tr><td>roll {data.table}</td><td style="text-align: right;">R Documentation</td></tr></table>
<h2>Rolling functions</h2>
<h3>Description</h3>
<p>Fast rolling functions to calculate aggregates on sliding windows. Function names and arguments are experimental. </p>
<h3>Usage</h3>
<pre>
frollmean(x, n, fill=NA, algo=c("fast", "exact"),
  align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast","exact"),
  align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>x</code></td> <td> <p> vector, list, data.frame or data.table of numeric or logical columns. </p> </td></tr>
<tr valign="top"><td><code>n</code></td> <td> <p> integer vector giving the rolling window size; for adaptive rolling functions, a list of integer vectors is also accepted. </p> </td></tr>
<tr valign="top"><td><code>fill</code></td> <td> <p> numeric, value to pad by. Defaults to <code>NA</code>. </p> </td></tr>
<tr valign="top"><td><code>algo</code></td> <td> <p> character, default <code>"fast"</code>. When set to <code>"exact"</code>, a slower algorithm is used. It suffers less from floating point rounding error, performs an extra pass to correct rounding error, and carefully handles all non-finite values. It uses multiple cores when available. See Details for more information. 
</p> </td></tr>
<tr valign="top"><td><code>align</code></td> <td> <p> character, defines whether the rolling window covers preceding rows (<code>"right"</code>), following rows (<code>"left"</code>), or is centered (<code>"center"</code>). Defaults to <code>"right"</code>. </p> </td></tr>
<tr valign="top"><td><code>na.rm</code></td> <td> <p> logical. Should missing values be removed when computing each window? Defaults to <code>FALSE</code>. For the handling of other non-finite values, see Details. </p> </td></tr>
<tr valign="top"><td><code>hasNA</code></td> <td> <p> logical. If it is known that <code>x</code> contains <code>NA</code>, then setting this to <code>TRUE</code> speeds up the computation. Defaults to <code>NA</code>. </p> </td></tr>
<tr valign="top"><td><code>adaptive</code></td> <td> <p> logical, should an adaptive rolling function be calculated? Defaults to <code>FALSE</code>. See Details. </p> </td></tr>
<tr valign="top"><td><code>FUN</code></td> <td> <p> the function to be applied in rolling fashion; see Details for restrictions. </p> </td></tr>
<tr valign="top"><td><code>...</code></td> <td> <p> extra arguments passed to <code>FUN</code> in <code>frollapply</code>. </p> </td></tr>
</table>
<h3>Details</h3>
<p><code>froll*</code> functions accept vectors, lists, data.frames or data.tables. They always return a list, except when the input is a <code>vector</code> and <code>length(n)==1</code>, in which case a <code>vector</code> is returned for convenience. Thus rolling functions can be used conveniently within data.table syntax. </p>
<p>Argument <code>n</code> accepts multiple values, so that rolling functions can be computed for several window sizes at once. If <code>adaptive=TRUE</code>, then it expects a list. Each list element must be an integer vector of window sizes, one for each observation in each column. </p>
<p>When <code>algo="fast"</code>, an <em>on-line</em> algorithm is used, and any <code>NaN, +Inf, -Inf</code> is treated as <code>NA</code>. 
Setting <code>algo="exact"</code> makes rolling functions use a more compute-intensive algorithm that suffers less from floating point rounding error. It also handles <code>NaN, +Inf, -Inf</code> consistently with base R. For some functions (like <em>mean</em>), it additionally makes an extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads. </p>
<p>Adaptive rolling functions are a special case in which each observation has its own corresponding rolling window width. Due to the logic of adaptive rolling functions, the following restrictions apply: </p>
<ul> <li> <p><code>align</code> must be <code>"right"</code>. </p> </li> <li><p> if a list of vectors is passed to <code>x</code>, then all vectors in the list must have equal length. </p> </li></ul>
<p>When multiple columns or multiple window widths are provided, they are processed in parallel, except when <code>algo="exact"</code>, which already runs in parallel. </p>
<p><code>frollapply</code> computes rolling aggregates using arbitrary R functions. The input <code>x</code> (first argument) to the function <code>FUN</code> is coerced to <em>numeric</em> beforehand and <code>FUN</code> has to return a scalar <em>numeric</em> value. Checks for that are made only during the first iteration when <code>FUN</code> is evaluated. Edge cases can be found in the examples below. Any R function is supported, but it is not optimized using our own C implementation; hence, for example, using <code>frollapply</code> to compute a rolling average is inefficient. It is also always single-threaded because there is no thread-safe API to R's C <code>eval</code>. Nevertheless, we have seen a computation speed-up relative to versions implemented in base R. </p>
<h3>Value</h3>
<p>A list, except when the input is a <code>vector</code> and <code>length(n)==1</code>, in which case a <code>vector</code> is returned. 
</p>
<h3>Note</h3>
<p>Users coming from <code>zoo</code>, the most popular package for rolling functions, should be aware of the following differences in the <code>data.table</code> implementation. </p>
<ul> <li><p> rolling functions always return a result of the same length as the input. </p> </li>
<li> <p><code>fill</code> defaults to <code>NA</code>. </p> </li>
<li> <p><code>fill</code> accepts only constant values. It does not support <em>na.locf</em> or other functions. </p> </li>
<li> <p><code>align</code> defaults to <code>"right"</code>. </p> </li>
<li> <p><code>na.rm</code> is respected, and other functions are not needed when the input contains <code>NA</code>. </p> </li>
<li><p> integer and logical inputs are always coerced to double. </p> </li>
<li><p> when <code>adaptive=FALSE</code> (default), then <code>n</code> must be a numeric vector. A list is not accepted. </p> </li>
<li><p> when <code>adaptive=TRUE</code>, then <code>n</code> must be a vector of length equal to <code>nrow(x)</code>, or a list of such vectors. </p> </li>
<li> <p>the <code>partial</code> window feature is not supported, although it can be emulated by using <code>adaptive=TRUE</code>; see Examples. </p> </li></ul>
<p>Be aware that rolling functions operate on the physical order of the input. If the intent is to roll values in a vector by a logical window, for example an hour or a day, one has to ensure that there are no gaps in the input. For details see <a href="https://github.com/Rdatatable/data.table/issues/3241">issue #3241</a>. 
</p>
<h3>References</h3>
<p><a href="https://en.wikipedia.org/wiki/Round-off_error">Round-off error</a> </p>
<h3>See Also</h3>
<p><code><a href="shift.html">shift</a></code>, <code><a href="data.table.html">data.table</a></code> </p>
<h3>Examples</h3>
<pre>
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available

# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)

# frollsum
frollsum(d, 3:4)

# frollapply
frollapply(d, 3:4, sum)
f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n))
system.time(ans4<-frollmean(x, n, algo="exact"))
system.time(ans5<-frollapply(x, n, mean))
anserr = list(
  fastma = ans2-ans1,
  froll_fast = ans3-ans1,
  froll_exact = ans4-ans1,
  frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff

# frollapply corner cases
f = function(x) head(x, 2)  ## FUN returns non length 1
try(frollapply(1:5, 3, f))
f = function(x) {  ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN)  ## FUN returns integer
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA)  ## FUN returns logical
as.logical(frollapply(x, 3, anyNA))
options(datatable.verbose=FALSE)
f = function(x) {  ## FUN returns character
  if (sum(x)>5) "big" else "small"
}
try(frollapply(1:5, 3, f))
f = function(x) {  ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))
</pre>
<hr /><div style="text-align: center;">[Package <em>data.table</em> version 1.14.4 <a href="00Index.html">Index</a>]</div>
</body></html>