EVOLUTION-MANAGER

Edit File: froll.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Rolling functions</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for roll {data.table}"><tr><td>roll {data.table}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Rolling functions</h2>

<h3>Description</h3>

<p>Fast rolling functions to calculate aggregates on sliding window. Function name and arguments are experimental.
</p>

<h3>Usage</h3>

<pre>
frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right",
  "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast","exact"), align=c("right", "left",
  "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p> vector, list, data.frame or data.table of numeric or logical columns. </p>
</td></tr>
<tr valign="top"><td><code>n</code></td>
<td>
<p> integer vector, for adaptive rolling function also list of
integer vectors, rolling window size. </p>
</td></tr>
<tr valign="top"><td><code>fill</code></td>
<td>
<p> numeric, value to pad by. Defaults to <code>NA</code>. </p>
</td></tr>
<tr valign="top"><td><code>algo</code></td>
<td>
<p> character, default <code>"fast"</code>. When set to <code>"exact"</code>,
then slower algorithm is used. It suffers less from floating point
rounding error, performs extra pass to adjust rounding error
correction and carefully handles all non-finite values. If available
it will use multiple cores. See details for more information. </p>
</td></tr>
<tr valign="top"><td><code>align</code></td>
<td>
<p> character, define if rolling window covers preceding rows
(<code>"right"</code>), following rows (<code>"left"</code>) or centered
(<code>"center"</code>). Defaults to <code>"right"</code>. </p>
</td></tr>
<tr valign="top"><td><code>na.rm</code></td>
<td>
<p> logical. Should missing values be removed when
calculating window? Defaults to <code>FALSE</code>. For details on handling
other non-finite values, see details below. </p>
</td></tr>
<tr valign="top"><td><code>hasNA</code></td>
<td>
<p> logical. If it is known that <code>x</code> contains <code>NA</code>
then setting to <code>TRUE</code> will speed up. Defaults to <code>NA</code>. </p>
</td></tr>
<tr valign="top"><td><code>adaptive</code></td>
<td>
<p> logical, should adaptive rolling function be
calculated, default <code>FALSE</code>. See details below. </p>
</td></tr>
<tr valign="top"><td><code>FUN</code></td>
<td>
<p> the function to be applied in rolling fashion; see Details for restrictions </p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p> extra arguments passed to <code>FUN</code> in <code>frollapply</code>. </p>
</td></tr>
</table>

<h3>Details</h3>

<p><code>froll*</code> functions accepts vectors, lists, data.frames or
data.tables. They always return a list except when the input is a
<code>vector</code> and <code>length(n)==1</code> in which case a <code>vector</code>
is returned, for convenience. Thus rolling functions can be used
conveniently within data.table syntax.
</p>
<p>Argument <code>n</code> allows multiple values to apply rolling functions on
multiple window sizes. If <code>adaptive=TRUE</code>, then it expects a list.
Each list element must be integer vector of window sizes corresponding
to every single observation in each column.
</p>
<p>When <code>algo="fast"</code> then <em>on-line</em> algorithm is used, also
any <code>NaN, +Inf, -Inf</code> is treated as <code>NA</code>.
Setting <code>algo="exact"</code> will make rolling functions to use
compute-intensive algorithm that suffers less from floating point
rounding error. It also handles <code>NaN, +Inf, -Inf</code> consistently to
base R. In case of some functions (like <em>mean</em>), it will additionally
make extra pass to perform floating point error correction. Error
corrections might not be truly exact on some platforms (like Windows)
when using multiple threads.
</p>
<p>Adaptive rolling functions are special cases where for each single
observation has own corresponding rolling window width. Due to the logic
of adaptive rolling functions, following restrictions apply:
</p>

<ul>
<li> <p><code>align</code> only <code>"right"</code>. 
</p>
</li>
<li><p> if list of vectors is passed to <code>x</code>, then all
list vectors must have equal length. 
</p>
</li></ul>

<p>When multiple columns or multiple windows width are provided, then they
are run in parallel. Except for the <code>algo="exact"</code> which runs in
parallel already.
</p>
<p><code>frollapply</code> computes rolling aggregate on arbitrary R functions.
The input <code>x</code> (first argument) to the function <code>FUN</code>
is coerced to <em>numeric</em> beforehand and <code>FUN</code>
has to return a scalar <em>numeric</em> value. Checks for that are made only
during the first iteration when <code>FUN</code> is evaluated. Edge cases can be
found in examples below. Any R function is supported, but it is not optimized
using our own C implementation &ndash; hence, for example, using <code>frollapply</code>
to compute a rolling average is inefficient. It is also always single-threaded
because there is no thread-safe API to R's C <code>eval</code>. Nevertheless we've
seen the computation speed up vis-a-vis versions implemented in base R.
</p>

<h3>Value</h3>

<p>A list except when the input is a <code>vector</code> and
<code>length(n)==1</code> in which case a <code>vector</code> is returned.
</p>

<p>Users coming from most popular package for rolling functions
<code>zoo</code> might expect following differences in <code>data.table</code>
implementation.
</p>

<ul>
<li><p> rolling function will always return result of the same length
as input. 
</p>
</li>
<li> <p><code>fill</code> defaults to <code>NA</code>. 
</p>
</li>
<li> <p><code>fill</code> accepts only constant values. It does not support
for <em>na.locf</em> or other functions. 
</p>
</li>
<li> <p><code>align</code> defaults to <code>"right"</code>. 
</p>
</li>
<li> <p><code>na.rm</code> is respected, and other functions are not needed
when input contains <code>NA</code>. 
</p>
</li>
<li><p> integers and logical are always coerced to double. 
</p>
</li>
<li><p> when <code>adaptive=FALSE</code> (default), then <code>n</code> must be a
numeric vector. List is not accepted. 
</p>
</li>
<li><p> when <code>adaptive=TRUE</code>, then <code>n</code> must be vector of
length equal to <code>nrow(x)</code>, or list of such vectors. 
</p>
</li>
<li> <p><code>partial</code> window feature is not supported, although it can
be accomplished by using <code>adaptive=TRUE</code>, see examples. 
</p>
</li></ul>

<p>Be aware that rolling functions operates on the physical order of input.
If the intent is to roll values in a vector by a logical window, for
example an hour, or a day, one has to ensure that there are no gaps in
input. For details see <a href="https://github.com/Rdatatable/data.table/issues/3241">issue #3241</a>.
</p>

<h3>References</h3>

<p><a href="https://en.wikipedia.org/wiki/Round-off_error">Round-off error</a>
</p>

<p><code><a href="shift.html">shift</a></code>, <code><a href="data.table.html">data.table</a></code>
</p>

<h3>Examples</h3>

<pre>
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available

# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)

# frollsum
frollsum(d, 3:4)

# frollapply
frollapply(d, 3:4, sum)
f = function(x, ...) if (sum(x, ...)&gt;5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx&lt;-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1&lt;-ma(x, n))
system.time(ans2&lt;-fastma(x, n))
system.time(ans3&lt;-frollmean(x, n))
system.time(ans4&lt;-frollmean(x, n, algo="exact"))
system.time(ans5&lt;-frollapply(x, n, mean))
anserr = list(
  fastma = ans2-ans1,
  froll_fast = ans3-ans1,
  froll_exact = ans4-ans1,
  frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff

# frollapply corner cases
f = function(x) head(x, 2)     ## FUN returns non length 1
try(frollapply(1:5, 3, f))
f = function(x) {              ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN)     ## FUN returns integer
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA)       ## FUN returns logical
as.logical(frollapply(x, 3, anyNA))
options(datatable.verbose=FALSE)
f = function(x) {             ## FUN returns character
  if (sum(x)&gt;5) "big" else "small"
}
try(frollapply(1:5, 3, f))
f = function(x) {             ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))
</pre>

<hr /><div style="text-align: center;">[Package <em>data.table</em> version 1.14.4 <a href="00Index.html">Index</a>]</div>
</body></html>