EVOLUTION-MANAGER

Edit File: howto-faq-coercion-data-frame.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: FAQ - How to implement ptype2 and cast methods? (Data frames)</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for howto-faq-coercion-data-frame {vctrs}"><tr><td>howto-faq-coercion-data-frame {vctrs}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>FAQ - How to implement ptype2 and cast methods? (Data frames)</h2>

<h3>Description</h3>

<p>This guide provides a practical recipe for implementing <code>vec_ptype2()</code>
and <code>vec_cast()</code> methods for coercions of data frame subclasses. Related
topics:
</p>

<ul>
<li><p> For an overview of the coercion mechanism in vctrs, see
<code><a href="theory-faq-coercion.html">?theory-faq-coercion</a></code>.
</p>
</li>
<li><p> For an example of implementing coercion methods for simple vectors,
see <code><a href="howto-faq-coercion.html">?howto-faq-coercion</a></code>.
</p>
</li></ul>

<p>Coercion of data frames occurs when different data frame classes are
combined in some way. The two main methods of combination are currently
row-binding with <code><a href="vec_bind.html">vec_rbind()</a></code> and col-binding with
<code><a href="vec_bind.html">vec_cbind()</a></code> (which are in turn used by a number of
dplyr and tidyr functions). These functions take multiple data frame
inputs and automatically coerce them to their common type.
</p>
<p>vctrs is generally strict about the kind of automatic coercions that are
performed when combining inputs. In the case of data frames we have
decided to be a bit less strict for convenience. Instead of throwing an
incompatible type error, we fall back to a base data frame or a tibble
if we don’t know how to combine two data frame subclasses. It is still a
good idea to specify the proper coercion behaviour for your data frame
subclasses as soon as possible.
</p>
<p>We will see two examples in this guide. The first example is about a
data frame subclass that has no particular attributes to manage. In the
second example, we implement coercion methods for a tibble subclass that
includes potentially incompatible attributes.
</p>

<h4>Roxygen workflow</h4>

<p>To implement methods for generics, first import the generics in your
namespace and redocument:
</p>
<div class="sourceCode r"><pre>#' @importFrom vctrs vec_ptype2 vec_cast
NULL
</pre></div>
<p>Note that for each batches of methods that you add to your package, you
need to export the methods and redocument immediately, even during
development. Otherwise they won’t be in scope when you run unit tests
e.g. with testthat.
</p>
<p>Implementing double dispatch methods is very similar to implementing
regular S3 methods. In these examples we are using roxygen2 tags to
register the methods, but you can also register the methods manually in
your NAMESPACE file or lazily with <code>s3_register()</code>.
</p>

<h4>Parent methods</h4>

<p>Most of the common type determination should be performed by the parent
class. In vctrs, double dispatch is implemented in such a way that you
need to call the methods for the parent class manually. For
<code>vec_ptype2()</code> this means you need to call <code>df_ptype2()</code> (for data frame
subclasses) or <code>tib_ptype2()</code> (for tibble subclasses). Similarly,
<code>df_cast()</code> and <code>tib_cast()</code> are the workhorses for <code>vec_cast()</code> methods
of subtypes of <code>data.frame</code> and <code>tbl_df</code>. These functions take the union
of the columns in <code>x</code> and <code>y</code>, and ensure shared columns have the same
type.
</p>
<p>These functions are much less strict than <code>vec_ptype2()</code> and
<code>vec_cast()</code> as they accept any subclass of data frame as input. They
always return a <code>data.frame</code> or a <code>tbl_df</code>. You will probably want to
write similar functions for your subclass to avoid repetition in your
code. You may want to export them as well if you are expecting other
people to derive from your class.
</p>

<h4>A <code>data.table</code> example</h4>

<p>This example is the actual implementation of vctrs coercion methods for
<code>data.table</code>. This is a simple example because we don’t have to keep
track of attributes for this class or manage incompatibilities. See the
tibble section for a more complicated example.
</p>
<p>We first create the <code>dt_ptype2()</code> and <code>dt_cast()</code> helpers. They wrap
around the parent methods <code>df_ptype2()</code> and <code>df_cast()</code>, and transform
the common type or converted input to a data table. You may want to
export these helpers if you expect other packages to derive from your
data frame class.
</p>
<p>These helpers should always return data tables. To this end we use the
conversion generic <code>as.data.table()</code>. Depending on the tools available
for the particular class at hand, a constructor might be appropriate as
well.
</p>
<div class="sourceCode r"><pre>dt_ptype2 &lt;- function(x, y, ...) {
  as.data.table(df_ptype2(x, y, ...))
}
dt_cast &lt;- function(x, to, ...) {
  as.data.table(df_cast(x, to, ...))
}
</pre></div>
<p>We start with the self-self method:
</p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.data.table.data.table &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
</pre></div>
<p>Between a data frame and a data table, we consider the richer type to be
data table. This decision is not based on the value coverage of each
data structures, but on the idea that data tables have richer behaviour.
Since data tables are the richer type, we call <code>dt_type2()</code> from the
<code>vec_ptype2()</code> method. It always returns a data table, no matter the
order of arguments:
</p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.data.table.data.frame &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.data.table &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
</pre></div>
<p>The <code>vec_cast()</code> methods follow the same pattern, but note how the
method for coercing to data frame uses <code>df_cast()</code> rather than
<code>dt_cast()</code>.
</p>
<p>Also, please note that for historical reasons, the order of the classes
in the method name is in reverse order of the arguments in the function
signature. The first class represents <code>to</code>, whereas the second class
represents <code>x</code>.
</p>
<div class="sourceCode r"><pre>#' @export
vec_cast.data.table.data.table &lt;- function(x, to, ...) {
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.table.data.frame &lt;- function(x, to, ...) {
  # `x` is a data.frame to be converted to a data.table
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.data.table &lt;- function(x, to, ...) {
  # `x` is a data.table to be converted to a data.frame
  df_cast(x, to, ...)
}
</pre></div>
<p>With these methods vctrs is now able to combine data tables with data
frames:
</p>
<div class="sourceCode r"><pre>vec_cbind(data.frame(x = 1:3), data.table(y = "foo"))
#&gt;    x   y
#&gt; 1: 1 foo
#&gt; 2: 2 foo
#&gt; 3: 3 foo
</pre></div>

<h4>A tibble example</h4>

<p>In this example we implement coercion methods for a tibble subclass that
carries a colour as a scalar metadata:
</p>
<div class="sourceCode r"><pre># User constructor
my_tibble &lt;- function(colour = NULL, ...) {
  new_my_tibble(tibble::tibble(...), colour = colour)
}
# Developer constructor
new_my_tibble &lt;- function(x, colour = NULL) {
  stopifnot(is.data.frame(x))
  tibble::new_tibble(
    x,
    colour = colour,
    class = "my_tibble",
    nrow = nrow(x)
  )
}

df_colour &lt;- function(x) {
  if (inherits(x, "my_tibble")) {
    attr(x, "colour")
  } else {
    NULL
  }
}

#'@export
print.my_tibble &lt;- function(x, ...) {
  cat(sprintf("&lt;%s: %s&gt;\n", class(x)[[1]], df_colour(x)))
  cli::cat_line(format(x)[-1])
}
</pre></div>
<p>This subclass is very simple. All it does is modify the header.
</p>
<div class="sourceCode r"><pre>red &lt;- my_tibble("red", x = 1, y = 1:2)
red
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2

red[2]
#&gt; &lt;my_tibble: red&gt;
#&gt;       y
#&gt;   &lt;int&gt;
#&gt; 1     1
#&gt; 2     2

green &lt;- my_tibble("green", z = TRUE)
green
#&gt; &lt;my_tibble: green&gt;
#&gt;   z    
#&gt;   &lt;lgl&gt;
#&gt; 1 TRUE
</pre></div>
<p>Combinations do not work properly out of the box, instead vctrs falls
back to a bare tibble:
</p>
<div class="sourceCode r"><pre>vec_rbind(red, tibble::tibble(x = 10:12))
#&gt; # A tibble: 5 x 2
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>Instead of falling back to a data frame, we would like to return a
<code style="white-space: pre;">&lt;my_tibble&gt;</code> when combined with a data frame or a tibble. Because this
subclass has more metadata than normal data frames (it has a colour), it
is a <em>supertype</em> of tibble and data frame, i.e. it is the richer type.
This is similar to how a grouped tibble is a more general type than a
tibble or a data frame. Conceptually, the latter are pinned to a single
constant group.
</p>
<p>The coercion methods for data frames operate in two steps:
</p>

<ul>
<li><p> They check for compatible subclass attributes. In our case the tibble
colour has to be the same, or be undefined.
</p>
</li>
<li><p> They call their parent methods, in this case
<code><a href="df_ptype2.html">tib_ptype2()</a></code> and <code><a href="df_ptype2.html">tib_cast()</a></code> because
we have a subclass of tibble. This eventually calls the data frame
methods <code><a href="df_ptype2.html">df_ptype2()</a></code> and
<code><a href="df_ptype2.html">tib_ptype2()</a></code> which match the columns and their
types.
</p>
</li></ul>

<p>This process should usually be wrapped in two functions to avoid
repetition. Consider exporting these if you expect your class to be
derived by other subclasses.
</p>
<p>We first implement a helper to determine if two data frames have
compatible colours. We use the <code>df_colour()</code> accessor which returns
<code>NULL</code> when the data frame colour is undefined.
</p>
<div class="sourceCode r"><pre>has_compatible_colours &lt;- function(x, y) {
  x_colour &lt;- df_colour(x) %||% df_colour(y)
  y_colour &lt;- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}
</pre></div>
<p>Next we implement the coercion helpers. If the colours are not
compatible, we call <code>stop_incompatible_cast()</code> or
<code>stop_incompatible_type()</code>. These strict coercion semantics are
justified because in this class colour is a <em>data</em> attribute. If it were
a non essential <em>detail</em> attribute, like the timezone in a datetime, we
would just standardise it to the value of the left-hand side.
</p>
<p>In simpler cases (like the data.table example), these methods do not
need to take the arguments suffixed in <code style="white-space: pre;">_arg</code>. Here we do need to take
these arguments so we can pass them to the <code>stop_</code> functions when we
detect an incompatibility. They also should be passed to the parent
methods.
</p>
<div class="sourceCode r"><pre>#' @export
my_tib_cast &lt;- function(x, to, ..., x_arg = "", to_arg = "") {
  out &lt;- tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)

if (!has_compatible_colours(x, to)) {
    stop_incompatible_cast(
      x,
      to,
      x_arg = x_arg,
      to_arg = to_arg,
      details = "Can't combine colours."
    )
  }

colour &lt;- df_colour(x) %||% df_colour(to)
  new_my_tibble(out, colour = colour)
}
#' @export
my_tib_ptype2 &lt;- function(x, y, ..., x_arg = "", y_arg = "") {
  out &lt;- tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg)

if (!has_compatible_colours(x, y)) {
    stop_incompatible_type(
      x,
      y,
      x_arg = x_arg,
      y_arg = y_arg,
      details = "Can't combine colours."
    )
  }

colour &lt;- df_colour(x) %||% df_colour(y)
  new_my_tibble(out, colour = colour)
}
</pre></div>
<p>Let’s now implement the coercion methods, starting with the self-self
methods.
</p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_cast.my_tibble.my_tibble &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
</pre></div>
<p>We can now combine compatible instances of our class!
</p>
<div class="sourceCode r"><pre>vec_rbind(red, red)
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3     1     1
#&gt; 4     1     2

vec_rbind(green, green)
#&gt; &lt;my_tibble: green&gt;
#&gt;   z    
#&gt;   &lt;lgl&gt;
#&gt; 1 TRUE 
#&gt; 2 TRUE

vec_rbind(green, red)
#&gt; Error in `my_tib_ptype2()`:
#&gt; ! Can't combine `..1` &lt;my_tibble&gt; and `..2` &lt;my_tibble&gt;.
#&gt; Can't combine colours.
</pre></div>
<p>The methods for combining our class with tibbles follow the same
pattern. For ptype2 we return our class in both cases because it is the
richer type:
</p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.tbl_df &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.tbl_df.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
</pre></div>
<p>For cast are careful about returning a tibble when casting to a tibble.
Note the call to <code>vctrs::tib_cast()</code>:
</p>
<div class="sourceCode r"><pre>#' @export
vec_cast.my_tibble.tbl_df &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.tbl_df.my_tibble &lt;- function(x, to, ...) {
  tib_cast(x, to, ...)
}
</pre></div>
<p>From this point, we get correct combinations with tibbles:
</p>
<div class="sourceCode r"><pre>vec_rbind(red, tibble::tibble(x = 10:12))
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>However we are not done yet. Because the coercion hierarchy is different
from the class hierarchy, there is no inheritance of coercion methods.
We’re not getting correct behaviour for data frames yet because we
haven’t explicitly specified the methods for this class:
</p>
<div class="sourceCode r"><pre>vec_rbind(red, data.frame(x = 10:12))
#&gt; # A tibble: 5 x 2
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>Let’s finish up the boiler plate:
</p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.data.frame &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_cast.my_tibble.data.frame &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.my_tibble &lt;- function(x, to, ...) {
  df_cast(x, to, ...)
}
</pre></div>
<p>This completes the implementation:
</p>
<div class="sourceCode r"><pre>vec_rbind(red, data.frame(x = 10:12))
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>

<hr /><div style="text-align: center;">[Package <em>vctrs</em> version 0.5.0 <a href="00Index.html">Index</a>]</div>
</body></html>