<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: FAQ - How to implement ptype2 and cast methods? (Data frames)</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for howto-faq-coercion-data-frame {vctrs}"><tr><td>howto-faq-coercion-data-frame {vctrs}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>FAQ - How to implement ptype2 and cast methods? (Data frames)</h2> <h3>Description</h3> <p>This guide provides a practical recipe for implementing <code>vec_ptype2()</code> and <code>vec_cast()</code> methods for coercions of data frame subclasses. Related topics: </p> <ul> <li><p> For an overview of the coercion mechanism in vctrs, see <code><a href="theory-faq-coercion.html">?theory-faq-coercion</a></code>. </p> </li> <li><p> For an example of implementing coercion methods for simple vectors, see <code><a href="howto-faq-coercion.html">?howto-faq-coercion</a></code>. </p> </li></ul> <p>Coercion of data frames occurs when different data frame classes are combined in some way. The two main methods of combination are currently row-binding with <code><a href="vec_bind.html">vec_rbind()</a></code> and col-binding with <code><a href="vec_bind.html">vec_cbind()</a></code> (which are in turn used by a number of dplyr and tidyr functions). These functions take multiple data frame inputs and automatically coerce them to their common type. </p> <p>vctrs is generally strict about the kind of automatic coercions that are performed when combining inputs. In the case of data frames we have decided to be a bit less strict for convenience. Instead of throwing an incompatible type error, we fall back to a base data frame or a tibble if we don’t know how to combine two data frame subclasses. 
It is still a good idea to specify the proper coercion behaviour for your data frame subclasses as soon as possible. </p>
<p>We will see two examples in this guide. The first example is about a data frame subclass that has no particular attributes to manage. In the second example, we implement coercion methods for a tibble subclass that includes potentially incompatible attributes. </p>
<h4>Roxygen workflow</h4>
<p>To implement methods for generics, first import the generics in your namespace and redocument: </p>
<div class="sourceCode r"><pre>#' @importFrom vctrs vec_ptype2 vec_cast
NULL
</pre></div>
<p>Note that for each batch of methods that you add to your package, you need to export the methods and redocument immediately, even during development. Otherwise they won’t be in scope when you run unit tests, e.g. with testthat. </p>
<p>Implementing double dispatch methods is very similar to implementing regular S3 methods. In these examples we are using roxygen2 tags to register the methods, but you can also register the methods manually in your NAMESPACE file or lazily with <code>s3_register()</code>. </p>
<h4>Parent methods</h4>
<p>Most of the common type determination should be performed by the parent class. In vctrs, double dispatch is implemented in such a way that you need to call the methods for the parent class manually. For <code>vec_ptype2()</code> this means you need to call <code>df_ptype2()</code> (for data frame subclasses) or <code>tib_ptype2()</code> (for tibble subclasses). Similarly, <code>df_cast()</code> and <code>tib_cast()</code> are the workhorses for <code>vec_cast()</code> methods of subtypes of <code>data.frame</code> and <code>tbl_df</code>. These functions take the union of the columns in <code>x</code> and <code>y</code>, and ensure shared columns have the same type. </p>
<p>These functions are much less strict than <code>vec_ptype2()</code> and <code>vec_cast()</code> as they accept any subclass of data frame as input. 
They always return a <code>data.frame</code> or a <code>tbl_df</code>. You will probably want to write similar functions for your subclass to avoid repetition in your code. You may want to export them as well if you are expecting other people to derive from your class. </p>
<h4>A <code>data.table</code> example</h4>
<p>This example is the actual implementation of vctrs coercion methods for <code>data.table</code>. This is a simple example because we don’t have to keep track of attributes for this class or manage incompatibilities. See the tibble section for a more complicated example. </p>
<p>We first create the <code>dt_ptype2()</code> and <code>dt_cast()</code> helpers. They wrap around the parent methods <code>df_ptype2()</code> and <code>df_cast()</code>, and transform the common type or converted input to a data table. You may want to export these helpers if you expect other packages to derive from your data frame class. </p>
<p>These helpers should always return data tables. To this end we use the conversion generic <code>as.data.table()</code>. Depending on the tools available for the particular class at hand, a constructor might be appropriate as well. </p>
<div class="sourceCode r"><pre>dt_ptype2 &lt;- function(x, y, ...) {
  as.data.table(df_ptype2(x, y, ...))
}
dt_cast &lt;- function(x, to, ...) {
  as.data.table(df_cast(x, to, ...))
}
</pre></div>
<p>We start with the self-self method: </p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.data.table.data.table &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
</pre></div>
<p>Between a data frame and a data table, we consider the richer type to be data table. This decision is not based on the value coverage of each data structure, but on the idea that data tables have richer behaviour. Since data tables are the richer type, we call <code>dt_ptype2()</code> from the <code>vec_ptype2()</code> method. 
It always returns a data table, no matter the order of arguments: </p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.data.table.data.frame &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.data.table &lt;- function(x, y, ...) {
  dt_ptype2(x, y, ...)
}
</pre></div>
<p>The <code>vec_cast()</code> methods follow the same pattern, but note how the method for coercing to data frame uses <code>df_cast()</code> rather than <code>dt_cast()</code>. </p>
<p>Also, please note that for historical reasons, the order of the classes in the method name is the reverse of the order of the arguments in the function signature. The first class represents <code>to</code>, whereas the second class represents <code>x</code>. </p>
<div class="sourceCode r"><pre>#' @export
vec_cast.data.table.data.table &lt;- function(x, to, ...) {
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.table.data.frame &lt;- function(x, to, ...) {
  # `x` is a data.frame to be converted to a data.table
  dt_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.data.table &lt;- function(x, to, ...) {
  # `x` is a data.table to be converted to a data.frame
  df_cast(x, to, ...)
}
</pre></div>
<p>With these methods vctrs is now able to combine data tables with data frames: </p>
<div class="sourceCode r"><pre>vec_cbind(data.frame(x = 1:3), data.table(y = "foo"))
#&gt;    x   y
#&gt; 1: 1 foo
#&gt; 2: 2 foo
#&gt; 3: 3 foo
</pre></div>
<h4>A tibble example</h4>
<p>In this example we implement coercion methods for a tibble subclass that carries a colour as scalar metadata: </p>
<div class="sourceCode r"><pre># User constructor
my_tibble &lt;- function(colour = NULL, ...) 
{
  new_my_tibble(tibble::tibble(...), colour = colour)
}

# Developer constructor
new_my_tibble &lt;- function(x, colour = NULL) {
  stopifnot(is.data.frame(x))
  tibble::new_tibble(
    x,
    colour = colour,
    class = "my_tibble",
    nrow = nrow(x)
  )
}

df_colour &lt;- function(x) {
  if (inherits(x, "my_tibble")) {
    attr(x, "colour")
  } else {
    NULL
  }
}

#' @export
print.my_tibble &lt;- function(x, ...) {
  cat(sprintf("&lt;%s: %s&gt;\n", class(x)[[1]], df_colour(x)))
  cli::cat_line(format(x)[-1])
}
</pre></div>
<p>This subclass is very simple. All it does is modify the header. </p>
<div class="sourceCode r"><pre>red &lt;- my_tibble("red", x = 1, y = 1:2)
red
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2

red[2]
#&gt; &lt;my_tibble: red&gt;
#&gt;       y
#&gt;   &lt;int&gt;
#&gt; 1     1
#&gt; 2     2

green &lt;- my_tibble("green", z = TRUE)
green
#&gt; &lt;my_tibble: green&gt;
#&gt;   z
#&gt;   &lt;lgl&gt;
#&gt; 1 TRUE
</pre></div>
<p>Combinations do not work properly out of the box. Instead, vctrs falls back to a bare tibble: </p>
<div class="sourceCode r"><pre>vec_rbind(red, tibble::tibble(x = 10:12))
#&gt; # A tibble: 5 x 2
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>Instead of falling back to a data frame, we would like to return a <code style="white-space: pre;">&lt;my_tibble&gt;</code> when combined with a data frame or a tibble. Because this subclass has more metadata than normal data frames (it has a colour), it is a <em>supertype</em> of tibble and data frame, i.e. it is the richer type. This is similar to how a grouped tibble is a more general type than a tibble or a data frame. Conceptually, the latter are pinned to a single constant group. </p>
<p>The coercion methods for data frames operate in two steps: </p>
<ul>
<li><p> They check for compatible subclass attributes. In our case the tibble colour has to be the same, or be undefined. 
</p> </li>
<li><p> They call their parent methods, in this case <code><a href="df_ptype2.html">tib_ptype2()</a></code> and <code><a href="df_ptype2.html">tib_cast()</a></code> because we have a subclass of tibble. This eventually calls the data frame methods <code><a href="df_ptype2.html">df_ptype2()</a></code> and <code><a href="df_ptype2.html">df_cast()</a></code> which match the columns and their types. </p> </li></ul>
<p>This process should usually be wrapped in two functions to avoid repetition. Consider exporting these if you expect other subclasses to derive from your class. </p>
<p>We first implement a helper to determine if two data frames have compatible colours. We use the <code>df_colour()</code> accessor which returns <code>NULL</code> when the data frame colour is undefined. </p>
<div class="sourceCode r"><pre>has_compatible_colours &lt;- function(x, y) {
  x_colour &lt;- df_colour(x) %||% df_colour(y)
  y_colour &lt;- df_colour(y) %||% x_colour
  identical(x_colour, y_colour)
}
</pre></div>
<p>Next we implement the coercion helpers. If the colours are not compatible, we call <code>stop_incompatible_cast()</code> or <code>stop_incompatible_type()</code>. These strict coercion semantics are justified because in this class colour is a <em>data</em> attribute. If it were a non-essential <em>detail</em> attribute, like the timezone in a datetime, we would just standardise it to the value of the left-hand side. </p>
<p>In simpler cases (like the data.table example), these methods do not need to take the arguments suffixed in <code style="white-space: pre;">_arg</code>. Here we do need to take these arguments so we can pass them to the <code>stop_</code> functions when we detect an incompatibility. They should also be passed to the parent methods. 
</p>
<div class="sourceCode r"><pre>#' @export
my_tib_cast &lt;- function(x, to, ..., x_arg = "", to_arg = "") {
  out &lt;- tib_cast(x, to, ..., x_arg = x_arg, to_arg = to_arg)

  if (!has_compatible_colours(x, to)) {
    stop_incompatible_cast(
      x,
      to,
      x_arg = x_arg,
      to_arg = to_arg,
      details = "Can't combine colours."
    )
  }

  colour &lt;- df_colour(x) %||% df_colour(to)
  new_my_tibble(out, colour = colour)
}

#' @export
my_tib_ptype2 &lt;- function(x, y, ..., x_arg = "", y_arg = "") {
  out &lt;- tib_ptype2(x, y, ..., x_arg = x_arg, y_arg = y_arg)

  if (!has_compatible_colours(x, y)) {
    stop_incompatible_type(
      x,
      y,
      x_arg = x_arg,
      y_arg = y_arg,
      details = "Can't combine colours."
    )
  }

  colour &lt;- df_colour(x) %||% df_colour(y)
  new_my_tibble(out, colour = colour)
}
</pre></div>
<p>Let’s now implement the coercion methods, starting with the self-self methods. </p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_cast.my_tibble.my_tibble &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
</pre></div>
<p>We can now combine compatible instances of our class! </p>
<div class="sourceCode r"><pre>vec_rbind(red, red)
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3     1     1
#&gt; 4     1     2

vec_rbind(green, green)
#&gt; &lt;my_tibble: green&gt;
#&gt;   z
#&gt;   &lt;lgl&gt;
#&gt; 1 TRUE
#&gt; 2 TRUE

vec_rbind(green, red)
#&gt; Error in `my_tib_ptype2()`:
#&gt; ! Can't combine `..1` &lt;my_tibble&gt; and `..2` &lt;my_tibble&gt;.
#&gt; Can't combine colours.
</pre></div>
<p>The methods for combining our class with tibbles follow the same pattern. For ptype2 we return our class in both cases because it is the richer type: </p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.tbl_df &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.tbl_df.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
</pre></div>
<p>For cast, we are careful to return a tibble when casting to a tibble. 
Note the call to <code>vctrs::tib_cast()</code>: </p>
<div class="sourceCode r"><pre>#' @export
vec_cast.my_tibble.tbl_df &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.tbl_df.my_tibble &lt;- function(x, to, ...) {
  tib_cast(x, to, ...)
}
</pre></div>
<p>From this point, we get correct combinations with tibbles: </p>
<div class="sourceCode r"><pre>vec_rbind(red, tibble::tibble(x = 10:12))
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>However, we are not done yet. Because the coercion hierarchy is different from the class hierarchy, there is no inheritance of coercion methods. We’re not getting correct behaviour for data frames yet because we haven’t explicitly specified the methods for this class: </p>
<div class="sourceCode r"><pre>vec_rbind(red, data.frame(x = 10:12))
#&gt; # A tibble: 5 x 2
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<p>Let’s finish up the boilerplate: </p>
<div class="sourceCode r"><pre>#' @export
vec_ptype2.my_tibble.data.frame &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}
#' @export
vec_ptype2.data.frame.my_tibble &lt;- function(x, y, ...) {
  my_tib_ptype2(x, y, ...)
}

#' @export
vec_cast.my_tibble.data.frame &lt;- function(x, to, ...) {
  my_tib_cast(x, to, ...)
}
#' @export
vec_cast.data.frame.my_tibble &lt;- function(x, to, ...) {
  df_cast(x, to, ...)
}
</pre></div>
<p>This completes the implementation: </p>
<div class="sourceCode r"><pre>vec_rbind(red, data.frame(x = 10:12))
#&gt; &lt;my_tibble: red&gt;
#&gt;       x     y
#&gt;   &lt;dbl&gt; &lt;int&gt;
#&gt; 1     1     1
#&gt; 2     1     2
#&gt; 3    10    NA
#&gt; 4    11    NA
#&gt; 5    12    NA
</pre></div>
<hr /><div style="text-align: center;">[Package <em>vctrs</em> version 0.5.0 <a href="00Index.html">Index</a>]</div>
</body></html>