EVOLUTION-MANAGER
Edit File: converting.Rmd
--- title: "Converting from Rcpp" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Converting from Rcpp} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) should_run_benchmarks <- function(x) { get("requireNamespace")("cpp11test") && asNamespace("cpp11test")$should_run_benchmarks() } ``` In many cases there is no need to convert a package from Rcpp. If the code is already written and you don't have a very compelling need to use cpp11 I would recommend you continue to use Rcpp. However if you _do_ feel like your project will benefit from using cpp11 this vignette will provide some guidance and doing the conversion. It is also a place to highlight some of the largest differences between Rcpp and cpp11. ## Class comparison table | Rcpp | cpp11 (read-only) | cpp11 (writable) | cpp11 header | | --- | --- | --- | --- | | NumericVector | doubles | writable::doubles | <cpp11/doubles.hpp> | | IntegerVector | integers | writable::integers | <cpp11/integers.hpp> | | CharacterVector | strings | writable::strings | <cpp11/strings.hpp> | | RawVector | raws | writable::raws | <cpp11/raws.hpp> | | List | list | writable::list | <cpp11/list.hpp> | | RObject | sexp | | <cpp11/sexp.hpp> | | XPtr | | external_pointer | <cpp11/external_pointer.hpp> | | Environment | | environment | <cpp11/environment.hpp> | | Function | | function | <cpp11/function.hpp> | | Environment (namespace) | | package | <cpp11/function.hpp> | | wrap | | as_sexp | <cpp11/as.hpp> | | as | | as_cpp | <cpp11/as.hpp> | | stop | stop | | <cpp11/protect.hpp> | ## Incomplete list of Rcpp features not included in cpp11 - None of [Modules](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-modules.pdf) - None of [Sugar](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-sugar.pdf) - Some parts of [Attributes](https://CRAN.R-project.org/package=Rcpp/vignettes/Rcpp-attributes.pdf) - No dependencies - No random number generator restoration - No support for roxygen2 comments - No interfaces ## Read-only vs writable vectors The largest difference between cpp11 and Rcpp classes is that Rcpp classes modify their data in place, whereas cpp11 classes require copying the data to a writable class for modification. The default classes, e.g. `cpp11::doubles` are *read-only* classes that do not permit modification. If you want to modify the data you need to use the classes in the `cpp11::writable` namespace, e.g. `cpp11::writable::doubles`. In addition use the `writable` variants if you need to create a new R vector entirely in C++. ## Fewer implicit conversions Rcpp also allows very flexible implicit conversions, e.g. if you pass a `REALSXP` to a function that takes a `Rcpp::IntegerVector()` it is implicitly converted to a `INTSXP`. These conversions are nice for usability, but require (implicit) duplication of the data, with the associated runtime costs. cpp11 throws an error in these cases. If you want the implicit coercions you can add a call to `as.integer()` or `as.double()` as appropriate from R when you call the function. ## Calling R functions from C++ Calling R functions from C++ is similar to using Rcpp. ```c++ Rcpp::Function as_tibble("as_tibble", Rcpp::Environment::namespace_env("tibble")); as_tibble(x, Rcpp::Named(".rows", num_rows), Rcpp::Named(".name_repair", name_repair)); ``` ```c++ using namespace cpp11::literals; // so we can use ""_nm syntax auto as_tibble = cpp11::package("tibble")["as_tibble"]; as_tibble(x, ".rows"_nm = num_rows, ".name_repair"_nm = name_repair); ``` ## Appending behavior One major difference in Rcpp and cpp11 is how vectors are grown. Rcpp vectors have a `push_back()` method, but unlike `std::vector()` no additional space is reserved when pushing. This makes calling `push_back()` repeatably very expensive, as the entire vector has to be copied each call. In contrast `cpp11` vectors grow efficiently, reserving extra space. Because of this you can do ~10,000,000 vector appends with cpp11 in approximately the same amount of time that Rcpp does 10,000, as this benchmark demonstrates. ```{r, message = FALSE, eval = should_run_benchmarks()} library(cpp11test) grid <- expand.grid(len = 10 ^ (0:7), pkg = "cpp11", stringsAsFactors = FALSE) grid <- rbind( grid, expand.grid(len = 10 ^ (0:4), pkg = "rcpp", stringsAsFactors = FALSE) ) b_grow <- bench::press(.grid = grid, { fun = match.fun(sprintf("%sgrow_", ifelse(pkg == "cpp11", "", paste0(pkg, "_")))) bench::mark( fun(len) ) } )[c("len", "pkg", "min", "mem_alloc", "n_itr", "n_gc")] saveRDS(b_grow, "growth.Rds", version = 2) ``` ```{r, echo = FALSE, dev = "svg", fig.ext = "svg"} b_grow <- readRDS("growth.Rds") library(ggplot2) ggplot(b_grow, aes(x = len, y = min, color = pkg)) + geom_point() + geom_line() + bench::scale_y_bench_time() + scale_x_log10( breaks = scales::trans_breaks("log10", function(x) 10^x), labels = scales::trans_format("log10", scales::math_format(10^.x)) ) + coord_fixed() + theme(panel.grid.minor = element_blank()) + labs(title = "log-log plot of vector size vs construction time", x = NULL, y = NULL) ``` ```{r, echo = FALSE} knitr::kable(b_grow) ``` ## Random Number behavior Rcpp unconditionally includes calls to `GetRNGstate()` and `PutRNGstate()` before each wrapped function. This ensures that if any C++ code calls the R API functions `unif_rand()`, `norm_rand()`, `exp_rand()` or `R_unif_index()` the random seed state is set accordingly. cpp11 does _not_ do this, so you must include the calls to `GetRNGstate()` and `PutRNGstate()` _yourself_ if you use any of those functions in your C++ code. See [R-exts 6.3 - Random number generation](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Random-numbers) for details on these functions. ## Mechanics of converting a package from Rcpp 1. Add cpp11 to `LinkingTo` 1. Add C++11 to `SystemRequirements` 1. Convert all instances of `// [[Rcpp::export]]` to `[[cpp11::register]]` 1. Clean and recompile the package, e.g. `pkgbuild::clean_dll()` `pkgload::load_all()` 1. Run tests `devtools::test()` 1. Start converting function by function - Remember you can usually inter-convert between cpp11 and Rcpp classes by going through `SEXP` if needed. - Converting the code a bit at a time (and regularly running your tests) is the best way to do the conversion correctly and make progress - Doing a separate commit after converting each file (or possibly each function) can make finding any regressions with [git bisect](https://youtu.be/KKeucpfAuuA) much easier in the future. ## Common issues when converting ### Include order Due to cpp11 redefining the Rboolean enum any cpp11 header needs to come _before_ any Rcpp or Rinternals.h include. Errors like > error: redefinition of enumerator 'FALSE' Indicate this is a problem and can be resolved by ensuring the cpp11 headers are included first. ### STL includes Rcpp.h includes a number of STL headers automatically, notably `<string>` and `<vector>`, however the cpp11 headers generally do not. If you have errors like > error: no type named 'string' in namespace 'std' You will need to include the appropriate STL header, in this case `<string>`. ### R API includes cpp11 defines a compatible but different version of the Rboolean enum. This means that you must ensure that at least one cpp11 header is included _before_ any R headers. One easy way to ensure this is to replace `#include <Rinternals.h>` with `#include <cpp11/R.hpp>`.