EVOLUTION-MANAGER
Edit File: mutate-joins.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Mutating joins</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for mutate-joins {dplyr}"><tr><td>mutate-joins {dplyr}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Mutating joins</h2> <h3>Description</h3> <p>The mutating joins add columns from <code>y</code> to <code>x</code>, matching rows based on the keys: </p> <ul> <li> <p><code>inner_join()</code>: includes all rows in <code>x</code> and <code>y</code>. </p> </li> <li> <p><code>left_join()</code>: includes all rows in <code>x</code>. </p> </li> <li> <p><code>right_join()</code>: includes all rows in <code>y</code>. </p> </li> <li> <p><code>full_join()</code>: includes all rows in <code>x</code> or <code>y</code>. </p> </li></ul> <p>If a row in <code>x</code> matches multiple rows in <code>y</code>, all the rows in <code>y</code> will be returned once for each matching row in <code>x</code>. </p> <h3>Usage</h3> <pre> inner_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...) ## S3 method for class 'data.frame' inner_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., na_matches = c("na", "never") ) left_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE ) ## S3 method for class 'data.frame' left_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE, na_matches = c("na", "never") ) right_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE ) ## S3 method for class 'data.frame' right_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE, na_matches = c("na", "never") ) full_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE ) ## S3 method for class 'data.frame' full_join( x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ..., keep = FALSE, na_matches = c("na", "never") ) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x, y</code></td> <td> <p>A pair of data frames, data frame extensions (e.g. a tibble), or lazy data frames (e.g. from dbplyr or dtplyr). See <em>Methods</em>, below, for more details.</p> </td></tr> <tr valign="top"><td><code>by</code></td> <td> <p>A character vector of variables to join by. </p> <p>If <code>NULL</code>, the default, <code style="white-space: pre;">*_join()</code> will perform a natural join, using all variables in common across <code>x</code> and <code>y</code>. A message lists the variables so that you can check they're correct; suppress the message by supplying <code>by</code> explicitly. </p> <p>To join by different variables on <code>x</code> and <code>y</code>, use a named vector. For example, <code>by = c("a" = "b")</code> will match <code>x$a</code> to <code>y$b</code>. </p> <p>To join by multiple variables, use a vector with length > 1. For example, <code>by = c("a", "b")</code> will match <code>x$a</code> to <code>y$a</code> and <code>x$b</code> to <code>y$b</code>. Use a named vector to match different variables in <code>x</code> and <code>y</code>. For example, <code>by = c("a" = "b", "c" = "d")</code> will match <code>x$a</code> to <code>y$b</code> and <code>x$c</code> to <code>y$d</code>. </p> <p>To perform a cross-join, generating all combinations of <code>x</code> and <code>y</code>, use <code>by = character()</code>.</p> </td></tr> <tr valign="top"><td><code>copy</code></td> <td> <p>If <code>x</code> and <code>y</code> are not from the same data source, and <code>copy</code> is <code>TRUE</code>, then <code>y</code> will be copied into the same src as <code>x</code>. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.</p> </td></tr> <tr valign="top"><td><code>suffix</code></td> <td> <p>If there are non-joined duplicate variables in <code>x</code> and <code>y</code>, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>Other parameters passed onto methods.</p> </td></tr> <tr valign="top"><td><code>na_matches</code></td> <td> <p>Should <code>NA</code> and <code>NaN</code> values match one another? </p> <p>The default, <code>"na"</code>, treats two <code>NA</code> or <code>NaN</code> values as equal, like <code>%in%</code>, <code><a href="../../base/html/match.html">match()</a></code>, <code><a href="../../base/html/merge.html">merge()</a></code>. </p> <p>Use <code>"never"</code> to always treat two <code>NA</code> or <code>NaN</code> values as different, like joins for database sources, similarly to <code>merge(incomparables = FALSE)</code>.</p> </td></tr> <tr valign="top"><td><code>keep</code></td> <td> <p>Should the join keys from both <code>x</code> and <code>y</code> be preserved in the output? Only applies to <code>nest_join()</code>, <code>left_join()</code>, <code>right_join()</code>, and <code>full_join()</code>.</p> </td></tr> </table> <h3>Value</h3> <p>An object of the same type as <code>x</code>. The order of the rows and columns of <code>x</code> is preserved as much as possible. The output has the following properties: </p> <ul> <li><p> For <code>inner_join()</code>, a subset of <code>x</code> rows. For <code>left_join()</code>, all <code>x</code> rows. For <code>right_join()</code>, a subset of <code>x</code> rows, followed by unmatched <code>y</code> rows. For <code>full_join()</code>, all <code>x</code> rows, followed by unmatched <code>y</code> rows. </p> </li> <li><p> For all joins, rows will be duplicated if one or more rows in <code>x</code> matches multiple rows in <code>y</code>. </p> </li> <li><p> Output columns include all <code>x</code> columns and all <code>y</code> columns. If columns in <code>x</code> and <code>y</code> have the same name (and aren't included in <code>by</code>), <code>suffix</code>es are added to disambiguate. </p> </li> <li><p> Output columns included in <code>by</code> are coerced to common type across <code>x</code> and <code>y</code>. </p> </li> <li><p> Groups are taken from <code>x</code>. </p> </li></ul> <h3>Methods</h3> <p>These functions are <strong>generic</strong>s, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour. </p> <p>Methods available in currently loaded packages: </p> <ul> <li> <p><code>inner_join()</code>: no methods found. </p> </li> <li> <p><code>left_join()</code>: no methods found. </p> </li> <li> <p><code>right_join()</code>: no methods found. </p> </li> <li> <p><code>full_join()</code>: no methods found. </p> </li></ul> <h3>See Also</h3> <p>Other joins: <code><a href="filter-joins.html">filter-joins</a></code>, <code><a href="nest_join.html">nest_join</a>()</code> </p> <h3>Examples</h3> <pre> band_members %>% inner_join(band_instruments) band_members %>% left_join(band_instruments) band_members %>% right_join(band_instruments) band_members %>% full_join(band_instruments) # To suppress the message about joining variables, supply `by` band_members %>% inner_join(band_instruments, by = "name") # This is good practice in production code # Use a named `by` if the join variables have different names band_members %>% full_join(band_instruments2, by = c("name" = "artist")) # By default, the join keys from `x` and `y` are coalesced in the output; use # `keep = TRUE` to keep the join keys from both `x` and `y` band_members %>% full_join(band_instruments2, by = c("name" = "artist"), keep = TRUE) # If a row in `x` matches multiple rows in `y`, all the rows in `y` will be # returned once for each matching row in `x` df1 <- tibble(x = 1:3) df2 <- tibble(x = c(1, 1, 2), y = c("first", "second", "third")) df1 %>% left_join(df2) # By default, NAs match other NAs so that there are two # rows in the output of this join: df1 <- data.frame(x = c(1, NA), y = 2) df2 <- data.frame(x = c(1, NA), z = 3) left_join(df1, df2) # You can optionally request that NAs don't match, giving a # a result that more closely resembles SQL joins left_join(df1, df2, na_matches = "never") </pre> <hr /><div style="text-align: center;">[Package <em>dplyr</em> version 1.0.2 <a href="00Index.html">Index</a>]</div> </body></html>