<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Split data frame, apply function, and return results in a...</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for ddply {plyr}"><tr><td>ddply {plyr}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Split data frame, apply function, and return results in a data frame.</h2> <h3>Description</h3> <p>For each subset of a data frame, apply function then combine results into a data frame. To apply a function for each row, use <code><a href="adply.html">adply</a></code> with <code>.margins</code> set to <code>1</code>. </p> <h3>Usage</h3> <pre>
ddply(
  .data,
  .variables,
  .fun = NULL,
  ...,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)
</pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>.data</code></td> <td> <p>data frame to be processed</p> </td></tr> <tr valign="top"><td><code>.variables</code></td> <td> <p>variables to split data frame by, as <code><a href="as.quoted.html">as.quoted</a></code> variables, a formula or character vector</p> </td></tr> <tr valign="top"><td><code>.fun</code></td> <td> <p>function to apply to each piece</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>other arguments passed on to <code>.fun</code></p> </td></tr> <tr valign="top"><td><code>.progress</code></td> <td> <p>name of the progress bar to use, see <code><a href="create_progress_bar.html">create_progress_bar</a></code></p> </td></tr> <tr valign="top"><td><code>.inform</code></td> <td> <p>produce informative error messages? 
This is turned off by default because it substantially slows processing speed, but is very useful for debugging</p> </td></tr> <tr valign="top"><td><code>.drop</code></td> <td> <p>should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default)</p> </td></tr> <tr valign="top"><td><code>.parallel</code></td> <td> <p>if <code>TRUE</code>, apply function in parallel, using parallel backend provided by foreach</p> </td></tr> <tr valign="top"><td><code>.paropts</code></td> <td> <p>a list of additional options passed into the <code><a href="../../foreach/html/foreach.html">foreach</a></code> function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the <code>.export</code> and <code>.packages</code> arguments to supply them so that all cluster nodes have the correct environment set up for computing.</p> </td></tr> </table> <h3>Value</h3> <p>A data frame, as described in the output section. </p> <h3>Input</h3> <p>This function splits data frames by variables. </p> <h3>Output</h3> <p>The most unambiguous behaviour is achieved when <code>.fun</code> returns a data frame - in that case pieces will be combined with <code><a href="rbind.fill.html">rbind.fill</a></code>. If <code>.fun</code> returns an atomic vector of fixed length, it will be <code>rbind</code>ed together and converted to a data frame. Any other values will result in an error. </p> <p>If there are no results, then this function will return a data frame with zero rows and columns (<code>data.frame()</code>). </p> <h3>References</h3> <p>Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. <a href="https://www.jstatsoft.org/v40/i01/">https://www.jstatsoft.org/v40/i01/</a>. 
</p> <h3>See Also</h3> <p><code><a href="../../base/html/tapply.html">tapply</a></code> for similar functionality in the base package </p> <p>Other data frame input: <code><a href="d_ply.html">d_ply</a>()</code>, <code><a href="daply.html">daply</a>()</code>, <code><a href="dlply.html">dlply</a>()</code> </p> <p>Other data frame output: <code><a href="adply.html">adply</a>()</code>, <code><a href="ldply.html">ldply</a>()</code>, <code><a href="mdply.html">mdply</a>()</code> </p> <h3>Examples</h3> <pre>
# Summarize a dataset by two variables
dfx <- data.frame(
  group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
  sex = sample(c("M", "F"), size = 29, replace = TRUE),
  age = runif(n = 29, min = 18, max = 54)
)

# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
  mean = round(mean(age), 2),
  sd = round(sd(age), 2))

# An example using a formula for .variables
ddply(baseball[1:100,], ~ year, nrow)
# Applying two functions; nrow and ncol
ddply(baseball, .(lg), c("nrow", "ncol"))

# Calculate mean runs batted in for each year
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
# Plot a line chart of the result
plot(mean_rbi ~ year, type = "l", data = rbi)

# Make a new variable career_year based on the
# start year for each player (id)
base2 <- ddply(baseball, .(id), mutate,
  career_year = year - min(year) + 1
)
</pre> <hr /><div style="text-align: center;">[Package <em>plyr</em> version 1.8.7 <a href="00Index.html">Index</a>]</div> </body></html>