<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Data Sets</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for data {utils}"><tr><td>data {utils}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Data Sets</h2> <h3>Description</h3> <p>Loads specified data sets, or lists the available data sets. </p> <h3>Usage</h3> <pre>
data(..., list = character(), package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"), envir = .GlobalEnv,
     overwrite = TRUE)
</pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>...</code></td> <td> <p>literal character strings or names giving the data sets to load.</p> </td></tr> <tr valign="top"><td><code>list</code></td> <td> <p>a character vector of data set names.</p> </td></tr> <tr valign="top"><td><code>package</code></td> <td> <p>a character vector giving the package(s) to look in for data sets, or <code>NULL</code>. </p> <p>By default, all packages in the search path are used, then the ‘<span class="file">data</span>’ subdirectory (if present) of the current working directory. </p> </td></tr> <tr valign="top"><td><code>lib.loc</code></td> <td> <p>a character vector of directory names of <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> libraries, or <code>NULL</code>. The default value of <code>NULL</code> corresponds to all libraries currently known.</p> </td></tr> <tr valign="top"><td><code>verbose</code></td> <td> <p>a logical. 
If <code>TRUE</code>, additional diagnostics are printed.</p> </td></tr> <tr valign="top"><td><code>envir</code></td> <td> <p>the <a href="../../base/html/environment.html">environment</a> where the data should be loaded.</p> </td></tr> <tr valign="top"><td><code>overwrite</code></td> <td> <p>logical: should existing objects of the same name in <span class="env">envir</span> be replaced?</p> </td></tr> </table> <h3>Details</h3> <p>Currently, four formats of data files are supported: </p> <ol> <li><p> files ending ‘<span class="file">.R</span>’ or ‘<span class="file">.r</span>’ are <code><a href="../../base/html/source.html">source</a>()</code>d in, with the <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> working directory changed temporarily to the directory containing the respective file. (<code>data</code> ensures that the <span class="pkg">utils</span> package is attached, in case it had been run <em>via</em> <code>utils::data</code>.) </p> </li> <li><p> files ending ‘<span class="file">.RData</span>’ or ‘<span class="file">.rda</span>’ are <code><a href="../../base/html/load.html">load</a>()</code>ed. </p> </li> <li><p> files ending ‘<span class="file">.tab</span>’, ‘<span class="file">.txt</span>’ or ‘<span class="file">.TXT</span>’ are read using <code><a href="read.table.html">read.table</a>(..., header = TRUE, as.is=FALSE)</code>, and hence result in a data frame. </p> </li> <li><p> files ending ‘<span class="file">.csv</span>’ or ‘<span class="file">.CSV</span>’ are read using <code><a href="read.table.html">read.table</a>(..., header = TRUE, sep = ";", as.is=FALSE)</code>, and also result in a data frame. </p> </li></ol> <p>If more than one matching file name is found, the first on this list is used. 
(Files with extensions ‘<span class="file">.txt</span>’, ‘<span class="file">.tab</span>’ or ‘<span class="file">.csv</span>’ can be compressed, with or without further extension ‘<span class="file">.gz</span>’, ‘<span class="file">.bz2</span>’ or ‘<span class="file">.xz</span>’.) </p> <p>The data sets to be loaded can be specified as a set of character strings or names, or as the character vector <code>list</code>, or as both. </p> <p>For each given data set, the first two types (‘<span class="file">.R</span>’ or ‘<span class="file">.r</span>’, and ‘<span class="file">.RData</span>’ or ‘<span class="file">.rda</span>’ files) can create several variables in the load environment, which might all be named differently from the data set. The third and fourth types will always result in the creation of a single variable with the same name (without extension) as the data set. </p> <p>If no data sets are specified, <code>data</code> lists the available data sets. It looks for a new-style data index in the ‘<span class="file">Meta</span>’ directory or, if this is not found, an old-style ‘<span class="file">00Index</span>’ file in the ‘<span class="file">data</span>’ directory of each specified package, and uses these files to prepare a listing. If there is a ‘<span class="file">data</span>’ area but no index, available data files for loading are computed and included in the listing, and a warning is given: such packages are incomplete. The information about available data sets is returned in an object of class <code>"packageIQR"</code>. The structure of this class is experimental. Where a dataset has a different name from the argument that should be used to retrieve it, the index will have an entry like <code>beaver1 (beavers)</code>, which tells us that dataset <code>beaver1</code> can be retrieved by the call <code>data(beavers)</code>. 
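</p> <p>As an illustration of the listing described above, the following sketch inspects the index for the standard <span class="pkg">datasets</span> package and then loads the <code>beavers</code> data set. It assumes that <span class="pkg">datasets</span> is installed, and that the <code>"packageIQR"</code> object has a <code>results</code> matrix with an <code>Item</code> column; since the structure of that class is experimental, that access should be treated as an assumption. The names <code>idx</code>, <code>items</code> and <code>e</code> are purely illustrative. </p> <pre>
idx &lt;- data(package = "datasets")      # listing only; nothing is loaded
class(idx)                             # the class of the listing object

## Assumption: the listing keeps its entries in idx$results[, "Item"]
items &lt;- idx$results[, "Item"]
grep("^beaver", items, value = TRUE)   # entries of the form 'object (dataset)'

## Load by the data set name, into a scratch environment
e &lt;- new.env()
data("beavers", package = "datasets", envir = e)
ls(e)                                  # objects created by the data set
</pre> <p>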
</p> <p>If <code>lib.loc</code> and <code>package</code> are both <code>NULL</code> (the default), the data sets are searched for in all the currently loaded packages then in the ‘<span class="file">data</span>’ directory (if any) of the current working directory. </p> <p>If <code>lib.loc = NULL</code> but <code>package</code> is specified as a character vector, the specified package(s) are searched for first amongst loaded packages and then in the default library/ies (see <code><a href="../../base/html/libPaths.html">.libPaths</a></code>). </p> <p>If <code>lib.loc</code> <em>is</em> specified (and not <code>NULL</code>), packages are searched for in the specified library/ies, even if they are already loaded from another library. </p> <p>To just look in the ‘<span class="file">data</span>’ directory of the current working directory, set <code>package = character(0)</code> (and <code>lib.loc = NULL</code>, the default). </p> <h3>Value</h3> <p>A character vector of all data sets specified (whether found or not), or information about all available data sets in an object of class <code>"packageIQR"</code> if none were specified. </p> <h3>Good practice</h3> <p>There is no requirement for <code>data(<var>foo</var>)</code> to create an object named <code><var>foo</var></code> (nor to create one object), although it much reduces confusion if this convention is followed (and it is enforced if datasets are lazy-loaded). </p> <p><code>data()</code> was originally intended to allow users to load datasets from packages for use in their examples, and as such it loaded the datasets into the workspace <code><a href="../../base/html/environment.html">.GlobalEnv</a></code>. This avoided having large datasets in memory when not in use: that need has been almost entirely superseded by lazy-loading of datasets. </p> <p>The ability to specify a dataset by name (without quotes) is a convenience: in programming the datasets should be specified by character strings (with quotes). 
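</p> <p>A minimal sketch of programmatic use, assuming the standard <span class="pkg">datasets</span> package is installed: the data set names are held as character strings and passed via <code>list</code>, and the objects are loaded into a scratch environment. The names <code>wanted</code>, <code>e</code>, <code>requested</code> and <code>not_created</code> are illustrative. Because the returned vector includes names that were not found, the sketch checks the environment itself; that check relies on each data set creating an object of the same name, which is a convention and not a requirement. </p> <pre>
wanted &lt;- c("USArrests", "VADeaths")    # names as character strings
e &lt;- new.env()

## The value lists every name asked for, whether found or not
requested &lt;- data(list = wanted, package = "datasets", envir = e)

## So inspect the environment to see what was actually created
not_created &lt;- setdiff(wanted, ls(e))
if (length(not_created) > 0)
    warning("not created: ", paste(not_created, collapse = ", "))
</pre> <p>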
</p> <p>Use of <code>data</code> within a function without an <code>envir</code> argument has the almost always undesirable side-effect of putting an object in the user's workspace (and indeed, of replacing any object of that name already there). It would almost always be better to put the object in the current evaluation environment by <code>data(..., envir = environment())</code>. However, two alternatives are usually preferable, both described in the ‘Writing R Extensions’ manual. </p> <ul> <li><p> For sets of data, set up a package to use lazy-loading of data. </p> </li> <li><p> For objects which are system data, for example lookup tables used in calculations within the function, use a file ‘<span class="file">R/sysdata.rda</span>’ in the package sources or create the objects by <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> code at package installation time. </p> </li></ul> <p>A sometimes important distinction is that the second approach places objects in the namespace but the first does not. So if it is important that the function sees <code>mytable</code> as an object from the package, it is system data and the second approach should be used. In the unusual case that a package uses a lazy-loaded dataset as a default argument to a function, that needs to be specified by <code><a href="../../base/html/ns-dblcolon.html">::</a></code>, e.g., <code>survival::survexp.us</code>. </p> <h3>Warning</h3> <p>This function creates objects in the <code>envir</code> environment (by default the user's workspace), replacing any which already existed. <code>data("foo")</code> can silently create objects other than <code>foo</code>: there have been instances in published packages where it created/replaced <code><a href="../../base/html/Random.html">.Random.seed</a></code> and hence changed the seed for the session. 
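</p> <p>The advice about <code>envir</code> can be sketched as follows, assuming the standard <span class="pkg">datasets</span> package is installed. The function <code>mean_murder_rate</code> and the environment <code>scratch</code> are hypothetical names used only for illustration. The first part loads a data set into the evaluation environment of a function; the second loads into a scratch environment and lists everything created there, including hidden names, before anything is copied elsewhere. </p> <pre>
mean_murder_rate &lt;- function() {
    ## Load into this function's evaluation environment,
    ## leaving the user's workspace untouched
    data("USArrests", package = "datasets", envir = environment())
    mean(USArrests$Murder)
}
mean_murder_rate()

## Inspect what a data set creates before trusting it
scratch &lt;- new.env()
data("VADeaths", package = "datasets", envir = scratch)
ls(scratch, all.names = TRUE)   # would also show names such as .Random.seed
</pre> <p>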
</p> <h3>Note</h3> <p>One can take advantage of the search order and the fact that a ‘<span class="file">.R</span>’ file will change directory. If raw data are stored in ‘<span class="file">mydata.txt</span>’ then one can set up ‘<span class="file">mydata.R</span>’ to read ‘<span class="file">mydata.txt</span>’ and pre-process it, e.g., using <code>transform</code>. For instance one can convert numeric vectors to factors with the appropriate labels. Thus, the ‘<span class="file">.R</span>’ file can effectively contain a metadata specification for the plaintext formats. </p> <h3>See Also</h3> <p><code><a href="help.html">help</a></code> for obtaining documentation on data sets, <code><a href="../../base/html/save.html">save</a></code> for <em>creating</em> the second (‘<span class="file">.rda</span>’) kind of data, typically the most efficient one. </p> <p>The ‘Writing R Extensions’ manual for considerations in preparing the ‘<span class="file">data</span>’ directory of a package. </p> <h3>Examples</h3> <pre>
require(utils)
data()                         # list all available data sets
try(data(package = "rpart") )  # list the data sets in the rpart package
data(USArrests, "VADeaths")    # load the data sets 'USArrests' and 'VADeaths'
## Not run: ## Alternatively
ds &lt;- c("USArrests", "VADeaths"); data(list = ds)
## End(Not run)
help(USArrests)                # give information on data set 'USArrests'
</pre> <hr /><div style="text-align: center;">[Package <em>utils</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>