EVOLUTION-MANAGER

Edit File: read_excel.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Read xls and xlsx files</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for read_excel {readxl}"><tr><td>read_excel {readxl}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Read xls and xlsx files</h2>

<h3>Description</h3>

<p>Read xls and xlsx files
</p>
<p><code>read_excel()</code> calls <code><a href="excel_format.html">excel_format()</a></code> to determine if <code>path</code> is xls or xlsx,
based on the file extension and the file itself, in that order. Use
<code>read_xls()</code> and <code>read_xlsx()</code> directly if you know better and want to
prevent such guessing.
</p>

<h3>Usage</h3>

<pre>
read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")

read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")

read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0,
  n_max = Inf, guess_max = min(1000, n_max),
  progress = readxl_progress(), .name_repair = "unique")
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>path</code></td>
<td>
<p>Path to the xls/xlsx file.</p>
</td></tr>
<tr valign="top"><td><code>sheet</code></td>
<td>
<p>Sheet to read. Either a string (the name of a sheet), or an
integer (the position of the sheet). Ignored if the sheet is specified via
<code>range</code>. If neither argument specifies the sheet, defaults to the first
sheet.</p>
</td></tr>
<tr valign="top"><td><code>range</code></td>
<td>
<p>A cell range to read from, as described in <a href="cell-specification.html">cell-specification</a>.
Includes typical Excel ranges like &quot;B3:D87&quot;, possibly including the sheet
name like &quot;Budget!B2:G14&quot;, and more. Interpreted strictly, even if the
range forces the inclusion of leading or trailing empty rows or columns.
Takes precedence over <code>skip</code>, <code>n_max</code> and <code>sheet</code>.</p>
</td></tr>
<tr valign="top"><td><code>col_names</code></td>
<td>
<p><code>TRUE</code> to use the first row as column names, <code>FALSE</code> to get
default names, or a character vector giving a name for each column. If user
provides <code>col_types</code> as a vector, <code>col_names</code> can have one entry per
column, i.e. have the same length as <code>col_types</code>, or one entry per
unskipped column.</p>
</td></tr>
<tr valign="top"><td><code>col_types</code></td>
<td>
<p>Either <code>NULL</code> to guess all from the spreadsheet or a
character vector containing one entry per column from these options:
&quot;skip&quot;, &quot;guess&quot;, &quot;logical&quot;, &quot;numeric&quot;, &quot;date&quot;, &quot;text&quot; or &quot;list&quot;. If exactly
one <code>col_type</code> is specified, it will be recycled. The content of a cell in
a skipped column is never read and that column will not appear in the data
frame output. A list cell loads a column as a list of length 1 vectors,
which are typed using the type guessing logic from <code>col_types = NULL</code>, but
on a cell-by-cell basis.</p>
</td></tr>
<tr valign="top"><td><code>na</code></td>
<td>
<p>Character vector of strings to interpret as missing values. By
default, readxl treats blank cells as missing data.</p>
</td></tr>
<tr valign="top"><td><code>trim_ws</code></td>
<td>
<p>Should leading and trailing whitespace be trimmed?</p>
</td></tr>
<tr valign="top"><td><code>skip</code></td>
<td>
<p>Minimum number of rows to skip before reading anything, be it
column names or data. Leading empty rows are automatically skipped, so this
is a lower bound. Ignored if <code>range</code> is given.</p>
</td></tr>
<tr valign="top"><td><code>n_max</code></td>
<td>
<p>Maximum number of data rows to read. Trailing empty rows are
automatically skipped, so this is an upper bound on the number of rows in
the returned tibble. Ignored if <code>range</code> is given.</p>
</td></tr>
<tr valign="top"><td><code>guess_max</code></td>
<td>
<p>Maximum number of data rows to use for guessing column
types.</p>
</td></tr>
<tr valign="top"><td><code>progress</code></td>
<td>
<p>Display a progress spinner? By default, the spinner appears
only in an interactive session, outside the context of knitting a document,
and when the call is likely to run for several seconds or more. See
<code><a href="readxl_progress.html">readxl_progress()</a></code> for more details.</p>
</td></tr>
<tr valign="top"><td><code>.name_repair</code></td>
<td>
<p>Handling of column names. By default, readxl ensures
column names are not empty and are unique. If the tibble package version is
recent enough, there is full support for <code>.name_repair</code> as documented in
<code><a href="../../tibble/html/tibble.html">tibble::tibble()</a></code>. If an older version of tibble is present, readxl falls
back to name repair in the style of tibble v1.4.2.</p>
</td></tr>
</table>

<h3>Value</h3>

<p>A <a href="../../tibble/html/tibble-package.html">tibble</a>
</p>

<p><a href="cell-specification.html">cell-specification</a> for more details on targetting cells with the
<code>range</code> argument
</p>

<h3>Examples</h3>

<pre>
datasets &lt;- readxl_example("datasets.xlsx")
read_excel(datasets)

# Specify sheet either by position or by name
read_excel(datasets, 2)
read_excel(datasets, "mtcars")

# Skip rows and use default column names
read_excel(datasets, skip = 148, col_names = FALSE)

# Recycle a single column type
read_excel(datasets, col_types = "text")

# Specify some col_types and guess others
read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))

# Accomodate a column with disparate types via col_type = "list"
df &lt;- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list"))
df
df$value
sapply(df$value, class)

# Limit the number of data rows read
read_excel(datasets, n_max = 3)

# Read from an Excel range using A1 or R1C1 notation
read_excel(datasets, range = "C1:E7")
read_excel(datasets, range = "R1C2:R2C5")

# Specify the sheet as part of the range
read_excel(datasets, range = "mtcars!B1:D5")

# Read only specific rows or columns
read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
read_excel(datasets, range = cell_cols("B:D"))

# Get a preview of column names
names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))

if (utils::packageVersion("tibble") &gt; "1.4.2") {
  ## exploit full .name_repair flexibility from tibble

## "universal" names are unique and syntactic
  read_excel(
    readxl_example("deaths.xlsx"),
    range = "arts!A5:F15",
    .name_repair = "universal"
  )

## specify name repair as a built-in function
  read_excel(readxl_example("clippy.xlsx"), .name_repair = toupper)

## specify name repair as a custom function
  my_custom_name_repair &lt;- function(nms) tolower(gsub("[.]", "_", nms))
  read_excel(
    readxl_example("datasets.xlsx"),
    .name_repair = my_custom_name_repair
  )

## specify name repair as an anonymous function
  read_excel(
    readxl_example("datasets.xlsx"),
    sheet = "chickwts",
    .name_repair = ~ substr(.x, start = 1, stop = 3)
  )
}
</pre>

<hr /><div style="text-align: center;">[Package <em>readxl</em> version 1.3.1 <a href="00Index.html">Index</a>]</div>
</body></html>