<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Read or Set the Declared Encodings for a Character Vector</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for Encoding {base}"><tr><td>Encoding {base}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Read or Set the Declared Encodings for a Character Vector</h2> <h3>Description</h3> <p>Read or set the declared encodings for a character vector. </p> <h3>Usage</h3> <pre>
Encoding(x)
Encoding(x) &lt;- value
enc2native(x)
enc2utf8(x)
</pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>A character vector.</p> </td></tr> <tr valign="top"><td><code>value</code></td> <td> <p>A character vector of positive length.</p> </td></tr> </table> <h3>Details</h3> <p>Character strings in <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> can be declared to be encoded in <code>"latin1"</code> or <code>"UTF-8"</code> or as <code>"bytes"</code>. These declarations can be read by <code>Encoding</code>, which returns a character vector of values <code>"latin1"</code>, <code>"UTF-8"</code>, <code>"bytes"</code> or <code>"unknown"</code>. They can also be set, in which case <code>value</code> is recycled as needed and any other values are silently treated as <code>"unknown"</code>. ASCII strings will never be marked with a declared encoding, since their representation is the same in all supported encodings. Strings marked as <code>"bytes"</code> are intended to be non-ASCII strings which should be manipulated as bytes, and never converted to a character encoding (so writing them to a text file is not supported). 
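</p> <p>As a minimal sketch of the reading and setting behaviour described above: the vector <code>x</code> and the names <code>marked</code> and <code>cleared</code> are illustrative only, and the encoding reported for <code>x</code> before any declaration is made may depend on the locale. </p> <pre>
## Latin-1 bytes entered via escapes; the first element is plain ASCII
x &lt;- c("abc", "fa\xE7ile", "na\xEFve")

Encoding(x) &lt;- "latin1"             # a length-one value is recycled over x
marked &lt;- Encoding(x)
marked                                # "unknown" "latin1" "latin1"

Encoding(x) &lt;- "no-such-encoding"   # any other value is treated as "unknown"
cleared &lt;- Encoding(x)
cleared                               # "unknown" "unknown" "unknown"
</pre> <p>The ASCII element <code>"abc"</code> reports <code>"unknown"</code> in both cases, since ASCII strings are never marked. 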
</p> <p><code>enc2native</code> and <code>enc2utf8</code> convert elements of character vectors to the native encoding or UTF-8 respectively, taking any marked encoding into account. They are <a href="Primitive.html">primitive</a> functions, designed to do minimal copying. </p> <p>There are other ways for character strings to acquire a declared encoding apart from explicitly setting it (and these have changed as <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> has evolved). Functions <code><a href="scan.html">scan</a></code>, <code><a href="../../utils/html/read.table.html">read.table</a></code>, <code><a href="readLines.html">readLines</a></code>, and <code><a href="parse.html">parse</a></code> have an <code>encoding</code> argument that is used to declare encodings, <code><a href="iconv.html">iconv</a></code> declares encodings from its <code>to</code> argument, and console input in suitable locales is also declared. <code><a href="utf8Conversion.html">intToUtf8</a></code> declares its output as <code>"UTF-8"</code>, and output text connections (see <code><a href="textconnections.html">textConnection</a></code>) are marked if running in a suitable locale. Under some circumstances (see its help page) <code><a href="source.html">source</a>(encoding=)</code> will mark encodings of character strings it outputs. </p> <p>Most character manipulation functions will set the encoding on output strings if it was declared on the corresponding input. These include <code><a href="chartr.html">chartr</a></code>, <code><a href="strsplit.html">strsplit</a>(useBytes = FALSE)</code>, <code><a href="chartr.html">tolower</a></code> and <code><a href="chartr.html">toupper</a></code> as well as <code><a href="grep.html">sub</a>(useBytes = FALSE)</code> and <code><a href="grep.html">gsub</a>(useBytes = FALSE)</code>. 
Note that such functions do not <em>preserve</em> the encoding, but if they know the input encoding and that the string has been successfully re-encoded (to the current encoding or UTF-8), they mark the output. </p> <p><code><a href="substr.html">substr</a></code> does preserve the encoding, and <code><a href="chartr.html">chartr</a></code>, <code><a href="chartr.html">tolower</a></code> and <code><a href="chartr.html">toupper</a></code> preserve UTF-8 encoding on systems with Unicode wide characters. With their <code>fixed</code> and <code>perl</code> options, <code><a href="strsplit.html">strsplit</a></code>, <code><a href="grep.html">sub</a></code> and <code>gsub</code> will give a marked UTF-8 result if any of the inputs are UTF-8. </p> <p><code><a href="paste.html">paste</a></code> and <code><a href="sprintf.html">sprintf</a></code> return elements marked as bytes if any of the corresponding inputs is marked as bytes, and otherwise marked as UTF-8 if any of the inputs is marked as UTF-8. </p> <p><code><a href="match.html">match</a></code>, <code><a href="pmatch.html">pmatch</a></code>, <code><a href="charmatch.html">charmatch</a></code>, <code><a href="duplicated.html">duplicated</a></code> and <code><a href="unique.html">unique</a></code> all match in UTF-8 if any of the elements are marked as UTF-8. </p> <p>There is some ambiguity as to what is meant by a ‘Latin-1’ locale, since some OSes (notably Windows) make use of character positions used for control characters in the ISO 8859-1 character set. How such characters are interpreted is system-dependent, but as from <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> 3.5.0 they are, if possible, interpreted as per Windows codepage 1252 (which Microsoft calls ‘Windows Latin 1 (ANSI)’) when converting to e.g. UTF-8. </p> <h3>Value</h3> <p>A character vector. </p> <p>For <code>enc2utf8</code>, encodings are always marked; for <code>enc2native</code>, they are marked in UTF-8 and Latin-1 locales. 
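</p> <p>A small sketch of the conversion and marking performed by <code>enc2utf8</code>, assuming <code>x</code> is first declared as Latin-1: the names <code>x</code>, <code>u</code> and <code>n</code> are illustrative, and the mark on the <code>enc2native</code> result depends on the locale, so no output is shown for it. </p> <pre>
x &lt;- "fa\xE7ile"
Encoding(x) &lt;- "latin1"    # declare the bytes as Latin-1

u &lt;- enc2utf8(x)           # re-encode, taking the declared encoding into account
Encoding(u)                  # "UTF-8"
charToRaw(x)                 # 66 61 e7 69 6c 65     (one byte for the c-cedilla)
charToRaw(u)                 # 66 61 c3 a7 69 6c 65  (two bytes in UTF-8)

n &lt;- enc2native(u)         # convert to the native encoding of the session
Encoding(n)                  # locale-dependent
</pre> <p>Comparing the raw bytes with <code>charToRaw</code> is one way to confirm that the string was actually re-encoded and not merely re-labelled. 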
</p> <h3>Examples</h3> <pre>
## x is intended to be in latin1
x &lt;- "fa\xE7ile"
Encoding(x)
Encoding(x) &lt;- "latin1"
x
xx &lt;- iconv(x, "latin1", "UTF-8")
Encoding(c(x, xx))
c(x, xx)
Encoding(xx) &lt;- "bytes"
xx # will be encoded in hex
cat("xx = ", xx, "\n", sep = "")
</pre> <hr /><div style="text-align: center;">[Package <em>base</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>