EVOLUTION-MANAGER
Edit File: as_utf8_character.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Coerce to a character vector and attempt encoding conversion</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for as_utf8_character {rlang}"><tr><td>as_utf8_character {rlang}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Coerce to a character vector and attempt encoding conversion</h2> <h3>Description</h3> <p><a href="https://lifecycle.r-lib.org/articles/stages.html#experimental"><img src="../help/figures/lifecycle-experimental.svg" alt='[Experimental]' /></a> </p> <p>Unlike specifying the <code>encoding</code> argument in <code>as_string()</code> and <code>as_character()</code>, which is only declarative, these functions actually attempt to convert the encoding of their input. There are two possible cases: </p> <ul> <li><p> The string is tagged as UTF-8 or latin1, the only two encodings for which R has specific support. In this case, converting to the same encoding is a no-op, and converting to native always works as expected, as long as the native encoding, the one specified by the <code>LC_CTYPE</code> locale has support for all characters occurring in the strings. Unrepresentable characters are serialised as unicode points: "<U+xxxx>". </p> </li> <li><p> The string is not tagged. R assumes that it is encoded in the native encoding. Conversion to native is a no-op, and conversion to UTF-8 should work as long as the string is actually encoded in the locale codeset. </p> </li></ul> <p>When translating to UTF-8, the strings are parsed for serialised unicode points (e.g. strings looking like "U+xxxx") with <code><a href="chr_unserialise_unicode.html">chr_unserialise_unicode()</a></code>. This helps to alleviate the effects of character-to-symbol-to-character roundtrips on systems with non-UTF-8 native encoding. </p> <h3>Usage</h3> <pre> as_utf8_character(x) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>An object to coerce.</p> </td></tr> </table> <h3>Examples</h3> <pre> # Let's create a string marked as UTF-8 (which is guaranteed by the # Unicode escaping in the string): utf8 <- "caf\uE9" Encoding(utf8) charToRaw(utf8) </pre> <hr /><div style="text-align: center;">[Package <em>rlang</em> version 1.0.6 <a href="00Index.html">Index</a>]</div> </body></html>