EVOLUTION-MANAGER
Edit File: substr_ctl.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: ANSI Control Sequence Aware Version of substr</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for substr_ctl {fansi}"><tr><td>substr_ctl {fansi}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>ANSI Control Sequence Aware Version of substr</h2> <h3>Description</h3> <p><code>substr_ctl</code> is a drop-in replacement for <code>substr</code>. Performance is slightly slower than <code>substr</code>. ANSI CSI SGR sequences will be included in the substrings to reflect the format of the substring when it was embedded in the source string. Additionally, other <em>Control Sequences</em> specified in <code>ctl</code> are treated as zero-width. </p> <h3>Usage</h3> <pre> substr_ctl(x, start, stop, warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap"), ctl = "all") substr2_ctl(x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces"), tab.stops = getOption("fansi.tab.stops"), warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap"), ctl = "all") substr_sgr(x, start, stop, warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap")) substr2_sgr(x, start, stop, type = "chars", round = "start", tabs.as.spaces = getOption("fansi.tabs.as.spaces"), tab.stops = getOption("fansi.tab.stops"), warn = getOption("fansi.warn"), term.cap = getOption("fansi.term.cap")) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>a character vector or object that can be coerced to character.</p> </td></tr> <tr valign="top"><td><code>start</code></td> <td> <p>integer. The first element to be replaced.</p> </td></tr> <tr valign="top"><td><code>stop</code></td> <td> <p>integer. The last element to be replaced.</p> </td></tr> <tr valign="top"><td><code>warn</code></td> <td> <p>TRUE (default) or FALSE, whether to warn when potentially problematic <em>Control Sequences</em> are encountered. These could cause the assumptions <code>fansi</code> makes about how strings are rendered on your display to be incorrect, for example by moving the cursor (see <a href="fansi.html">fansi</a>).</p> </td></tr> <tr valign="top"><td><code>term.cap</code></td> <td> <p>character a vector of the capabilities of the terminal, can be any combination "bright" (SGR codes 90-97, 100-107), "256" (SGR codes starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with "38;2" or "48;2"). Changing this parameter changes how <code>fansi</code> interprets escape sequences, so you should ensure that it matches your terminal capabilities. See <a href="term_cap_test.html">term_cap_test</a> for details.</p> </td></tr> <tr valign="top"><td><code>ctl</code></td> <td> <p>character, which <em>Control Sequences</em> should be treated specially. See the "_ctl vs. _sgr" section for details. </p> <ul> <li><p> "nl": newlines. </p> </li> <li><p> "c0": all other "C0" control characters (i.e. 0x01-0x1f, 0x7F), except for newlines and the actual ESC (0x1B) character. </p> </li> <li><p> "sgr": ANSI CSI SGR sequences. </p> </li> <li><p> "csi": all non-SGR ANSI CSI sequences. </p> </li> <li><p> "esc": all other escape sequences. </p> </li> <li><p> "all": all of the above, except when used in combination with any of the above, in which case it means "all but". </p> </li></ul> </td></tr> <tr valign="top"><td><code>type</code></td> <td> <p>character(1L) partial matching <code>c("chars", "width")</code>, although <code>type="width"</code> only works correctly with R >= 3.2.2.</p> </td></tr> <tr valign="top"><td><code>round</code></td> <td> <p>character(1L) partial matching <code>c("start", "stop", "both", "neither")</code>, controls how to resolve ambiguities when a <code>start</code> or <code>stop</code> value in "width" <code>type</code> mode falls within a multi-byte character or a wide display character. See details.</p> </td></tr> <tr valign="top"><td><code>tabs.as.spaces</code></td> <td> <p>FALSE (default) or TRUE, whether to convert tabs to spaces. This can only be set to TRUE if <code>strip.spaces</code> is FALSE.</p> </td></tr> <tr valign="top"><td><code>tab.stops</code></td> <td> <p>integer(1:n) indicating position of tab stops to use when converting tabs to spaces. If there are more tabs in a line than defined tab stops the last tab stop is re-used. For the purposes of applying tab stops, each input line is considered a line and the character count begins from the beginning of the input line.</p> </td></tr> </table> <h3>Details</h3> <p><code>substr2_ctl</code> and <code>substr2_sgr</code> add the ability to retrieve substrings based on display width, and byte width in addition to the normal character width. <code>substr2_ctl</code> also provides the option to convert tabs to spaces with <a href="tabs_as_spaces.html">tabs_as_spaces</a> prior to taking substrings. </p> <p>Because exact substrings on anything other than character width cannot be guaranteed (e.g. as a result of multi-byte encodings, or double display-width characters) <code>substr2_ctl</code> must make assumptions on how to resolve provided <code>start</code>/<code>stop</code> values that are infeasible and does so via the <code>round</code> parameter. </p> <p>If we use "start" as the <code>round</code> value, then any time the <code>start</code> value corresponds to the middle of a multi-byte or a wide character, then that character is included in the substring, while any similar partially included character via the <code>stop</code> is left out. The converse is true if we use "stop" as the <code>round</code> value. "neither" would cause all partial characters to be dropped irrespective whether they correspond to <code>start</code> or <code>stop</code>, and "both" could cause all of them to be included. </p> <p>These functions map string lengths accounting for ANSI CSI SGR sequence semantics to the naive length calculations, and then use the mapping in conjunction with <code><a href="../../base/html/substr.html">base::substr()</a></code> to extract the string. This concept is borrowed directly from Gábor Csárdi's <code>crayon</code> package, although the implementation of the calculation is different. </p> <h3>_ctl vs. _sgr</h3> <p>The <code>*_ctl</code> versions of the functions treat all <em>Control Sequences</em> specially by default. Special treatment is context dependent, and may include detecting them and/or computing their display/character width as zero. For the SGR subset of the ANSI CSI sequences, <code>fansi</code> will also parse, interpret, and reapply the text styles they encode if needed. You can modify whether a <em>Control Sequence</em> is treated specially with the <code>ctl</code> parameter. You can exclude a type of <em>Control Sequence</em> from special treatment by combining "all" with that type of sequence (e.g. <code>ctl=c("all", "nl")</code> for special treatment of all <em>Control Sequences</em> <strong>but</strong> newlines). The <code>*_sgr</code> versions only treat ANSI CSI SGR sequences specially, and are equivalent to the <code>*_ctl</code> versions with the <code>ctl</code> parameter set to "sgr". </p> <h3>Note</h3> <p>Non-ASCII strings are converted to and returned in UTF-8 encoding. </p> <h3>See Also</h3> <p><a href="fansi.html">fansi</a> for details on how <em>Control Sequences</em> are interpreted, particularly if you are getting unexpected results. </p> <h3>Examples</h3> <pre> substr_ctl("\033[42mhello\033[m world", 1, 9) substr_ctl("\033[42mhello\033[m world", 3, 9) ## Width 2 and 3 are in the middle of an ideogram as ## start and stop positions respectively, so we control ## what we get with `round` cn.string <- paste0("\033[42m", "\u4E00\u4E01\u4E03", "\033[m") substr2_ctl(cn.string, 2, 3, type='width') substr2_ctl(cn.string, 2, 3, type='width', round='both') substr2_ctl(cn.string, 2, 3, type='width', round='start') substr2_ctl(cn.string, 2, 3, type='width', round='stop') ## the _sgr variety only treat as special CSI SGR, ## compare the following: substr_sgr("\033[31mhello\tworld", 1, 6) substr_ctl("\033[31mhello\tworld", 1, 6) substr_ctl("\033[31mhello\tworld", 1, 6, ctl=c('all', 'c0')) </pre> <hr /><div style="text-align: center;">[Package <em>fansi</em> version 0.4.1 <a href="00Index.html">Index</a>]</div> </body></html>