<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Split a String at Text Boundaries</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for stri_split_boundaries {stringi}"><tr><td>stri_split_boundaries {stringi}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Split a String at Text Boundaries</h2>

<h3>Description</h3>

<p>This function locates text boundaries (like character, word, line, or sentence boundaries) and splits strings at the indicated positions. </p>

<h3>Usage</h3>

<pre>
stri_split_boundaries(
  str,
  n = -1L,
  tokens_only = FALSE,
  simplify = FALSE,
  ...,
  opts_brkiter = NULL
)
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>str</code></td>
<td>
<p>character vector or an object coercible to one</p>
</td></tr>
<tr valign="top"><td><code>n</code></td>
<td>
<p>integer vector, maximum number of strings to return</p>
</td></tr>
<tr valign="top"><td><code>tokens_only</code></td>
<td>
<p>single logical value; may affect the result if <code>n</code> is positive, see Details</p>
</td></tr>
<tr valign="top"><td><code>simplify</code></td>
<td>
<p>single logical value; if <code>TRUE</code> or <code>NA</code>, then a character matrix is returned; otherwise (the default), a list of character vectors is given, see Value</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>additional settings for <code>opts_brkiter</code></p>
</td></tr>
<tr valign="top"><td><code>opts_brkiter</code></td>
<td>
<p>a named list with <span class="pkg">ICU</span> BreakIterator's settings, see <code><a href="stri_opts_brkiter.html">stri_opts_brkiter</a></code>; <code>NULL</code> for the default break iterator, i.e., <code>line_break</code></p>
</td></tr>
</table>

<h3>Details</h3>
<p>Vectorized over <code>str</code> and <code>n</code>. </p> <p>If <code>n</code> is negative (the default), then all text pieces are extracted. </p> <p>Otherwise, if <code>tokens_only</code> is <code>FALSE</code> (this is the default, for compatibility with the <span class="pkg">stringr</span> package), then <code>n-1</code> tokens are extracted (if possible) and the <code>n</code>-th string gives the (non-split) remainder (see Examples). On the other hand, if <code>tokens_only</code> is <code>TRUE</code>, then only full tokens (up to <code>n</code> pieces) are extracted. </p> <p>For more information on text boundary analysis performed by <span class="pkg">ICU</span>'s <code>BreakIterator</code>, see <a href="stringi-search-boundaries.html">stringi-search-boundaries</a>. </p> <h3>Value</h3> <p>If <code>simplify=FALSE</code> (the default), then the function returns a list of character vectors. </p> <p>Otherwise, <code><a href="stri_list2matrix.html">stri_list2matrix</a></code> with <code>byrow=TRUE</code> and <code>n_min=n</code> arguments is called on the resulting object. In such a case, a character matrix with <code>length(str)</code> rows is returned. Note that <code><a href="stri_list2matrix.html">stri_list2matrix</a></code>'s <code>fill</code> argument is set to an empty string and <code>NA</code>, for <code>simplify</code> equal to <code>TRUE</code> and <code>NA</code>, respectively. 
</p> <h3>See Also</h3> <p>Other search_split: <code><a href="stri_split_lines.html">stri_split_lines</a>()</code>, <code><a href="stri_split.html">stri_split</a>()</code>, <code><a href="stringi-search.html">stringi-search</a></code> </p> <p>Other locale_sensitive: <code><a href="oper_comparison.html">%s&lt;%</a>()</code>, <code><a href="stri_compare.html">stri_compare</a>()</code>, <code><a href="stri_count_boundaries.html">stri_count_boundaries</a>()</code>, <code><a href="stri_duplicated.html">stri_duplicated</a>()</code>, <code><a href="stri_enc_detect2.html">stri_enc_detect2</a>()</code>, <code><a href="stri_extract_boundaries.html">stri_extract_all_boundaries</a>()</code>, <code><a href="stri_locate_boundaries.html">stri_locate_all_boundaries</a>()</code>, <code><a href="stri_opts_collator.html">stri_opts_collator</a>()</code>, <code><a href="stri_order.html">stri_order</a>()</code>, <code><a href="stri_sort.html">stri_sort</a>()</code>, <code><a href="stri_trans_casemap.html">stri_trans_tolower</a>()</code>, <code><a href="stri_unique.html">stri_unique</a>()</code>, <code><a href="stri_wrap.html">stri_wrap</a>()</code>, <code><a href="stringi-locale.html">stringi-locale</a></code>, <code><a href="stringi-search-boundaries.html">stringi-search-boundaries</a></code>, <code><a href="stringi-search-coll.html">stringi-search-coll</a></code> </p> <p>Other text_boundaries: <code><a href="stri_count_boundaries.html">stri_count_boundaries</a>()</code>, <code><a href="stri_extract_boundaries.html">stri_extract_all_boundaries</a>()</code>, <code><a href="stri_locate_boundaries.html">stri_locate_all_boundaries</a>()</code>, <code><a href="stri_opts_brkiter.html">stri_opts_brkiter</a>()</code>, <code><a href="stri_split_lines.html">stri_split_lines</a>()</code>, <code><a href="stri_trans_casemap.html">stri_trans_tolower</a>()</code>, <code><a href="stri_wrap.html">stri_wrap</a>()</code>, <code><a href="stringi-search-boundaries.html">stringi-search-boundaries</a></code>, 
<code><a href="stringi-search.html">stringi-search</a></code> </p>

<h3>Examples</h3>

<pre>
test &lt;- "The\u00a0above-mentioned features are very useful. " %s+%
   "Kudos to their developers. 123 456 789"

# split at line-break opportunities
stri_split_boundaries(test, type="line")

# split at word boundaries, optionally skipping some kinds of tokens
stri_split_boundaries(test, type="word")
stri_split_boundaries(test, type="word", skip_word_none=TRUE)
stri_split_boundaries(test, type="word", skip_word_none=TRUE, skip_word_letter=TRUE)
stri_split_boundaries(test, type="word", skip_word_none=TRUE, skip_word_number=TRUE)

# split at sentence boundaries
stri_split_boundaries(test, type="sentence")
stri_split_boundaries(test, type="sentence", skip_sentence_sep=TRUE)

# split at character boundaries
stri_split_boundaries(test, type="character")

# a filtered break iterator with the new ICU:
stri_split_boundaries("Mr. Jones and Mrs. Brown are very happy. So am I, Prof. Smith.",
   type="sentence", locale="en_US@ss=standard") # ICU &gt;= 56 only
</pre>

<hr /><div style="text-align: center;">[Package <em>stringi</em> version 1.4.6 <a href="00Index.html">Index</a>]</div>
</body></html>