<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Text Boundary Analysis in 'stringi'</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for stringi-search-boundaries {stringi}"><tr><td>stringi-search-boundaries {stringi}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Text Boundary Analysis in <span class="pkg">stringi</span></h2> <h3>Description</h3> <p>Text boundary analysis is the process of locating linguistic boundaries while formatting and handling text. </p> <h3>Details</h3> <p>Examples of the boundary analysis process include: </p> <ul> <li><p> Locating positions to word-wrap text to fit within specific margins while displaying or printing, see <code><a href="stri_wrap.html">stri_wrap</a></code> and <code><a href="stri_split_boundaries.html">stri_split_boundaries</a></code>. </p> </li> <li><p> Counting characters, words, sentences, or paragraphs, see <code><a href="stri_count_boundaries.html">stri_count_boundaries</a></code>. </p> </li> <li><p> Making a list of the unique words in a document, see <code><a href="stri_extract_boundaries.html">stri_extract_all_words</a></code> and then <code><a href="stri_unique.html">stri_unique</a></code>. </p> </li> <li><p> Capitalizing the first letter of each word or sentence, see also <code><a href="stri_trans_casemap.html">stri_trans_totitle</a></code>. </p> </li> <li><p> Locating a particular unit of the text (for example, finding the third word in the document), see <code><a href="stri_locate_boundaries.html">stri_locate_all_boundaries</a></code>. </p> </li></ul> <p>Generally, text boundary analysis is a locale-dependent operation. 
For example, in Japanese and Chinese, words are not separated by spaces, so a line break can occur even in the middle of a word. These languages also have punctuation and diacritical marks that cannot start or end a line, which must be taken into account as well. </p> <p><span class="pkg">stringi</span> uses <span class="pkg">ICU</span>'s <code>BreakIterator</code> to locate specific text boundaries. Note that the <code>BreakIterator</code>'s behavior may be controlled in some cases; see <code><a href="stri_opts_brkiter.html">stri_opts_brkiter</a></code>. </p> <ul> <li><p> The <code>character</code> boundary iterator tries to match what a user would think of as a “character” – a basic unit of a writing system for a language – which may be more than just a single Unicode code point. </p> </li> <li><p> The <code>word</code> boundary iterator locates the boundaries of words, for purposes such as “Find whole words” operations. </p> </li> <li><p> The <code>line_break</code> iterator locates positions that would be appropriate to wrap lines when displaying the text. </p> </li> <li><p> The break iterator of type <code>sentence</code> locates sentence boundaries. </p> </li></ul> <p>For technical details on the different classes of text boundaries, refer to the <span class="pkg">ICU</span> User Guide (see References). 
</p> <h3>References</h3> <p><em>Boundary Analysis</em> – ICU User Guide, <a href="http://userguide.icu-project.org/boundaryanalysis">http://userguide.icu-project.org/boundaryanalysis</a> </p> <h3>See Also</h3> <p>Other locale_sensitive: <code><a href="oper_comparison.html">%s&lt;%</a>()</code>, <code><a href="stri_compare.html">stri_compare</a>()</code>, <code><a href="stri_count_boundaries.html">stri_count_boundaries</a>()</code>, <code><a href="stri_duplicated.html">stri_duplicated</a>()</code>, <code><a href="stri_enc_detect2.html">stri_enc_detect2</a>()</code>, <code><a href="stri_extract_boundaries.html">stri_extract_all_boundaries</a>()</code>, <code><a href="stri_locate_boundaries.html">stri_locate_all_boundaries</a>()</code>, <code><a href="stri_opts_collator.html">stri_opts_collator</a>()</code>, <code><a href="stri_order.html">stri_order</a>()</code>, <code><a href="stri_sort.html">stri_sort</a>()</code>, <code><a href="stri_split_boundaries.html">stri_split_boundaries</a>()</code>, <code><a href="stri_trans_casemap.html">stri_trans_tolower</a>()</code>, <code><a href="stri_unique.html">stri_unique</a>()</code>, <code><a href="stri_wrap.html">stri_wrap</a>()</code>, <code><a href="stringi-locale.html">stringi-locale</a></code>, <code><a href="stringi-search-coll.html">stringi-search-coll</a></code> </p> <p>Other text_boundaries: <code><a href="stri_count_boundaries.html">stri_count_boundaries</a>()</code>, <code><a href="stri_extract_boundaries.html">stri_extract_all_boundaries</a>()</code>, <code><a href="stri_locate_boundaries.html">stri_locate_all_boundaries</a>()</code>, <code><a href="stri_opts_brkiter.html">stri_opts_brkiter</a>()</code>, <code><a href="stri_split_boundaries.html">stri_split_boundaries</a>()</code>, <code><a href="stri_split_lines.html">stri_split_lines</a>()</code>, <code><a href="stri_trans_casemap.html">stri_trans_tolower</a>()</code>, <code><a href="stri_wrap.html">stri_wrap</a>()</code>, <code><a 
href="stringi-search.html">stringi-search</a></code> </p> <p>Other stringi_general_topics: <code><a href="stringi-arguments.html">stringi-arguments</a></code>, <code><a href="stringi-encoding.html">stringi-encoding</a></code>, <code><a href="stringi-locale.html">stringi-locale</a></code>, <code><a href="stringi-package.html">stringi-package</a></code>, <code><a href="stringi-search-charclass.html">stringi-search-charclass</a></code>, <code><a href="stringi-search-coll.html">stringi-search-coll</a></code>, <code><a href="stringi-search-fixed.html">stringi-search-fixed</a></code>, <code><a href="stringi-search-regex.html">stringi-search-regex</a></code>, <code><a href="stringi-search.html">stringi-search</a></code> </p> <hr /><div style="text-align: center;">[Package <em>stringi</em> version 1.4.6 <a href="00Index.html">Index</a>]</div> </body></html>