# stringi package NEWS and CHANGELOG

-------------------------------------------------------------------------------

## 1.4.6 (2020-02-17) **devel**

* [NEW FEATURE] #369: `stri_c()` now returns an empty string when input is empty and `collapse` is set.
* [BUGFIX] #370: Fixed an issue in `stri_prepare_arg_POSIXct()` reported by rchk.
* [BUGFIX] #372: Documented arguments not in `\usage` in documentation object `stri_datetime_format`: `...`

-------------------------------------------------------------------------------

## 1.4.5 (2020-01-11) **CRAN**

* [BUGFIX] #366: Fix for #363 required ICU >= 55.

-------------------------------------------------------------------------------

## 1.4.4 (2020-01-06) **CRAN**

* [BUGFIX] #348: Avoid copying 0 bytes to a nil-buffer in `stri_sub_all()`.
* [BUGFIX] #362: Removed `configure` variable `CXXCPP` as it is now deprecated.
* [BUGFIX] #318: PROTECTing objects from gcing as reported by `rchk`.
* [BUGFIX] #344, #364: Removed compiler warnings in icu61/common/cstring.h.
* [BUGFIX] #363: Status of `RegexMatcher` is now checked after its use.

-------------------------------------------------------------------------------

## 1.4.3 (2019-03-12) **CRAN**

* [NEW FEATURE] #30: New function `stri_sub_all()` - a version of `stri_sub()` accepting list `from`/`to`/`length` arguments for extracting multiple substrings from each string in a character vector.
* [NEW FEATURE] #30: New function `stri_sub_all<-()` (and its `%>%`-friendly version, `stri_sub_replace_all()`) - for replacing multiple substrings with corresponding replacement strings.
* [NEW FEATURE] In `stri_sub_replace()`, `value` parameter has a new alias, `replacement`.
* [NEW FEATURE] New convenience functions based on `stri_remove_empty()`: `stri_omit_empty_na()`, `stri_remove_empty_na()`, `stri_omit_empty()`, and also `stri_remove_na()`, `stri_omit_na()`.
* [BUGFIX] #343: `stri_trans_char()` did not yield correct results for overlapping pattern and replacement strings.
* [WARNFIX] #205: `configure.ac` is now included in the source bundle.

-------------------------------------------------------------------------------

## 1.3.1 (2019-02-10) **CRAN**

* [BACKWARD INCOMPATIBILITY] #335: A fix to #314 prevented (by design) the use of the system ICU if the library had been compiled with `U_CHARSET_IS_UTF8=1`. However, this is the default setting in `libicu`>=61. From now on, in such cases the system ICU is used more eagerly, but `stri_enc_set()` issues a warning stating that the default (UTF-8) encoding cannot be changed.
* [NEW FEATURE] #232: All `stri_detect_*` functions now have the `max_count` argument that allows for, e.g., stopping at the first pattern occurrence.
* [NEW FEATURE] #338: `stri_sub_replace()` is now an alias for `stri_sub<-()` which makes it much more easily pipable (@yutannihilation, @BastienFR).
* [NEW FEATURE] #334: Added missing `icudt61b.dat` to support big-endian platforms (thanks to Dimitri John Ledkov @xnox).
* [BUGFIX] #296: Out-of-the-box build used to fail on CentOS 6; upgraded `./configure` to `--disable-cxx11` more eagerly at an early stage.
* [BUGFIX] #341: Fixed possible buffer overflows when calling `strncpy()` from within ICU 61.
* [BUGFIX] #325: Made `./configure` more portable so that it works under `/bin/dash` now.
* [BUGFIX] #319: Fixed overflow in `stri_rand_shuffle()`.
* [BUGFIX] #337: Search functions (e.g., `stri_split_regex()` and `stri_count_fixed()`) used to raise too many warnings on empty search patterns.

-------------------------------------------------------------------------------

## 1.2.4 (2018-07-20) **CRAN**

* [BUGFIX] #314: Testing `U_CHARSET_IS_UTF8` in `./configure` when using `pkg-build`.
* ~~[BUILD TIME] #317: Included `icudt61l.zip` in the source bundle to solve the frequent `icudt download failed` error (also on CRAN's `windows-release` and `windows-oldrel`).~~ (reverted in version 1.3.1, the `winbuilder` errors were caused by a build chain bug).

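Some of the user-facing additions from 1.3.1 and 1.4.3 listed above (`max_count` in `stri_detect_*`, the pipe-friendly `stri_sub_replace()`, and `stri_sub_all()`) can be sketched as follows. This is only an illustrative sketch that assumes stringi >= 1.4.3 is installed; the sample strings and variable names are made up for the example.

```r
library(stringi)

# Sample input, invented for this sketch.
x <- c("abc", "xyz", "aXc")

# #232: stop searching once one match has been detected.
first_hit <- stri_detect_regex(x, "a", max_count = 1)

# #338: stri_sub_replace() is a pipe-friendly alias for `stri_sub<-()`.
replaced <- stri_sub_replace("hello world", from = 1, to = 5, value = "HELLO")

# #30: stri_sub_all() extracts several substrings from each string;
# `from` and `to` are lists holding one vector of positions per input string.
pieces <- stri_sub_all("abcdef", from = list(c(1, 4)), to = list(c(2, 6)))

print(first_hit)
print(replaced)
print(pieces)
```

Because `stri_sub_replace()` takes the string as its first argument and returns the modified copy, it can sit in the middle of a pipeline, which the replacement-function form `stri_sub(x, 1, 5) <- "HELLO"` cannot.
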
-------------------------------------------------------------------------------

## 1.2.3 (2018-05-16) **CRAN**

* [BUGFIX] #296: Fixed the behavior of the `./configure` script on CentOS 6.
* [BUGFIX] Fixed broken Windows build by updating the `icudt` mirror list.

-------------------------------------------------------------------------------

## 1.2.2 (2018-05-01) **CRAN**

* [GENERAL] #193: stringi is now bundled with ICU4C 61.1, which is used on most Windows and OS X builds as well as on *nix systems not equipped with ICU. However, if the C++11 support is disabled, stringi will be built against ICU4C 55.1. The update to ICU brings Unicode 10.0 support, including new emoji characters.
* [BUGFIX] #288: `stri_match()` did not return the correct number of columns when input was empty.
* [NEW FEATURE] #188: `stri_enc_detect()` now returns a list of data frames.
* [NEW FEATURE] #289: `stri_flatten()` now has `na_empty` and `omit_empty` arguments.
* [NEW FEATURE] New functions: `stri_remove_empty()`, `stri_na2empty()`.
* [NEW FEATURE] #285: Coercion from a non-trivial list (one that consists of atomic vectors, each of length 1) to an atomic vector now issues a warning.
* [WARN] Removed `-Wparentheses` warnings in `icu55/common/cstring.h:38:63` and `icu55/i18n/windtfmt.cpp` in the ICU4C 55.1 bundle.

-------------------------------------------------------------------------------

## 1.1.7 (2018-03-06) **CRAN**

* [BUGFIX] Fixed ICU4C 55.1 generating some *significant warnings* (`icu55/i18n/winnmfmt.cpp`) and *suppressing important diagnostics* (`src/icu55/i18n/decNumber.c`).

-------------------------------------------------------------------------------

## 1.1.6 (2017-11-10) **CRAN**

* [WINDOWS SPECIFIC] #270: Strings marked with `latin1` encoding are now converted internally to UTF-8 using the WINDOWS-1252 codec. This fixes problems with - among others - displaying the Euro sign.
* [NEW FEATURE] #263: Added support for custom rule-based break iteration, see `?stri_opts_brkiter`.
* [NEW FEATURE] #267: `omit_na=TRUE` in `stri_sub<-()` now ignores missing values in any of the arguments provided.
* [BUGFIX] Fixed unPROTECTed variable names and stack imbalances as reported by `rchk`.

-------------------------------------------------------------------------------

## 1.1.5 (2017-04-07) **CRAN**

* [GENERAL] stringi now requires ICU4C >= 52.
* [BUGFIX] Fixed errors pointed out by `clang-UBSAN` in `stri_brkiter.h`.

-------------------------------------------------------------------------------

## 1.1.4 (2017-04-06) **devel**

* [GENERAL] stringi now requires R >= 2.14.
* [BUILD TIME] #238, #220: Now trying *standard* ICU4C build flags if a call to `pkg-config` fails.
* [BUILD TIME] #258: Use `CXX11` instead of `CXX1X` on R >= 3.4.
* [BUILD TIME, BUGFIX] #254: `dir.exists()` requires R >= 3.2.

-------------------------------------------------------------------------------

## 1.1.3 (2017-03-21) **CRAN**

* [REMOVE DEPRECATED] `stri_install_check()` and `stri_install_icudt()` marked as deprecated in stringi 0.5-5 are no longer being exported.
* [BUGFIX] #227: Incorrect behavior of `stri_sub()` and `stri_sub<-()` if the empty string was the result.
* [BUILD TIME] #231: The `./configure` (*NIX only) script now reads the following environment variables: `STRINGI_CFLAGS`, `STRINGI_CPPFLAGS`, `STRINGI_CXXFLAGS`, `STRINGI_LDFLAGS`, `STRINGI_LIBS`, `STRINGI_DISABLE_CXX11`, `STRINGI_DISABLE_ICU_BUNDLE`, `STRINGI_DISABLE_PKG_CONFIG`, `PKG_CONFIG`, see `INSTALL` for more information.
* [BUILD TIME] #253: Call to `R_useDynamicSymbols()` added.
* [BUILD TIME] #230: `icudt` is now being downloaded by `./configure` (*NIX only) *before* building.
* [BUILD TIME] #242: `_COUNT/_LIMIT` enum constants have been deprecated as of ICU 58.2, stringi code has been upgraded accordingly.

-------------------------------------------------------------------------------

## 1.1.2 (2016-09-30) **CRAN**

* [BUGFIX] `round()` and `snprintf()` are not C++98.

-------------------------------------------------------------------------------

## 1.1.1 (2016-05-25) **CRAN**

* [BUGFIX] #214: Allow a regex pattern like `.*` to match an empty string.
* [BUGFIX] #210: `stri_replace_all_fixed(c("1", "NULL"), "NULL", NA)` now results in `c("1", NA)`.
* [NEW FEATURE] #199: `stri_sub<-()` now allows for ignoring `NA` locations (a new `omit_na` argument added).
* [NEW FEATURE] #207: `stri_sub<-()` now allows for substring insertions (via `length=0`).
* [NEW FUNCTION] #124: `stri_subset<-()` functions added.
* [NEW FEATURE] #216: `stri_detect()`, `stri_subset()`, `stri_subset<-()` now all have the `negate` argument.
* [NEW FUNCTION] #175: `stri_join_list()` concatenates all strings in a list of character vectors. Useful in conjunction with, e.g., `stri_extract_all_regex()`, `stri_extract_all_words()`, etc.

-------------------------------------------------------------------------------

## 1.0-1 (2015-10-22) **CRAN**

* [GENERAL] #88: C API is now available for use in, e.g., Rcpp packages, see https://github.com/gagolews/ExampleRcppStringi for an example.
* [BUGFIX] #183: Floating point exception raised in `stri_sub()` and `stri_sub<-()` when `to` or `length` was a zero-length numeric vector.
* [BUGFIX] #180: `stri_c()` warned incorrectly (recycling rule) when using more than two elements.

-------------------------------------------------------------------------------

## 0.5-5 (2015-06-28) **CRAN**

* [BACKWARD INCOMPATIBILITY] `stri_install_check()` and `stri_install_icudt()` are now deprecated. From now on they are supposed to be used only by the stringi installer.
* [BUGFIX] #176: A patch for `sys/feature_tests.h` is no longer included (the original file was copyrighted by Sun Microsystems); fixed the *Compiler or options invalid for pre-UNIX 03 X/Open applications and pre-2001 POSIX applications* error by forcing (conditionally) `_XPG6` conformance.
* [BUGFIX] #174: `stri_paste()` did not generate any warning when the recycling rule was violated and `sep==""`.
* [BUGFIX] #170: `icu::setDataDirectory` is no longer called if our ICU source bundle is not used (this used to cause build problems on openSUSE).
* [BUILD TIME] #169: `./configure` now tries to switch to the *standard* C++ compiler if a C++11 one is not configured correctly.
* [BUILD TIME] `configure.win` (`Biarch: TRUE`) now mimics `autoconf`'s `AC_SUBST` and `AC_CONFIG_FILES` so that the build process is now more similar across different platforms.
* [NEW FEATURE] `stri_info()` now also gives information about which version of ICU4C is in use (system or bundle).

-------------------------------------------------------------------------------

## 0.5-2 (2015-06-21) **CRAN**

* [BACKWARD INCOMPATIBILITY] The second argument to `stri_pad_*()` has been renamed `width`.
* [GENERAL] #69: stringi is now bundled with ICU4C 55.1.
* [NEW FUNCTIONS] `stri_extract_*_boundaries()` extract text between text boundaries.
* [NEW FUNCTION] #46: `stri_trans_char()` is a stringi-flavoured `chartr()` equivalent.
* [NEW FUNCTION] #8: `stri_width()` approximates the *width* of a string in a more Unicode-ish fashion than `nchar(..., "width")`.
* [NEW FEATURE] #149: `stri_pad()` and `stri_wrap()` are now (by default) based on code point widths instead of the number of code points. Moreover, the default behavior of `stri_wrap()` is now such that it does not get rid of non-breaking, zero width, etc., spaces.
* [NEW FEATURE] #133: `stri_wrap()` silently allows for `width <= 0` (for compatibility with `strwrap()`).
* [NEW FEATURE] #139: `stri_wrap()` gained a new argument: `whitespace_only`.
* [NEW FUNCTIONS] #137: Date-time formatting/parsing:
    * `stri_timezone_list()` - lists all known time zone identifiers;
    * `stri_timezone_set()`, `stri_timezone_get()` - manage the current default time zone;
    * `stri_timezone_info()` - basic information on a given time zone;
    * `stri_datetime_symbols()` - gives localizable date-time formatting data;
    * `stri_datetime_fstr()` - converts a `strptime`-like format string to an ICU date/time format string;
    * `stri_datetime_format()` - converts date/time to string;
    * `stri_datetime_parse()` - converts string to date/time object;
    * `stri_datetime_create()` - constructs date-time objects from numeric representations;
    * `stri_datetime_now()` - returns current date-time;
    * `stri_datetime_fields()` - returns date-time fields' values;
    * `stri_datetime_add()` - adds specific number of date-time units to a date-time object.
* [GENERAL] #144: Performance improvements in handling ASCII strings (these affect `stri_sub()`, `stri_locate()` and other string index-based operations).
* [GENERAL] #143: Searching for short fixed patterns (`stri_*_fixed()`) now relies on the current `libC`'s implementation of `strchr()` and `strstr()`. This is very fast, e.g., on `glibc` using the `SSE2/3/4` instruction set.
* [BUILD TIME] #141: A local copy of `icudt*.zip` may be used on package install; see the `INSTALL` file for more information.
* [BUILD TIME] #165: The `./configure` option `--disable-icu-bundle` forces the use of system ICU when building the package.
* [BUGFIX] Locale specifiers are now normalized in a more intelligent way: e.g., `@calendar=gregorian` expands to `DEFAULT_LOCALE@calendar=gregorian`.
* [BUGFIX] #134: `stri_extract_all_words()` did not accept `simplify=NA`.
* [BUGFIX] #132: Incorrect behavior in `stri_locate_regex()` for matches of zero lengths.
* [BUGFIX] stringr/#73: `stri_wrap()` returned `CHARSXP` instead of `STRSXP` on empty string input with `simplify=FALSE` argument.
* [BUGFIX] #164: Using `libicu-dev` failed on Ubuntu (`LIBS` shall be passed after `LDFLAGS` and the list of `.o` files).
* [BUGFIX] #168: Build now fails if `icudt` is not available.
* [BUGFIX] #135: C++11 is now used by default (see the `INSTALL` file, however) to build stringi from sources. This is because ICU4C uses the `long long` type which is not part of the C++98 standard.
* [BUGFIX] #154: Dates and other objects with a custom class attribute were not coerced to the character type correctly.
* [BUGFIX] Force ICU `u_init()` call on stringi dynlib load.
* [BUGFIX] #157: Many overfull hboxes in the package PDF manual have been corrected.

-------------------------------------------------------------------------------

## 0.4-1 (2014-12-11) **CRAN**

* [IMPORTANT CHANGE] `n_max` argument in `stri_split_*()` has been renamed `n`.
* [IMPORTANT CHANGE] `simplify=FALSE` in `stri_extract_all_*()` and `stri_split_*()` now calls `stri_list2matrix()` with `fill=""`. `fill=NA_character_` may be obtained by using `simplify=NA`.
* [IMPORTANT CHANGE, NEW FUNCTIONS] #120: `stri_extract_words()` has been renamed `stri_extract_all_words()`, `stri_locate_boundaries()` has been renamed `stri_locate_all_boundaries()`, and `stri_locate_words()` has been renamed `stri_locate_all_words()`. New functions are now available: `stri_locate_first_boundaries()`, `stri_locate_last_boundaries()`, `stri_locate_first_words()`, `stri_locate_last_words()`, `stri_extract_first_words()`, `stri_extract_last_words()`.
* [IMPORTANT CHANGE] #111: `opts_regex`, `opts_collator`, `opts_fixed`, and `opts_brkiter` can now be supplied individually via `...`. In other words, you may now simply call, e.g., `stri_detect_regex(str, pattern, case_insensitive=TRUE)` instead of `stri_detect_regex(str, pattern, opts_regex=stri_opts_regex(case_insensitive=TRUE))`.
* [NEW FEATURE] #110: Fixed pattern search engine's settings can now be supplied via `opts_fixed` argument in `stri_*_fixed()`, see `stri_opts_fixed()`.
  A simple (not suitable for natural language processing) yet very fast `case_insensitive` pattern matching can be performed now. `stri_extract_*_fixed()` is again available.
* [NEW FEATURE] #23: `stri_extract_all_fixed()`, `stri_count()`, and `stri_locate_all_fixed()` may now also look for overlapping pattern matches, see `?stri_opts_fixed`.
* [NEW FEATURE] #129: `stri_match_*_regex()` gained a `cg_missing` argument.
* [NEW FEATURE] #117: `stri_extract_all_*()`, `stri_locate_all_*()`, `stri_match_all_*()` gained a new argument: `omit_no_match`. Setting it to `TRUE` makes these functions compatible with their `stringr` equivalents.
* [NEW FEATURE] #118: `stri_wrap()` gained `indent`, `exdent`, `initial`, and `prefix` arguments. Moreover, Knuth's dynamic word wrapping algorithm now assumes that the cost of printing the last line is zero, see #128.
* [NEW FEATURE] #122: `stri_subset()` gained an `omit_na` argument.
* [NEW FEATURE] `stri_list2matrix()` gained an `n_min` argument.
* [NEW FEATURE] #126: `stri_split()` is now also able to act just like `stringr::str_split_fixed()`.
* [NEW FEATURE] #119: `stri_split_boundaries()` now has `n`, `tokens_only`, and `simplify` arguments. Additionally, `stri_extract_all_words()` is now equipped with `simplify` arg.
* [NEW FEATURE] #116: `stri_paste()` gained a new argument: `ignore_null`. Setting it to `TRUE` makes this function more compatible with `paste()`.
* [OTHER] #123: `useDynLib` is used to speed up symbol look-up in the compiled dynamic library.
* [BUGFIX] #114: `stri_paste()`: could return result in an incorrect order.
* [BUGFIX] #94: Run-time errors on Solaris caused by setting `-DU_DISABLE_RENAMING=1` - memory allocation errors in, among others, the ICU `UnicodeString`. This setting also caused some `ASAN` sanity check failures within ICU code.

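The 0.4-1 changes to option passing (#111), overlapping fixed-pattern matches (#23), and `omit_no_match` (#117) can be illustrated with the short sketch below. It assumes a reasonably recent stringi installation; the input strings are invented for the example, and the `overlap` option name is taken from `?stri_opts_fixed`.

```r
library(stringi)

# Sample input, invented for this sketch.
x <- c("Apple pie", "banana", NA)

# #111: engine options may be passed directly via `...` ...
short_form <- stri_detect_regex(x, "^a", case_insensitive = TRUE)
# ... instead of being wrapped in stri_opts_regex(); both calls should agree.
long_form <- stri_detect_regex(x, "^a",
    opts_regex = stri_opts_regex(case_insensitive = TRUE))

# #23: the fixed-pattern engine can also count overlapping matches.
n_plain   <- stri_count_fixed("aaaa", "aa")
n_overlap <- stri_count_fixed("aaaa", "aa", overlap = TRUE)

# #117: with omit_no_match = TRUE, a string without any match yields
# an empty character vector rather than NA.
digits <- stri_extract_all_regex(c("a1b2", "xyz"), "[0-9]",
    omit_no_match = TRUE)

print(identical(short_form, long_form))
print(c(n_plain, n_overlap))
print(digits)
```

The `...` shorthand mainly saves typing in interactive use; building the options once with `stri_opts_regex()` or `stri_opts_fixed()` may still be preferable when the same settings are reused across many calls.
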
-------------------------------------------------------------------------------

## 0.3-1 (2014-11-06) **CRAN**

* [IMPORTANT CHANGE] #87: `%>%` overlapped with the pipe operator from the `magrittr` package; now each operator like `%>%` has been renamed `%s>%`.
* [IMPORTANT CHANGE] #108: Now the `BreakIterator` (for text boundary analysis) may be more easily controlled via `stri_opts_brkiter()` (see options `type` and `locale` which aim to replace now-removed `boundary` and `locale` parameters to `stri_locate_boundaries()`, `stri_split_boundaries()`, `stri_trans_totitle()`, `stri_extract_words()`, and `stri_locate_words()`).
* [NEW FUNCTIONS] #109: `stri_count_boundaries()` and `stri_count_words()` count the number of text boundaries in a string.
* [NEW FUNCTIONS] #41: `stri_startswith_*()` and `stri_endswith_*()` determine whether a string starts or ends with a given pattern.
* [NEW FEATURE] #102: `stri_replace_all_*()` now all have the `vectorize_all` parameter, which defaults to `TRUE` for backward compatibility.
* [NEW FUNCTION] #91: Added `stri_subset_*()` - a convenient and more efficient substitute for `str[stri_detect_*(str, ...)]`.
* [NEW FEATURE] #100: `stri_split_fixed()`, `stri_split_charclass()`, `stri_split_regex()`, `stri_split_coll()` gained a `tokens_only` parameter, which defaults to `FALSE` for backward compatibility.
* [NEW FUNCTION] #105: `stri_list2matrix()` converts lists of atomic vectors to character matrices, useful in conjunction with `stri_split()` and `stri_extract()`.
* [NEW FEATURE] #107: `stri_split_*()` now allow setting an `omit_empty=NA` argument.
* [NEW FEATURE] #106: `stri_split()` and `stri_extract_all()` gained a `simplify` argument (if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)` is called on the resulting list).
* [NEW FUNCTION] #77: `stri_rand_lipsum()` generates a (pseudo)random dummy *lorem ipsum* text.
* [NEW FEATURE] #98: `stri_trans_totitle()` gained an `opts_brkiter` parameter; it indicates which ICU `BreakIterator` should be used when case mapping.
* [NEW FEATURE] `stri_wrap()` gained a new parameter: `normalize`.
* [BUGFIX] #86: `stri_*_fixed()`, `stri_*_coll()`, and `stri_*_regex()` could give incorrect results if one of the search strings was of length 0.
* [BUGFIX] #99: `stri_replace_all()` did not use the `replacement` arg.
* [BUGFIX] #112: Some of the objects were not PROTECTed from garbage collection - this could have led to spontaneous SEGFAULTS.
* [BUGFIX] Some collator's options were not passed correctly to ICU services.
* [BUGFIX] Memory leaks as detected by `valgrind --tool=memcheck --leak-check=full` have been removed.
* [DOCUMENTATION] Significant extensions/clean-ups in the stringi manual.

-------------------------------------------------------------------------------

## 0.2-5 (2014-05-16) **CRAN**

* ~~icudt-dependent examples are no longer run if `icudt` is not available.~~

-------------------------------------------------------------------------------

## 0.2-4 (2014-05-15) **CRAN**

* [BUGFIX] Fixed issues with loading of misaligned addresses in `stri_*_fixed()`.

-------------------------------------------------------------------------------

## 0.2-3 (2014-05-14) **CRAN**

* [IMPORTANT CHANGE] `stri_cmp*()` now do not allow for passing `opts_collator=NA`. From now on, `stri_cmp_eq()`, `stri_cmp_neq()`, and the new operators `%===%`, `%!==%`, `%stri===%`, and `%stri!==%` are locale-independent operations, which are based on code point comparisons. New functions `stri_cmp_equiv()` and `stri_cmp_nequiv()` (and from now on also `%==%`, `%!=%`, `%stri==%`, and `%stri!=%`) test for canonical equivalence.
* [IMPORTANT CHANGE] `stri_*_fixed()` search functions now perform a locale-independent exact (byte-wise, of course after conversion to UTF-8) pattern search. All the `Collator`-based, locale-dependent search routines are now available via `stri_*_coll()`.
  The reason behind this is that ICU's `USearch` currently has very poor performance. What is more, in many search tasks exact pattern matching is sufficient anyway.
* [GENERAL] `stri_*_fixed` now use a tweaked Knuth-Morris-Pratt search algorithm which improves the search performance drastically.
* [IMPORTANT CHANGE] `stri_enc_nf*()` and `stri_enc_isnf*()` function families have been renamed `stri_trans_nf*()` and `stri_trans_isnf*()`, respectively -- they deal with text transforming, and not with character encoding. Note that all of these may be performed by ICU's `Transliterator` too (see below).
* [NEW FUNCTION] `stri_trans_general()` and `stri_trans_list()` give access to ICU's `Transliterator`: they may be used to perform some generic text transforms, like Unicode normalization, case folding, etc.
* [NEW FUNCTION] `stri_split_boundaries()` uses ICU's `BreakIterator` to split strings at specific text boundaries. Moreover, `stri_locate_boundaries()` indicates positions of these boundaries.
* [NEW FUNCTION] `stri_extract_words()` uses ICU's `BreakIterator` to extract all words from a text. Additionally, `stri_locate_words()` locates start and end positions of words in a text.
* [NEW FUNCTION] `stri_pad()`, `stri_pad_left()`, `stri_pad_right()`, and `stri_pad_both()` pad a string with a specific code point.
* [NEW FUNCTION] `stri_wrap()` breaks paragraphs of text into lines. Two algorithms (greedy and minimal raggedness) are available.
* [IMPORTANT CHANGE] `stri_*_charclass()` search functions now rely solely on ICU's `UnicodeSet` patterns. All the previously accepted charclass identifiers became invalid. However, new patterns should now be more familiar to the users (they are regex-like). Moreover, we observe a very nice performance gain.
* [IMPORTANT CHANGE] `stri_sort()` now does not include `NA`s in output vectors by default, for compatibility with `sort()`. Moreover, currently none of the input vector's attributes are preserved.
* [NEW FUNCTION] `stri_unique()` extracts unique elements from a character vector.
* [NEW FUNCTIONS] `stri_duplicated()` and `stri_duplicated_any()` determine duplicate elements in a character vector.
* [NEW FUNCTION] `stri_replace_na()` replaces `NA`s in a character vector with a given string, useful for emulating, e.g., R's `paste()` behavior.
* [NEW FUNCTION] `stri_rand_shuffle()` generates a random permutation of code points in a string.
* [NEW FUNCTION] `stri_rand_strings()` generates random strings.
* [NEW FUNCTIONS] New functions and binary operators for string comparison: `stri_cmp_eq()`, `stri_cmp_neq()`, `stri_cmp_lt()`, `stri_cmp_le()`, `stri_cmp_gt()`, `stri_cmp_ge()`, `%==%`, `%!=%`, `%<%`, `%<=%`, `%>%`, `%>=%`.
* [NEW FUNCTION] `stri_enc_mark()` reads declared encodings of character strings as seen by stringi.
* [NEW FUNCTION] `stri_enc_tonative(str)` is an alias to `stri_encode(str, NULL, NULL)`.
* [NEW FEATURE] `stri_order()` and `stri_sort()` now have an additional argument `na_last` (defaults to `TRUE` and `NA`, respectively).
* [NEW FEATURE] `stri_replace_all_charclass()`, `stri_extract_all_charclass()`, and `stri_locate_all_charclass()` now have a new argument, `merge` (defaults to `FALSE` for backward-compatibility). It may be used to, e.g., replace sequences of white spaces with a single space.
* [NEW FEATURE] `stri_enc_toutf8()` now has a new `validate` arg (defaults to `FALSE` for backward-compatibility). It may be used in a (rare) case where a user wants to fix an invalid UTF-8 byte sequence. `stri_length()` (among others) now detects invalid UTF-8 byte sequences.
* [NEW FEATURE] All binary operators `%???%` now also have aliases `%stri???%`.
* [GENERAL] Performance improvements in `StriContainerUTF8` and `StriContainerUTF16` (they affect most other functions).
* [GENERAL] Significant performance improvements in `stri_join()`, `stri_flatten()`, `stri_cmp()`, `stri_trans_to*()`, and others.
* [GENERAL] Added 3rd mirror site for our `icudt` binary distribution.
* `U_MISSING_RESOURCE_ERROR` message in `StriException` now suggests calling `stri_install_check()`.
* [BUGFIX] UTF-8 BOMs are now silently removed from input strings.
* [BUGFIX] No more attempts to re-encode UTF-8 encoded strings if native encoding is UTF-8 in `StriContainerUTF8`.
* [BUGFIX] Possible memory leaks when throwing errors via `Rf_error()`.
* [BUGFIX] `stri_order()` and `stri_cmp()` could return incorrect results for `opts_collator=NA`.
* [BUGFIX] `stri_sort()` did not guarantee to return strings in UTF-8.

-------------------------------------------------------------------------------

## 0.1-25 (2014-03-12) **CRAN**

* LICENSE tweaks.
* Initial CRAN release.

-------------------------------------------------------------------------------

## 0.1-24 (2014-03-11) **devel**

* Fixed bugs detected with `ASAN` and `UBSAN`, e.g., fixed `CharClass::gcmask` type (`enum` -> `uint32_t`) (reported by `UBSAN`).
* Fixed array over-runs detected with `valgrind` in `string8.h`.
* Fixed uninitialized class fields in `StriContainerUTF8` (reported by `valgrind`).

-------------------------------------------------------------------------------

## 0.1-23 (2014-03-11) **devel**

* License changed to BSD-3-clause, COPYRIGHTS updated.
* `icudt` is not shipped with stringi anymore; it is now downloaded in `install.libs.R` from one of our servers.
* New functions: `stri_install_check()`, `stri_install_icudt()`.

-------------------------------------------------------------------------------

## 0.1-22 (2014-02-20) **devel**

* System ICU is used on systems which do have one (version >= 50 needed). ICU is autodetected with `pkg-config` in `./configure`. Pass `'--disable-pkg-config'` to `./configure` to force building ICU from sources.
* `icudt52b` (custom subset) is now shipped with stringi (for big-endian, ASCII systems).

-------------------------------------------------------------------------------

## 0.1-21 (2014-02-19) **devel**

* Fixed some Solaris-related issues while preparing stringi for CRAN submission.

-------------------------------------------------------------------------------

## 0.1-20 (2014-02-17) **devel**

* ICU4C 52.1 sources included (common, i18n, stubdata + icu52dt.dat loaded dynamically). Compilation via Makevars.
* stringi now does not depend on any external libraries.

-------------------------------------------------------------------------------

## 0.1-11 (2013-11-16) **devel**

* ICU4C is now statically linked on Windows.
* First OS X binary build.
* The package is being intensively tested by our students @ FMIS WUT.

-------------------------------------------------------------------------------

## 0.1-10 (2013-11-13) **devel**

* Using `pkg-config` via `./configure` to look for ICU4C libs.

-------------------------------------------------------------------------------

## 0.1-6 (2013-07-05) **devel**

* First Windows binary build.
* Compilation passed on Oracle Sun Studio compiler collection.
* By now we have implemented most of the functionality scheduled for milestone 0.1.

-------------------------------------------------------------------------------

## 0.1-1 (2013-01-05) **devel**

* The stringi project has been established on GitHub.

-------------------------------------------------------------------------------