# tidytext 0.3.4

* Updated the tidy method for a quanteda `dfm` because of the upcoming release of Matrix (#218)

# tidytext 0.3.3

* `scale_x/y_reordered()` now takes a function as its `labels` input (#200)
* Fixed how `to_lower` is passed to the underlying tokenization function for character shingles (#208)
* Added support for tidying STM models that use `content`, thanks to @jonathanvoelkle (#209)

# tidytext 0.3.2

* Update tests for an rlang change and testthat 3e

# tidytext 0.3.1

* Check for installation of stopwords more gracefully
* Update tidiers and casters for the new version of quanteda

# tidytext 0.3.0

* Use vdiffr conditionally
* Bug fix/breaking change for the `collapse` argument to the `unnest_*()` functions. This argument now takes either `NULL` (do not collapse text across rows for tokenizing) or a character vector of variables (use those variables to collapse text across rows for tokenizing). This fixes a long-standing bug and provides more consistent behavior, but does change results in many situations (such as n-gram tokenization). A short usage sketch follows the 0.1.7 notes below.

# tidytext 0.2.6

* Move one vignette to the pkgdown site, because of a dependency removal
* Move all CI from Travis to GitHub Actions

# tidytext 0.2.5

* `reorder_within()` now handles multiple variables, thanks to @tmastny (#170)
* Move stopwords to Suggests so tidytext can be installed on older versions of R
* Pass the `to_lower` argument to other tokenizing functions, for more consistent behavior (#175)
* Add a `glance()` method for stm's estimated regressions, thanks to @vincentarelbundock (#176)

# tidytext 0.2.4

* Update tidying test for the new tibble release (inner names for columns)
* Deprecate SE versions of the main functions (they have long been replaced by tidy eval semantics)
* Improve error handling throughout

# tidytext 0.2.3

* Wrapper tokenization functions for n-grams, characters, sentences, tweets, and more, thanks to @ColinFay (#137).
* Simplify `get_sentiments()` thanks to @jennybc (#151).
* Fix flaky tests for corpus tidiers.

# tidytext 0.2.2

* Access the NRC lexicon via the textdata package

# tidytext 0.2.1

* Fix a bug in the `augment()` function for stm topic models.
* Warn when tf-idf is negative, thanks to @EmilHvitfeldt (#112).
* Switch from importing broom to importing generics, for lighter dependencies (#133).
* Add functions for reordering factors (such as for ggplot2 bar plots), thanks to @tmastny (#110).
* Update to `tibble()` where appropriate, thanks to @luisdza (#136).
* Clarify documentation about the impact of lowercase conversion on URLs (#139).
* Change how sentiment lexicons are accessed from the package (remove the NRC lexicon entirely; access the AFINN and Loughran lexicons via the textdata package so they are no longer included in this package).

# tidytext 0.2.0

* Improvements to documentation (#117)
* Fix for NSE thanks to @lepennec (#122).
* Tidier for estimated regressions from the **stm** package thanks to @jefferickson (#115).
* Tidier for the correlated topic model from the **topicmodels** package (#123).

# tidytext 0.1.9

* Updates to documentation (#109) thanks to Emil Hvitfeldt.
* Add new tokenizers for tweets and the Penn Treebank to `unnest_tokens()`.
* Better error message (#111) and code styling.
* Declare dependency for tests.

# tidytext 0.1.8

* Updates to documentation (#102), README, and vignettes.
* Add tokenizing by character shingles thanks to Kanishka Misra (#105).
* Fix tests for skip grams thanks to Lincoln Mullen (#106).

# tidytext 0.1.7

* Updated more docs/tests so the package can build on R-oldrel. (Still trying!)
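As a reference for the `collapse` change described under 0.3.0 above, here is a minimal sketch of the two behaviors. It assumes tidytext 0.3.0 or later; the data frame and its `doc`/`text` columns are purely illustrative.

```r
library(dplyr)
library(tidytext)

lines <- tibble(
  doc  = c("a", "a", "b"),
  text = c("tidy text mining", "with r", "is fun")
)

# collapse = NULL (the default): each row is tokenized on its own,
# so no bigram spans two of the original rows
lines %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, collapse = NULL)

# collapse = "doc": rows are combined within each value of `doc`
# before tokenizing, so bigrams can cross the original row boundaries
lines %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2, collapse = "doc")
```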
# tidytext 0.1.6

* `unnest_tokens` can now unnest a data frame with a list column (which formerly threw the error `unnest_tokens expects all columns of input to be atomic vectors (not lists)`). The unnested result repeats the objects within each list. (This is still not possible when `collapse = TRUE`, in which case tokens can span multiple lines.)
* Add `get_tidy_stopwords()` to obtain stopword lexicons in multiple languages in a tidy format.
* Add a dataset `nma_words` of negators, modals, and adverbs that affect sentiment analysis (#55).
* Updated various vignettes/docs/tests so the package can build on R-oldrel.

# tidytext 0.1.5

* Change how `NA` values are handled in `unnest_tokens` so they no longer cause other columns to become `NA` (#82).
* Update tidiers and casters to align with quanteda v1.0 (#87).
* Handle input/output object classes (such as `data.table`) consistently (#88).

# tidytext 0.1.4

* Fix tidier for the quanteda dictionary to use the correct class (#71).
* Add a [pkgdown site](https://juliasilge.github.io/tidytext/).
* Convert NSE from underscored functions to tidyeval (`unnest_tokens`, `bind_tf_idf`, all sparse casters) (#67, #74).
* Added tidiers for topic models from the `stm` package (#51).

# tidytext 0.1.3

* `get_sentiments` now works regardless of whether `tidytext` has been loaded or not (#50).
* `unnest_tokens` now supports data.table objects (#37).
* Fixed the `to_lower` parameter in `unnest_tokens` to work properly for all tokenizing options.
* Updated `tidy.corpus`, `glance.corpus`, tests, and vignette for changes to the quanteda API.
* Removed the deprecated `pair_count` function, which is now in the in-development widyr package.
* Added tidiers for LDA models from the `mallet` package.
* Added the Loughran and McDonald dictionary of sentiment words specific to financial reports.
* `unnest_tokens` preserves custom attributes of data frames and data.tables.

# tidytext 0.1.2

* Updated DESCRIPTION to require purrr >= 0.1.1.
* Fixed `cast_sparse`, `cast_dtm`, and other sparse casters to ignore groups in the input (#19).
* Changed `unnest_tokens` so that it no longer uses tidyr's unnest, but rather a custom version that removes some overhead. In some experiments, this sped up `unnest_tokens` on large inputs by about 40%. This also moves tidyr from Imports to Suggests for now.
* `unnest_tokens` now checks that there are no list columns in the input, and raises an error if present (since those cannot be unnested).
* Added a `format` argument to `unnest_tokens` so that it can process html, xml, latex, or man pages using the hunspell package, though only when `token = "words"`.
* Added a `get_sentiments` function that takes the name of a lexicon ("nrc", "bing", or "afinn") and returns just that sentiment data frame (#25).

# tidytext 0.1.1

* Added documentation for n-grams, skip n-grams, and regex.
* Added codecov and appveyor.
* Added tidiers for LDA objects from topicmodels and a vignette on topic modeling.
* Added a function to calculate tf-idf of a tidy text dataset, and a tf-idf vignette.
* Fixed a bug when tidying by line/sentence/paragraph/regex and there are multiple non-text columns.
* Fixed a bug when unnesting using n-grams and skip n-grams (the entire text was not being collapsed).
* Added the ability to pass a custom tokenizing function to `token` (see the sketch at the end of these notes). Also added a `collapse` argument that makes the choice of whether to combine lines before tokenizing explicit.
* Changed `tidy.dictionary` to return a `tbl_df` rather than a `data.frame`.
* Updated `cast_sparse` to work with dplyr 0.5.0.
* Deprecated the `pair_count` function, which has been moved to `pairwise_count` in the [widyr package](https://github.com/dgrtwo/widyr). This will be removed entirely in a future version.

# tidytext 0.1.0

* Initial release for text mining using tidy tools
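As a reference for the 0.1.1 note above about passing a custom tokenizing function to `token`: a minimal sketch, assuming a current version of tidytext and stringr; the column names and split pattern are illustrative. Any function that takes a character vector and returns a list of character vectors should work, with extra arguments passed through.

```r
library(dplyr)
library(stringr)
library(tidytext)

docs <- tibble(text = c("One sentence. And another"))

# `token` can be a custom tokenizing function; extra arguments
# (here `pattern`) are forwarded to it
docs %>%
  unnest_tokens(sentence, text, token = str_split, pattern = "\\. ")
```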