<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Tidiers for LDA and CTM objects from the topicmodels package</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for lda_tidiers {tidytext}"><tr><td>lda_tidiers {tidytext}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Tidiers for LDA and CTM objects from the topicmodels package</h2> <h3>Description</h3> <p>Tidy the results of a Latent Dirichlet Allocation or Correlated Topic Model. </p> <h3>Usage</h3> <pre>
## S3 method for class 'LDA'
tidy(x, matrix = c("beta", "gamma"), log = FALSE, ...)

## S3 method for class 'CTM'
tidy(x, matrix = c("beta", "gamma"), log = FALSE, ...)

## S3 method for class 'LDA'
augment(x, data, ...)

## S3 method for class 'CTM'
augment(x, data, ...)

## S3 method for class 'LDA'
glance(x, ...)

## S3 method for class 'CTM'
glance(x, ...)
</pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>An LDA or CTM (or LDA_VEM/CTM_VEM) object from the topicmodels package</p> </td></tr> <tr valign="top"><td><code>matrix</code></td> <td> <p>Whether to tidy the beta (per-term-per-topic, default) or gamma (per-document-per-topic) matrix</p> </td></tr> <tr valign="top"><td><code>log</code></td> <td> <p>Whether beta/gamma should be on a log scale, default FALSE</p> </td></tr> <tr valign="top"><td><code>...</code></td> <td> <p>Extra arguments, not used</p> </td></tr> <tr valign="top"><td><code>data</code></td> <td> <p>For <code>augment</code>, the data given to the LDA or CTM function, either as a DocumentTermMatrix or as a tidied table with "document" and "term" columns</p> </td></tr> </table> <h3>Value</h3> <p><code>tidy</code> returns a tidied version of either the beta or gamma matrix. 
</p> <p>If <code>matrix == "beta"</code> (default), returns a table with one row per topic and term, with columns </p> <dl> <dt>topic</dt><dd><p>Topic, as an integer</p> </dd> <dt>term</dt><dd><p>Term</p> </dd> <dt>beta</dt><dd><p>Probability of the term being generated from the topic, according to the multinomial model</p> </dd> </dl> <p>If <code>matrix == "gamma"</code>, returns a table with one row per topic and document, with columns </p> <dl> <dt>topic</dt><dd><p>Topic, as an integer</p> </dd> <dt>document</dt><dd><p>Document name or ID</p> </dd> <dt>gamma</dt><dd><p>Probability of the topic given the document</p> </dd> </dl> <p><code>augment</code> returns a table with one row per original document-term pair, such as is returned by <a href="tdm_tidiers.html">tdm_tidiers</a>: </p> <dl> <dt>document</dt><dd><p>Name of document (if present), or index</p> </dd> <dt>term</dt><dd><p>Term</p> </dd> <dt>.topic</dt><dd><p>Topic assignment</p> </dd> </dl> <p>If the <code>data</code> argument is provided, any columns in the original data are included, combined based on the <code>document</code> and <code>term</code> columns. 
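</p>

<p>As a rough sketch of the <code>augment</code> output described above, the following fits a small model in the same way as the Examples section and attaches a topic assignment to each document-term pair. The object names (<code>ap</code>, <code>lda</code>, <code>assignments</code>) are illustrative only, and the sketch assumes that topicmodels is installed and that attaching tidytext makes the <code>augment</code> generic available:</p>

<pre>
library(tidytext)
library(topicmodels)

set.seed(2016)
data("AssociatedPress", package = "topicmodels")
ap <- AssociatedPress[1:100, ]
lda <- LDA(ap, k = 4, control = list(alpha = 0.1))

# data is the DocumentTermMatrix the model was fitted on; per the
# Arguments section, a tidied table with "document" and "term"
# columns can be passed instead
assignments <- augment(lda, data = ap)

# the result should include document, term and .topic columns
head(assignments)

# number of document-term pairs assigned to each topic
table(assignments$.topic)
</pre>

<p>Tabulating <code>.topic</code> in this way is one quick check of how the fitted topics divide up the document-term pairs.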
</p> <p><code>glance</code> always returns a one-row table, with columns </p> <dl> <dt>iter</dt><dd><p>Number of iterations used</p> </dd> <dt>terms</dt><dd><p>Number of terms in the model</p> </dd> <dt>alpha</dt><dd><p>If an LDA_VEM, the parameter of the Dirichlet distribution for topics over documents</p> </dd> </dl> <h3>Examples</h3> <pre>
if (requireNamespace("topicmodels", quietly = TRUE)) {
  set.seed(2016)
  library(dplyr)
  library(tidytext)
  library(topicmodels)

  data("AssociatedPress", package = "topicmodels")
  ap <- AssociatedPress[1:100, ]
  lda <- LDA(ap, control = list(alpha = 0.1), k = 4)

  # get term distribution within each topic
  td_lda <- tidy(lda)
  td_lda

  library(ggplot2)

  # visualize the top terms within each topic
  td_lda_filtered <- td_lda %>%
    filter(beta > .004) %>%
    mutate(term = reorder(term, beta))

  ggplot(td_lda_filtered, aes(term, beta)) +
    geom_bar(stat = "identity") +
    facet_wrap(~ topic, scales = "free") +
    theme(axis.text.x = element_text(angle = 90, size = 15))

  # get classification of each document
  td_lda_docs <- tidy(lda, matrix = "gamma")
  td_lda_docs

  # keep the most probable topic for each document
  doc_classes <- td_lda_docs %>%
    group_by(document) %>%
    top_n(1, gamma) %>%
    ungroup()

  doc_classes

  # which were we most uncertain about?
  doc_classes %>%
    arrange(gamma)
}
</pre> <hr /><div style="text-align: center;">[Package <em>tidytext</em> version 0.3.4 <a href="00Index.html">Index</a>]</div> </body></html>