<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>R: Tidiers for Structural Topic Models from the stm package</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head>
<body>

<table width="100%" summary="page for stm_tidiers {tidytext}"><tr><td>stm_tidiers {tidytext}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Tidiers for Structural Topic Models from the stm package</h2>

<h3>Description</h3>

<p>Tidy topic models fit by the stm package. The arguments and return values are similar to <code><a href="lda_tidiers.html">lda_tidiers</a></code>.
</p>

<h3>Usage</h3>

<pre>
## S3 method for class 'STM'
tidy(
  x,
  matrix = c("beta", "gamma", "theta"),
  log = FALSE,
  document_names = NULL,
  ...
)

## S3 method for class 'estimateEffect'
tidy(x, ...)

## S3 method for class 'estimateEffect'
glance(x, ...)

## S3 method for class 'STM'
augment(x, data, ...)

## S3 method for class 'STM'
glance(x, ...)
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p>An STM fitted model object from either <code>stm</code> or <code>estimateEffect</code> from the stm package.</p>
</td></tr>
<tr valign="top"><td><code>matrix</code></td>
<td>
<p>Whether to tidy the beta (per-term-per-topic, default) or gamma/theta (per-document-per-topic) matrix.
The stm package calls the per-document-per-topic matrix theta, while other topic modeling packages call it gamma.</p>
</td></tr>
<tr valign="top"><td><code>log</code></td>
<td>
<p>Whether beta/gamma/theta should be on a log scale; defaults to <code>FALSE</code>.</p>
</td></tr>
<tr valign="top"><td><code>document_names</code></td>
<td>
<p>Optional vector of document names for use with per-document-per-topic tidying.</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>Extra arguments, not used.</p>
</td></tr>
<tr valign="top"><td><code>data</code></td>
<td>
<p>For <code>augment</code>, the data given to the stm function, either as a <code>dfm</code> from quanteda or as a tidied table with "document" and "term" columns.</p>
</td></tr>
</table>

<h3>Value</h3>

<p><code>tidy</code> returns a tidied version of either the beta or gamma/theta matrix if called on an object from <code>stm</code>, or a tidied version of the estimated regressions if called on an object from <code>estimateEffect</code>.
</p>
<p>When called on an object from <code>estimateEffect</code>, <code>glance</code> returns a one-row table with columns
</p>
<dl>
<dt>k</dt><dd><p>Number of topics in the model</p>
</dd>
<dt>docs</dt><dd><p>Number of documents in the model</p>
</dd>
<dt>uncertainty</dt><dd><p>Uncertainty measure</p>
</dd>
</dl>

<p><code>augment</code> requires a <code>data</code> argument: either a <code>dfm</code> from quanteda or a table containing one row per original document-term pair, such as is returned by <a href="tdm_tidiers.html">tdm_tidiers</a>, with columns <code>document</code> and <code>term</code>. It returns that same data as a table with an additional column <code>.topic</code> giving the topic assignment for that document-term combination.
</p>
<p>When called on an object from <code>stm</code>, <code>glance</code> returns a one-row table with columns
</p>
<dl>
<dt>k</dt><dd><p>Number of topics in the model</p>
</dd>
<dt>docs</dt><dd><p>Number of documents in the model</p>
</dd>
<dt>terms</dt><dd><p>Number of terms in the model</p>
</dd>
<dt>iter</dt><dd><p>Number of iterations used</p>
</dd>
<dt>alpha</dt><dd><p>If an LDA model, the parameter of the Dirichlet distribution for topics over documents</p>
</dd>
</dl>

<p>If <code>matrix == "beta"</code> (default), <code>tidy</code> returns a table with one row per topic and term, with columns
</p>
<dl>
<dt>topic</dt><dd><p>Topic, as an integer</p>
</dd>
<dt>term</dt><dd><p>Term</p>
</dd>
<dt>beta</dt><dd><p>Probability of the term being generated from the topic according to the structural topic model</p>
</dd>
</dl>

<p>If <code>matrix == "gamma"</code>, <code>tidy</code> returns a table with one row per topic and document, with columns
</p>
<dl>
<dt>topic</dt><dd><p>Topic, as an integer</p>
</dd>
<dt>document</dt><dd><p>Document name (if given in vector of <code>document_names</code>) or ID as an integer</p>
</dd>
<dt>gamma</dt><dd><p>Probability of topic given document</p>
</dd>
</dl>

<p>If called on an object from <code>estimateEffect</code>, <code>tidy</code> returns a table with columns
</p>
<dl>
<dt>topic</dt><dd><p>Topic, as an integer</p>
</dd>
<dt>term</dt><dd><p>The term in the model being estimated and tested</p>
</dd>
<dt>estimate</dt><dd><p>The estimated coefficient</p>
</dd>
<dt>std.error</dt><dd><p>The standard error from the linear model</p>
</dd>
<dt>statistic</dt><dd><p>t-statistic</p>
</dd>
<dt>p.value</dt><dd><p>Two-sided p-value</p>
</dd>
</dl>

<h3>See Also</h3>

<p><code><a href="lda_tidiers.html">lda_tidiers</a></code>
</p>

<h3>Examples</h3>

<pre>
## Not run: 
if (requireNamespace("stm", quietly = TRUE)) {
  library(tidytext)
  library(dplyr)
  library(ggplot2)
  library(stm)
  library(janeaustenr)

  # build a sparse book-by-word count matrix from Jane Austen's novels
  austen_sparse <- austen_books() %>%
    unnest_tokens(word, text) %>%
    anti_join(stop_words) %>%
    count(book, word) %>%
    cast_sparse(book, word, n)

  topic_model <-
    stm(austen_sparse, K = 12, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma",
                   document_names = rownames(austen_sparse))
  td_gamma

  # using stm's gadarianFit, we can tidy the result of a model
  # estimated with covariates
  effects <- estimateEffect(1:3 ~ treatment, gadarianFit, gadarian)
  glance(effects)
  td_estimate <- tidy(effects)
  td_estimate
}

## End(Not run)
</pre>

<hr /><div style="text-align: center;">[Package <em>tidytext</em> version 0.3.4 <a href="00Index.html">Index</a>]</div>
</body></html>