<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Tidiers for Latent Dirichlet Allocation models from the...</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for mallet_tidiers {tidytext}"><tr><td>mallet_tidiers {tidytext}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Tidiers for Latent Dirichlet Allocation models from the mallet package</h2>

<h3>Description</h3>

<p>Tidy LDA models fit by the mallet package, which wraps the Mallet topic modeling package in Java. The arguments and return values are similar to <code><a href="lda_tidiers.html">lda_tidiers</a></code>.
</p>

<h3>Usage</h3>

<pre>
## S3 method for class 'jobjRef'
tidy(
  x,
  matrix = c("beta", "gamma"),
  log = FALSE,
  normalized = TRUE,
  smoothed = TRUE,
  ...
)

## S3 method for class 'jobjRef'
augment(x, data, ...)
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p>A jobjRef object, of type RTopicModel, such as created by <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>matrix</code></td>
<td>
<p>Whether to tidy the beta (per-term-per-topic, default) or gamma (per-document-per-topic) matrix.</p>
</td></tr>
<tr valign="top"><td><code>log</code></td>
<td>
<p>Whether beta/gamma should be on a log scale, default FALSE</p>
</td></tr>
<tr valign="top"><td><code>normalized</code></td>
<td>
<p>If true (default), normalize so that each document or word sums to one across the topics.
If false, values will be integers representing the actual number of word-topic or document-topic assignments.</p>
</td></tr>
<tr valign="top"><td><code>smoothed</code></td>
<td>
<p>If true (default), add the smoothing parameter to each value so that none is zero. This smoothing parameter is initialized as <code>alpha.sum</code> in <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>Extra arguments, not used</p>
</td></tr>
<tr valign="top"><td><code>data</code></td>
<td>
<p>For <code>augment</code>, the data given to the LDA function, either as a DocumentTermMatrix or as a tidied table with "document" and "term" columns.</p>
</td></tr>
</table>

<h3>Details</h3>

<p>Note that the LDA models from <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code> are technically a special case of S4 objects with class <code>jobjRef</code>. These are thus implemented as <code>jobjRef</code> tidiers, with a check for whether the <code>toString</code> output is as expected.
</p>

<h3>Value</h3>

<p><code>augment</code> must be provided a data argument containing one row per original document-term pair, such as is returned by <a href="tdm_tidiers.html">tdm_tidiers</a>, containing columns <code>document</code> and <code>term</code>. It returns that same data with an additional column <code>.topic</code> with the topic assignment for that document-term combination.
</p>

<h3>See Also</h3>

<p><code><a href="lda_tidiers.html">lda_tidiers</a></code>, <code><a href="../../mallet/html/mallet.doc.topics.html">mallet.doc.topics</a></code>, <code><a href="../../mallet/html/mallet.topic.words.html">mallet.topic.words</a></code>
</p>

<h3>Examples</h3>

<pre>
## Not run: 
library(mallet)
library(dplyr)
library(tidytext)  # provides stop_words and the tidy/augment methods
library(ggplot2)   # needed for the plot below

data("AssociatedPress", package = "topicmodels")
td &lt;- tidy(AssociatedPress)

# mallet needs a file with stop words
tmp &lt;- tempfile()
writeLines(stop_words$word, tmp)

# two vectors: one with document IDs, one with text
docs &lt;- td %&gt;%
  group_by(document = as.character(document)) %&gt;%
  summarize(text = paste(rep(term, count), collapse = " "))

docs &lt;- mallet.import(docs$document, docs$text, tmp)

# create and run a topic model
topic_model &lt;- MalletLDA(num.topics = 4)
topic_model$loadDocuments(docs)
topic_model$train(20)

# tidy the word-topic combinations
td_beta &lt;- tidy(topic_model)
td_beta

# Examine the four topics
td_beta %&gt;%
  group_by(topic) %&gt;%
  top_n(8, beta) %&gt;%
  ungroup() %&gt;%
  mutate(term = reorder(term, beta)) %&gt;%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

# find the assignments of each word in each document
assignments &lt;- augment(topic_model, td)
assignments

## End(Not run)
</pre>

<hr /><div style="text-align: center;">[Package <em>tidytext</em> version 0.3.4 <a href="00Index.html">Index</a>]</div>
</body></html>