<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Tidiers for Latent Dirichlet Allocation models from the...</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for mallet_tidiers {tidytext}"><tr><td>mallet_tidiers {tidytext}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Tidiers for Latent Dirichlet Allocation models from the mallet package</h2>

<h3>Description</h3>

<p>Tidy LDA models fit by the mallet package, which wraps the Mallet topic
modeling package in Java. The arguments and return values
are similar to those of <code><a href="lda_tidiers.html">lda_tidiers</a></code>.
</p>

<h3>Usage</h3>

<pre>
## S3 method for class 'jobjRef'
tidy(
  x,
  matrix = c("beta", "gamma"),
  log = FALSE,
  normalized = TRUE,
  smoothed = TRUE,
  ...
)

## S3 method for class 'jobjRef'
augment(x, data, ...)
</pre>

<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
<p>A jobjRef object, of type RTopicModel, such as created
by <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>matrix</code></td>
<td>
<p>Whether to tidy the beta (per-term-per-topic, default)
or gamma (per-document-per-topic) matrix.</p>
</td></tr>
<tr valign="top"><td><code>log</code></td>
<td>
<p>Whether beta/gamma should be on a log scale, default FALSE</p>
</td></tr>
<tr valign="top"><td><code>normalized</code></td>
<td>
<p>If true (default), normalize the values so that each document sums
to one across topics (gamma) or each topic sums to one across terms (beta).
If false, values will be the actual number of word-topic or document-topic
assignments, plus the smoothing parameter when <code>smoothed</code> is true.</p>
</td></tr>
<tr valign="top"><td><code>smoothed</code></td>
<td>
<p>If true (default), add the smoothing parameter to each
count so that no value is exactly zero. This smoothing parameter is initialized
as <code>alpha.sum</code> in <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code>.</p>
</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
<p>Extra arguments, not used</p>
</td></tr>
<tr valign="top"><td><code>data</code></td>
<td>
<p>For <code>augment</code>, the data given to the LDA function, either
as a DocumentTermMatrix or as a tidied table with &quot;document&quot; and &quot;term&quot;
columns.</p>
</td></tr>
</table>
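
<p>As a rough illustration of how <code>normalized</code> and <code>smoothed</code>
interact, the sketch below applies the same two steps to a small made-up
document-topic count matrix in base R. It is not the mallet implementation:
the counts, the helper <code>adjust_counts</code>, and the smoothing value of
0.5 are all assumptions chosen only to show the arithmetic.</p>

<pre>
# Hypothetical document-topic assignment counts (3 documents, 2 topics)
counts &lt;- matrix(c(4, 0,
                   1, 3,
                   0, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("topic1", "topic2")))

# Hypothetical helper mimicking the two flags; not part of tidytext or mallet
adjust_counts &lt;- function(m, normalized = TRUE, smoothed = TRUE,
                          smoothing = 0.5) {
  if (smoothed) m &lt;- m + smoothing      # no cell stays exactly zero
  if (normalized) m &lt;- m / rowSums(m)   # each row (document) sums to one
  m
}

# both flags off: the raw counts come back unchanged
raw &lt;- adjust_counts(counts, normalized = FALSE, smoothed = FALSE)

# both flags on (the defaults): strictly positive proportions per document
gamma_like &lt;- adjust_counts(counts)
rowSums(gamma_like)
</pre>

<p>Smoothing is applied before normalizing in this sketch, so the zero counts
for <code>doc1</code> and <code>doc3</code> become small positive proportions,
which is what makes <code>log = TRUE</code> safe to use.</p>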

<h3>Details</h3>

<p>Note that the LDA models from <code><a href="../../mallet/html/MalletLDA.html">MalletLDA</a></code>
are technically a special case of S4 objects with class <code>jobjRef</code>.
These are thus implemented as <code>jobjRef</code> tidiers, with a check for
whether the <code>toString</code> output is as expected.
</p>

<h3>Value</h3>

<p><code>augment</code> must be provided a data argument containing
one row per original document-term pair, such as is returned by
<a href="tdm_tidiers.html">tdm_tidiers</a>, containing columns <code>document</code> and <code>term</code>.
It returns that same data with an additional column
<code>.topic</code> with the topic assignment for that document-term combination.
</p>

<h3>See Also</h3>

<p><code><a href="lda_tidiers.html">lda_tidiers</a></code>, <code><a href="../../mallet/html/mallet.doc.topics.html">mallet.doc.topics</a></code>,
<code><a href="../../mallet/html/mallet.topic.words.html">mallet.topic.words</a></code>
</p>

<h3>Examples</h3>

<pre>

## Not run: 
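library(mallet)
library(dplyr)
library(ggplot2)   # needed for the plot below
library(tidytext)  # provides tidy(), augment() and stop_words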

data("AssociatedPress", package = "topicmodels")
td &lt;- tidy(AssociatedPress)

# mallet needs a file with stop words
tmp &lt;- tempfile()
writeLines(stop_words$word, tmp)

# two vectors: one with document IDs, one with text
docs &lt;- td %&gt;%
  group_by(document = as.character(document)) %&gt;%
  summarize(text = paste(rep(term, count), collapse = " "))

docs &lt;- mallet.import(docs$document, docs$text, tmp)

# create and run a topic model
topic_model &lt;- MalletLDA(num.topics = 4)
topic_model$loadDocuments(docs)
topic_model$train(20)

# tidy the word-topic combinations
td_beta &lt;- tidy(topic_model)
td_beta

# Examine the four topics
td_beta %&gt;%
  group_by(topic) %&gt;%
  top_n(8, beta) %&gt;%
  ungroup() %&gt;%
  mutate(term = reorder(term, beta)) %&gt;%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

# find the assignments of each word in each document
assignments &lt;- augment(topic_model, td)
assignments
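
# A further sketch, assuming the topic_model fit above: tidy the
# document-topic (gamma) matrix instead of the word-topic (beta) one
td_gamma &lt;- tidy(topic_model, matrix = "gamma")
td_gamma

# with the default normalized = TRUE, the gamma values of each document
# are expected to sum to approximately one across the four topics
td_gamma %&gt;%
  group_by(document) %&gt;%
  summarize(total = sum(gamma))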

## End(Not run)

</pre>

<hr /><div style="text-align: center;">[Package <em>tidytext</em> version 0.3.4 <a href="00Index.html">Index</a>]</div>
</body></html>