<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Bivariate Data Set with 3 Clusters</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for xclara {cluster}"><tr><td>xclara {cluster}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Bivariate Data Set with 3 Clusters</h2> <h3>Description</h3> <p>An artificial data set consisting of 3000 points in 3 quite well-separated clusters. </p> <h3>Usage</h3> <pre>data(xclara)</pre> <h3>Format</h3> <p>A data frame with 3000 observations on 2 numeric variables (named <code>V1</code> and <code>V2</code>) giving the <i>x</i> and <i>y</i> coordinates of the points, respectively. </p> <h3>Note</h3> <p>Our version of <code>xclara</code> is slightly more rounded than the one obtained from <code><a href="../../utils/html/read.table.html">read.table</a>("xclara.dat")</code>: the mean relative difference measured by <code><a href="../../base/html/all.equal.html">all.equal</a></code> is <code>1.15e-7</code> for <code>V1</code> and <code>1.17e-7</code> for <code>V2</code>, which suggests that our version is the result of formatting with <code><a href="../../base/html/options.html">options</a>(digits = 7)</code>. </p> <p>Previously (before May 2017), it was claimed that the three clusters were each of size 1000, which is clearly wrong. <code><a href="pam.html">pam</a>(*, 3)</code> gives cluster sizes of 899, 1149, and 952, which, apart from seven “outliers” (or “mislabellings”), correspond to observation indices <i>1:900</i>, <i>901:2050</i>, and <i>2051:3000</i>; see the example. </p> <h3>Source</h3> <p>Sample data set accompanying the reference below (file ‘<span class="file">xclara.dat</span>’ inside ‘<span class="file">clus_examples.tar.gz</span>’). 
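</p> <p>The reasoning in the Note can be illustrated on simulated numbers. Rounding a value to <i>d</i> significant digits changes it by at most half a unit in its <i>d</i>-th digit, so the mean relative difference reported by <code>all.equal</code> is at most <code>0.5 * 10^(1 - d)</code>, i.e., <code>5e-7</code> for 7 digits and <code>5e-8</code> for 8 digits. A difference of about <code>1.15e-7</code> is therefore too large to stem from rounding to 8 or more digits, but is within the bound for 7 digits. The following is a hypothetical sketch on uniform random data (an assumption; the real ‘<span class="file">xclara.dat</span>’ values are not used), with <code>signif(x, d)</code> standing in for <code>options(digits = 7)</code> formatting, which it only approximates; the helper <code>mean_rel_diff</code> is made up for this illustration. </p> <pre>
## Hypothetical sketch: simulated data only, NOT the real xclara.dat values.
set.seed(1)
x <- runif(3000, min = 10, max = 100)   # stand-in for full-precision coordinates

## Illustrative helper (not part of 'cluster'): sum|x - y| / sum|x|,
## the kind of "mean relative difference" that all.equal() reports
mean_rel_diff <- function(x, y) sum(abs(x - y)) / sum(abs(x))

d     <- 6:8                                   # significant digits kept
mrd   <- sapply(d, function(k) mean_rel_diff(x, signif(x, k)))
bound <- 0.5 * 10^(1 - d)                      # worst case: half a unit in the d-th digit
print(rbind(digits = d, mean_rel_diff = mrd, upper_bound = bound))

## all.equal() itself, with tolerance = 0 so that any difference is reported
all.equal(x, signif(x, 7), tolerance = 0)
</pre> <p>Each <code>mean_rel_diff</code> value should lie below its <code>upper_bound</code>, so the 8-digit value stays below <code>1.15e-7</code>. The figure printed by <code>all.equal</code> is computed in essentially the same way, but can differ slightly when some elements are left exactly unchanged by the rounding. 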
</p> <h3>References</h3> <p>Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996) Clustering in an Object-Oriented Environment. <em>Journal of Statistical Software</em> <b>1</b>. doi: <a href="http://doi.org/10.18637/jss.v001.i04">10.18637/jss.v001.i04</a> </p> <h3>Examples</h3> <pre>
## Visualization: Assuming groups are defined as {1:1000}, {1001:2000}, {2001:3000}
plot(xclara, cex = 3/4, col = rep(1:3, each=1000))
p.ID <- c(78, 1411, 2535) ## PAM's medoid indices == pam(xclara, 3)$id.med
text(xclara[p.ID,], labels = 1:3, cex=2, col=1:3)

px <- pam(xclara, 3) ## takes ~2 seconds
cxcl <- px$clustering ; iCl <- split(seq_along(cxcl), cxcl)
boxplot(iCl, range = 0.7, horizontal=TRUE,
        main = "Indices of the 3 clusters of pam(xclara, 3)")

## Look more closely now:
bxCl <- boxplot(iCl, range = 0.7, plot=FALSE)
## We see 3 + 2 + 2 = 7 clear "outlier"s or "wrong group" observations:
with(bxCl, rbind(out, group))
## out   1038 1451 1610   30  327  562  770
## group    1    1    1    2    2    3    3
## Apart from these, what are the robust ranges of indices? -- Robust range:
t(iR <- bxCl$stats[c(1,5),])
##    1  900
##  901 2050
## 2051 3000
gc <- adjustcolor("gray20",1/2)
abline(v = iR, col = gc, lty=3)
axis(3, at = c(0, iR[2,]), padj = 1.2, col=gc, col.axis=gc)
</pre> <hr /><div style="text-align: center;">[Package <em>cluster</em> version 2.0.8 <a href="00Index.html">Index</a>]</div> </body></html>