EVOLUTION-MANAGER
Edit File: dist.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Distance Matrix Computation</title> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="stylesheet" type="text/css" href="R.css" /> </head><body> <table width="100%" summary="page for dist {stats}"><tr><td>dist {stats}</td><td style="text-align: right;">R Documentation</td></tr></table> <h2>Distance Matrix Computation</h2> <h3>Description</h3> <p>This function computes and returns the distance matrix computed by using the specified distance measure to compute the distances between the rows of a data matrix. </p> <h3>Usage</h3> <pre> dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2) as.dist(m, diag = FALSE, upper = FALSE) ## Default S3 method: as.dist(m, diag = FALSE, upper = FALSE) ## S3 method for class 'dist' print(x, diag = NULL, upper = NULL, digits = getOption("digits"), justify = "none", right = TRUE, ...) ## S3 method for class 'dist' as.matrix(x, ...) </pre> <h3>Arguments</h3> <table summary="R argblock"> <tr valign="top"><td><code>x</code></td> <td> <p>a numeric matrix, data frame or <code>"dist"</code> object.</p> </td></tr> <tr valign="top"><td><code>method</code></td> <td> <p>the distance measure to be used. This must be one of <code>"euclidean"</code>, <code>"maximum"</code>, <code>"manhattan"</code>, <code>"canberra"</code>, <code>"binary"</code> or <code>"minkowski"</code>. Any unambiguous substring can be given.</p> </td></tr> <tr valign="top"><td><code>diag</code></td> <td> <p>logical value indicating whether the diagonal of the distance matrix should be printed by <code>print.dist</code>.</p> </td></tr> <tr valign="top"><td><code>upper</code></td> <td> <p>logical value indicating whether the upper triangle of the distance matrix should be printed by <code>print.dist</code>.</p> </td></tr> <tr valign="top"><td><code>p</code></td> <td> <p>The power of the Minkowski distance.</p> </td></tr> <tr valign="top"><td><code>m</code></td> <td> <p>An object with distance information to be converted to a <code>"dist"</code> object. For the default method, a <code>"dist"</code> object, or a matrix (of distances) or an object which can be coerced to such a matrix using <code><a href="../../base/html/matrix.html">as.matrix</a>()</code>. (Only the lower triangle of the matrix is used, the rest is ignored).</p> </td></tr> <tr valign="top"><td><code>digits, justify</code></td> <td> <p>passed to <code><a href="../../base/html/format.html">format</a></code> inside of <code>print()</code>.</p> </td></tr> <tr valign="top"><td><code>right, ...</code></td> <td> <p>further arguments, passed to other methods.</p> </td></tr> </table> <h3>Details</h3> <p>Available distance measures are (written for two vectors <i>x</i> and <i>y</i>): </p> <dl> <dt><code>euclidean</code>:</dt><dd><p>Usual distance between the two vectors (2 norm aka <i>L_2</i>), <i>sqrt(sum((x_i - y_i)^2))</i>.</p> </dd> <dt><code>maximum</code>:</dt><dd><p>Maximum distance between two components of <i>x</i> and <i>y</i> (supremum norm)</p> </dd> <dt><code>manhattan</code>:</dt><dd><p>Absolute distance between the two vectors (1 norm aka <i>L_1</i>).</p> </dd> <dt><code>canberra</code>:</dt><dd> <p><i>sum(|x_i - y_i| / (|x_i| + |y_i|))</i>. Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing. </p> <p>This is intended for non-negative values (e.g., counts), in which case the denominator can be written in various equivalent ways; Originally, <span style="font-family: Courier New, Courier; color: #666666;"><b>R</b></span> used <i>x_i + y_i</i>, then from 1998 to 2017, <i>|x_i + y_i|</i>, and then the correct <i>|x_i| + |y_i|</i>. </p> </dd> <dt><code>binary</code>:</dt><dd><p>(aka <em>asymmetric binary</em>): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the <em>proportion</em> of bits in which only one is on amongst those in which at least one is on.</p> </dd> <dt><code>minkowski</code>:</dt><dd><p>The <i>p</i> norm, the <i>p</i>th root of the sum of the <i>p</i>th powers of the differences of the components.</p> </dd> </dl> <p>Missing values are allowed, and are excluded from all computations involving the rows within which they occur. Further, when <code>Inf</code> values are involved, all pairs of values are excluded when their contribution to the distance gave <code>NaN</code> or <code>NA</code>. If some columns are excluded in calculating a Euclidean, Manhattan, Canberra or Minkowski distance, the sum is scaled up proportionally to the number of columns used. If all pairs are excluded when calculating a particular distance, the value is <code>NA</code>. </p> <p>The <code>"dist"</code> method of <code>as.matrix()</code> and <code>as.dist()</code> can be used for conversion between objects of class <code>"dist"</code> and conventional distance matrices. </p> <p><code>as.dist()</code> is a generic function. Its default method handles objects inheriting from class <code>"dist"</code>, or coercible to matrices using <code><a href="../../base/html/matrix.html">as.matrix</a>()</code>. Support for classes representing distances (also known as dissimilarities) can be added by providing an <code><a href="../../base/html/matrix.html">as.matrix</a>()</code> or, more directly, an <code>as.dist</code> method for such a class. </p> <h3>Value</h3> <p><code>dist</code> returns an object of class <code>"dist"</code>. </p> <p>The lower triangle of the distance matrix stored by columns in a vector, say <code>do</code>. If <code>n</code> is the number of observations, i.e., <code>n <- attr(do, "Size")</code>, then for <i>i < j ≤ n</i>, the dissimilarity between (row) i and j is <code>do[n*(i-1) - i*(i-1)/2 + j-i]</code>. The length of the vector is <i>n*(n-1)/2</i>, i.e., of order <i>n^2</i>. </p> <p>The object has the following attributes (besides <code>"class"</code> equal to <code>"dist"</code>): </p> <table summary="R valueblock"> <tr valign="top"><td><code>Size</code></td> <td> <p>integer, the number of observations in the dataset.</p> </td></tr> <tr valign="top"><td><code>Labels</code></td> <td> <p>optionally, contains the labels, if any, of the observations of the dataset.</p> </td></tr> <tr valign="top"><td><code>Diag, Upper</code></td> <td> <p>logicals corresponding to the arguments <code>diag</code> and <code>upper</code> above, specifying how the object should be printed.</p> </td></tr> <tr valign="top"><td><code>call</code></td> <td> <p>optionally, the <code><a href="../../base/html/call.html">call</a></code> used to create the object.</p> </td></tr> <tr valign="top"><td><code>method</code></td> <td> <p>optionally, the distance method used; resulting from <code><a href="dist.html">dist</a>()</code>, the (<code><a href="../../base/html/match.arg.html">match.arg</a>()</code>ed) <code>method</code> argument.</p> </td></tr> </table> <h3>References</h3> <p>Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) <em>The New S Language</em>. Wadsworth & Brooks/Cole. </p> <p>Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979) <em>Multivariate Analysis.</em> Academic Press. </p> <p>Borg, I. and Groenen, P. (1997) <em>Modern Multidimensional Scaling. Theory and Applications.</em> Springer. </p> <h3>See Also</h3> <p><code><a href="../../cluster/html/daisy.html">daisy</a></code> in the <a href="https://CRAN.R-project.org/package=cluster"><span class="pkg">cluster</span></a> package with more possibilities in the case of <em>mixed</em> (continuous / categorical) variables. <code><a href="hclust.html">hclust</a></code>. </p> <h3>Examples</h3> <pre> require(graphics) x <- matrix(rnorm(100), nrow = 5) dist(x) dist(x, diag = TRUE) dist(x, upper = TRUE) m <- as.matrix(dist(x)) d <- as.dist(m) stopifnot(d == dist(x)) ## Use correlations between variables "as distance" dd <- as.dist((1 - cor(USJudgeRatings))/2) round(1000 * dd) # (prints more nicely) plot(hclust(dd)) # to see a dendrogram of clustered variables ## example of binary and canberra distances. x <- c(0, 0, 1, 1, 1, 1) y <- c(1, 0, 1, 1, 0, 1) dist(rbind(x, y), method = "binary") ## answer 0.4 = 2/5 dist(rbind(x, y), method = "canberra") ## answer 2 * (6/5) ## To find the names labels(eurodist) ## Examples involving "Inf" : ## 1) x[6] <- Inf (m2 <- rbind(x, y)) dist(m2, method = "binary") # warning, answer 0.5 = 2/4 ## These all give "Inf": stopifnot(Inf == dist(m2, method = "euclidean"), Inf == dist(m2, method = "maximum"), Inf == dist(m2, method = "manhattan")) ## "Inf" is same as very large number: x1 <- x; x1[6] <- 1e100 stopifnot(dist(cbind(x, y), method = "canberra") == print(dist(cbind(x1, y), method = "canberra"))) ## 2) y[6] <- Inf #-> 6-th pair is excluded dist(rbind(x, y), method = "binary" ) # warning; 0.5 dist(rbind(x, y), method = "canberra" ) # 3 dist(rbind(x, y), method = "maximum") # 1 dist(rbind(x, y), method = "manhattan") # 2.4 </pre> <hr /><div style="text-align: center;">[Package <em>stats</em> version 3.6.0 <a href="00Index.html">Index</a>]</div> </body></html>