Wer*_*ner 1 r distance similarity metric matching
Given the following data (a data frame):
structure(list(X1 = c(1L, 2L, 3L, 4L, 2L, 5L), X2 = c(2L, 3L,
4L, 5L, 3L, 6L), X3 = c(3L, 4L, 4L, 5L, 3L, 2L), X4 = c(2L, 4L,
6L, 5L, 3L, 8L), X5 = c(1L, 3L, 2L, 4L, 6L, 4L)), .Names = c("X1",
"X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA,
-6L))
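I want to create a 5 x 5 distance matrix that holds, for every pair of columns, the ratio of matching rows to the total number of rows. For example, the distance between X4 and X3 should be 0.5, since the two columns match in 3 of the 6 rows.
I tried dist(test, method="simple matching") from the package "proxy", but that method only works for binary data.
Use outer (again :-)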
my.dist <- function(x) {
  n <- nrow(x)
  d <- outer(seq.int(ncol(x)), seq.int(ncol(x)),
             Vectorize(function(i, j) sum(x[[i]] == x[[j]]) / n))
  rownames(d) <- names(x)
  colnames(d) <- names(x)
  return(d)
}
my.dist(x)
#           X1        X2  X3  X4        X5
# X1 1.0000000 0.0000000 0.0 0.0 0.3333333
# X2 0.0000000 1.0000000 0.5 0.5 0.1666667
# X3 0.0000000 0.5000000 1.0 0.5 0.0000000
# X4 0.0000000 0.5000000 0.5 1.0 0.0000000
# X5 0.3333333 0.1666667 0.0 0.0 1.0000000
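Note that the matrix above has 1 on the diagonal, so it is really a similarity (the share of matching rows) rather than a distance. If a dissimilarity is needed, for example for clustering, one option is to use 1 minus the match ratio. The following is only a minimal sketch of that idea: `match_ratio` is a hypothetical helper name, the data are the question's values assigned to `x`, and plain loops stand in for `outer` purely for readability.

```r
# Data from the question, assigned to `x` (the name used in the answer).
x <- data.frame(
  X1 = c(1L, 2L, 3L, 4L, 2L, 5L),
  X2 = c(2L, 3L, 4L, 5L, 3L, 6L),
  X3 = c(3L, 4L, 4L, 5L, 3L, 2L),
  X4 = c(2L, 4L, 6L, 5L, 3L, 8L),
  X5 = c(1L, 3L, 2L, 4L, 6L, 4L)
)

# Hypothetical helper: share of rows in which two columns agree.
match_ratio <- function(x) {
  m <- as.matrix(x)
  k <- ncol(m)
  s <- matrix(NA_real_, k, k, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      # mean() of a logical vector is the proportion of TRUE values
      s[i, j] <- mean(m[, i] == m[, j])
    }
  }
  s
}

sim <- match_ratio(x)   # similarity matrix, 1 on the diagonal
d <- as.dist(1 - sim)   # assumed conversion: dissimilarity = 1 - similarity
```

The resulting `d` is a regular "dist" object, so it can be passed to functions such as `hclust()`. Whether 1 minus the ratio is the right notion of distance depends on the use case.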