Match and count values sequentially by group in R

rdo*_*arn 3 r sequence match

Here is my data:

group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame (group, X1, X2, X3, X4, X5, X6)

So the data looks like this:

   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
2      1  A  A  A  A  A  A
3      1  A  A  A  A  A  A
4      1  A  A  A  B  A  A
5      2  B  B  B  B  B  B
6      2  A  B  B  B  B  A
7      2  B  B  B  A  B  B
8      3  A  A  B  A  A  A
9      3  A  A  B  A  A  A
10     4  B  B  B  B  A  B
11     4  B  B  B  A  B  B
12     4  B  A  B  B  B  A
13     4  B  A  B  B  B  A

Now I need to compare the first row of each group with the remaining rows in that group.

   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
2      1  A  A  A  A  A  A
          TRUE TRUE FALSE TRUE TRUE TRUE

Here the only mismatch is at X3: 1 out of 6 columns = 1/6 ≈ 17%.
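The comparison of one pair of rows can be sketched in base R as follows. This is a minimal illustration only: the vectors `row1` and `row2` are hypothetical names that retype rows 1 and 2 of the data above without the `group` column.

```r
# Rows 1 and 2 of group 1, retyped from the data above (group column omitted).
# `row1` and `row2` are illustrative names, not part of the original post.
row1 <- c(X1 = "A", X2 = "A", X3 = "B", X4 = "A", X5 = "A", X6 = "A")
row2 <- c(X1 = "A", X2 = "A", X3 = "A", X4 = "A", X5 = "A", X6 = "A")

same <- row1 == row2        # element-wise comparison: one logical per column
mismatch <- mean(!same)     # fraction of columns that differ (here 1/6)
round(100 * mismatch)       # mismatch as a whole-number percentage
```

`mean()` on a logical vector gives the proportion of `TRUE` values, which is why negating `same` and averaging yields the mismatch rate directly.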

Similarly, compare row 3 with the first row of group 1.

   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
3      1  A  A  A  A  A  A

Mismatch = 1/6 ≈ 17%

Likewise, compare row 4 with row 1.

   group X1 X2 X3 X4 X5 X6
1      1  A  A  B  A  A  A
4      1  A  A  A  B  A  A

Mismatch = 2/6 ≈ 33%

Similarly for group 2 (the group's first row, row 5, compared with row 6):

     group X1 X2 X3 X4 X5 X6
5      2  B  B  B  B  B  B
6      2  A  B  B  B  B  A

Mismatch = 2/6 ≈ 33%

And likewise for row 7:

         group X1 X2 X3 X4 X5 X6
    5      2  B  B  B  B  B  B
    7      2  B  B  B  A  B  B

Mismatch = 1/6 ≈ 17%

What I tried:

match (mydf[1,], mydf[2,])
match (mydf[1,], mydf[3,])

flo*_*del 6

Try this:

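library(plyr)
# Per group: compare each row with the group's first row (dropping the
# group column) and append the fraction of matching columns.
match_ratio <- function(x)
   cbind(x, match_ratio = rowMeans(mapply(`==`, x[1, -1], x[, -1])))
# Apply match_ratio to each group and recombine the rows.
ddply(mydf, "group", match_ratio)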

#    group X1 X2 X3 X4 X5 X6 match_ratio
# 1      1  A  A  B  A  A  A   1.0000000
# 2      1  A  A  A  A  A  A   0.8333333
# 3      1  A  A  A  A  A  A   0.8333333
# 4      1  A  A  A  B  A  A   0.6666667
# 5      2  B  B  B  B  B  B   1.0000000
# 6      2  A  B  B  B  B  A   0.6666667
# 7      2  B  B  B  A  B  B   0.8333333
# 8      3  A  A  B  A  A  A   1.0000000
# 9      3  A  A  B  A  A  A   1.0000000
# 10     4  B  B  B  B  A  B   1.0000000
# 11     4  B  B  B  A  B  B   0.6666667
# 12     4  B  A  B  B  B  A   0.5000000
# 13     4  B  A  B  B  B  A   0.5000000
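The output above reports the share of matching columns, whereas the question asks for the share of mismatches. Below is a rough base R sketch of the same idea without `plyr`, which returns the mismatch as a percentage. The names `value_cols`, `mismatch_ratio` and `mismatch_pct` are illustrative assumptions, not part of the original answer.

```r
# Rebuild the data from the question so the sketch is self-contained.
group <- c(1,1,1,1,2,2,2,3,3,4,4,4,4)
X1 <- c("A","A","A","A","B","A","B","A","A","B","B","B","B")
X2 <- c("A","A","A","A","B","B","B","A","A","B","B","A","A")
X3 <- c("B","A","A","A","B","B","B","B","B","B","B","B","B")
X4 <- c("A","A","A","B","B","B","A","A","A","B","A","B","B")
X5 <- c("A","A","A","A","B","B","B","A","A","A","B","B","B")
X6 <- c("A","A","A","A","B","A","B","A","A","B","B","A","A")
mydf <- data.frame(group, X1, X2, X3, X4, X5, X6)

value_cols <- paste0("X", 1:6)          # columns to compare (assumed naming)
m <- as.matrix(mydf[value_cols])        # character matrix, one row per observation

# Fraction of columns in which each row differs from the first row of `rows`.
# t(rows) has one column per observation, so the reference row is recycled
# down each column.
mismatch_ratio <- function(rows) {
  unname(colMeans(t(rows) != rows[1, ]))
}

# Fill in the ratio group by group; drop = FALSE keeps one-row groups as matrices.
mismatch <- numeric(nrow(mydf))
for (g in unique(mydf$group)) {
  idx <- mydf$group == g
  mismatch[idx] <- mismatch_ratio(m[idx, , drop = FALSE])
}
mydf$mismatch_pct <- round(100 * mismatch)
mydf
```

The first row of every group gets 0 because it is compared with itself. If only the match ratio is wanted, `1 - mismatch` gives the same numbers as the `match_ratio` column above.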

  • Great! `ddply` is powerful. My solution was much more primitive. (2 upvotes)