Match values by group ID

sal*_*hin 5 r match

Suppose I have the following data frame (my real dataset is very large):

df<- structure(list(x = c(1, 1, 1, 2, 2, 3, 3, 3), y = structure(c(1L, 
6L, NA, 2L, 4L, 3L, 7L, 5L), .Label = c("all", "fall", "hello", 
"hi", "me", "non", "you"), class = "factor"), z = structure(c(5L, 
NA, 4L, 2L, 1L, 6L, 3L, 4L), .Label = c("fall", "hi", "me", "mom", 
"non", "you"), class = "factor")), .Names = c("x", "y", "z"), row.names = c(NA, 
-8L), class = "data.frame")

It looks like this:

> df
  x     y    z
1 1   all  non
2 1   non <NA>
3 1  <NA>  mom
4 2  fall   hi
5 2    hi fall
6 3 hello  you
7 3   you   me
8 3    me  mom

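What I want to do is count, within each group of x (1, 2 or 3), how many values appear in both y and z. For example, group 1 has one matching value, "non" (NA values should be ignored). The desired output is: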

  x    n
1 1    1
2 2    2
3 3    2
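The counting rule for a single group can be checked by hand. The sketch below uses group 1 only, assuming "matching" means a y value that also occurs among the z values of the same group; `y1`, `z1` and `matches` are illustrative names, not part of the question. It also shows why the NA has to be dropped from z first: `%in%` treats NA as matching NA.

```r
# Group 1 from the example (rows where x == 1), as plain character vectors
y1 <- c("all", "non", NA)
z1 <- c("non", NA, "mom")

# Without dropping NA, the NA in y1 "matches" the NA in z1
sum(y1 %in% z1)  # 2

# Drop NA from z first, then flag each y value that also appears in z
matches <- y1 %in% na.omit(z1)
matches          # FALSE TRUE FALSE
sum(matches)     # 1, i.e. only "non"
```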

I've been trying to find a way to do this without a for-loop, because my dataset is large, but I can't figure out how.

jer*_*ycg 5

Using dplyr:

library(dplyr)

df %>% group_by(x) %>%
       # count the y values that also appear among the non-NA z values of the same group
       summarise(n = sum(y %in% na.omit(z)))
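If you would rather not depend on dplyr, the same per-group count can be sketched in base R with `split()` and `sapply()`. This is an alternative, not part of the original answer; it rebuilds the example data with `data.frame()` and `factor()` instead of the `dput()` output, and `counts` and `result` are illustrative names.

```r
# Example data from the question, rebuilt with factor() (same values and levels)
df <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3),
  y = factor(c("all", "non", NA, "fall", "hi", "hello", "you", "me")),
  z = factor(c("non", NA, "mom", "hi", "fall", "you", "me", "mom"))
)

# Split the rows by group, then count the y values found among the non-NA z values.
# %in% compares factors by their labels, so y and z having different levels is fine.
counts <- sapply(split(df, df$x), function(g) sum(g$y %in% na.omit(g$z)))

# Turn the named vector back into a two-column data frame
result <- data.frame(x = as.numeric(names(counts)), n = unname(counts))
result
```

This should give n = 1, 2, 2 for groups 1, 2, 3. Note that `split()` on a whole data frame copies every group, so on a very large dataset with many groups the dplyr version may well be faster.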

  • @AhmedSalhin Load `plyr` first and then `dplyr`, or call `dplyr::summarise` explicitly instead of plain `summarise`, so that `plyr::summarise` does not mask it (2 upvotes)