How to check whether all values in a column are the same within each group?

Question

How can I check whether all values in a column are the same within each group?

For example, I have the following df:

   id category yes
1   1       in   1
2   1       in   1
3   1       in   1
4   1       in   1
5   1       in   1
6   1      out   1
7   1      out   1
8   1      out   1
9   2       in   1
10  2       in   1
11  2      out   0
12  2      out   1
13  2      out   1
14  3       in   1
15  3       in   1
16  3       in   0
17  3      out   1
18  3      out   1
19  4       in   1
20  4       in   1
21  4       in   1
22  4      out   1
23  4      out   0

I would like to do something like this:

df <- df %>%
  group_by(id, category) %>%
  mutate(
    # placeholder condition: id, category, and yes have the same values in every row within the group
    same = ifelse(..., 1, 0)
  )

So the expected output would look like this:

   id category yes same
1   1       in   1    1
2   1       in   1    1
3   1       in   1    1
4   1       in   1    1
5   1       in   1    1
6   1      out   1    1
7   1      out   1    1
8   1      out   1    1
9   2       in   1    1
10  2       in   1    1
11  2      out   0    0
12  2      out   1    0
13  2      out   1    0
14  3       in   1    0
15  3       in   1    0
16  3       in   0    0
17  3      out   1    1
18  3      out   1    1
19  4       in   1    1
20  4       in   1    1
21  4       in   1    1
22  4      out   1    0
23  4      out   0    0

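Rows 11-13 have the same "id" and "category", but the "yes" column has different values, so the "same" column should be 0 for those rows (because the values differ). The same applies to rows 14-16 and rows 22-23.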

Here is reproducible code for df:

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), category = c("in", 
"in", "in", "in", "in", "out", "out", "out", "in", "in", "out", 
"out", "out", "in", "in", "in", "out", "out", "in", "in", "in", 
"out", "out"), yes = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, -23L))

Any guidance would be greatly appreciated!

Answer 1

akr*_*run 5

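We can use n_distinct to count the number of unique values of yes in each group, turn that into a logical (== 1), and then coerce the logical to binary with either as.integer or the unary +.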

library(dplyr)
df %>%
  group_by(id, category) %>% 
  mutate(same = +(n_distinct(yes) == 1)) %>% 
  ungroup
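
To make the pieces of that expression easier to follow, here is a small sketch on two hand-made vectors. The names all_same and mixed are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Two hypothetical groups: one where every value agrees, one where they differ
all_same <- c(1L, 1L, 1L)
mixed    <- c(0L, 1L, 1L)

# n_distinct() counts the unique values; a count of 1 means "all the same"
same_flag_a <- n_distinct(all_same) == 1   # TRUE
same_flag_b <- n_distinct(mixed) == 1      # FALSE

# Unary + coerces a logical to integer, just like as.integer()
bin_a <- +same_flag_a             # 1
bin_b <- as.integer(same_flag_b)  # 0
```

One thing to watch for: n_distinct() treats NA as a value of its own unless na.rm = TRUE is passed, so a group such as c(1, NA) would be flagged as not all the same by default.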

Or using data.table:

library(data.table)
setDT(df)[, same := +(uniqueN(yes) == 1), by = .(id, category)]
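
If neither package is available, the same idea can probably be written in base R with ave(). This is only a sketch, not part of the original answer. It rebuilds the question's df so that it runs on its own, and the expected vector is copied from the "same" column of the desired output in the question.

```r
# Rebuild the example data from the question
df <- data.frame(
  id = rep(1:4, times = c(8, 5, 5, 5)),
  category = c(rep("in", 5), rep("out", 3),
               rep("in", 2), rep("out", 3),
               rep("in", 3), rep("out", 2),
               rep("in", 3), rep("out", 2)),
  yes = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
          1L, 1L, 0L, 1L, 1L,
          1L, 1L, 0L, 1L, 1L,
          1L, 1L, 1L, 1L, 0L)
)

# For each id/category group, flag 1 when `yes` has exactly one unique value;
# ave() recycles the single result across every row of the group
df$same <- ave(
  df$yes, df$id, df$category,
  FUN = function(x) as.integer(length(unique(x)) == 1)
)

# "same" column from the expected output in the question
expected <- c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
              1L, 1L, 0L, 0L, 0L,
              0L, 0L, 0L, 1L, 1L,
              1L, 1L, 1L, 0L, 0L)
identical(df$same, expected)
```

Because yes is an integer column, ave() returns integers here, so no further conversion is needed.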
