I have a tibble containing letters and numbers:
library(dplyr)
xx <- tibble(letter = c(rep("a", 3), rep("b", 3), rep("c", 3)),
             number = c(1, 2, 3, 1, 2, 3, 4, 5, 6))
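I want to first group the data by `letter`, then check whether any two groups have the same values in the `number` column. Here those would be the groups for the letters "a" and "b".
The result would look like this: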
xx <- tibble(letter = c(rep("a", 3), rep("b", 3), rep("c", 3)),
             number = c(1, 2, 3, 1, 2, 3, 4, 5, 6),
             duplicated = c(rep(TRUE, 6), rep(FALSE, 3)))
Is there an elegant way to do this in dplyr?
You could try:
xx %>%
  distinct() %>%
  group_by(number) %>%
  # after distinct(), the row count per number equals the number of letters containing it
  mutate(duplicated = n() > 1) %>%
  ungroup()
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 2 TRUE
3 a 3 TRUE
4 b 1 TRUE
5 b 2 TRUE
6 b 3 TRUE
7 c 4 FALSE
8 c 5 FALSE
9 c 6 FALSE
`distinct()` is there in case some values are repeated within a single group, for example:
xx <- tibble(letter = c(rep("a", 4), rep("b", 3), rep("c", 3)),
             number = c(1, 1, 2, 3, 1, 2, 3, 4, 4, 6))
letter number
<chr> <dbl>
1 a 1
2 a 1
3 a 2
4 a 3
5 b 1
6 b 2
7 b 3
8 c 4
9 c 4
10 c 6
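To see why `distinct()` matters, here is a minimal sketch of the same counting logic run on this data set without it. The name `no_distinct` is just an illustrative variable. Because "c" contains 4 twice, those two rows count against each other and get flagged, even though no other letter contains 4.

```r
library(dplyr)

# Second example data: "c" contains the value 4 twice
xx <- tibble(letter = c(rep("a", 4), rep("b", 3), rep("c", 3)),
             number = c(1, 1, 2, 3, 1, 2, 3, 4, 4, 6))

# Same counting logic, but WITHOUT distinct() first
no_distinct <- xx %>%
  group_by(number) %>%
  mutate(duplicated = n() > 1) %>%
  ungroup()

# The two c/4 rows are counted against each other, so both are TRUE
no_distinct %>% filter(letter == "c")
```

Only the c/6 row stays FALSE here, which is why the answer removes within-group repeats before counting and joins the flags back afterwards.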
xxx <- xx %>%
  distinct() %>%
  group_by(number) %>%
  mutate(duplicated = n() > 1) %>%
  ungroup()
# join the flags back onto every original row, including the repeats dropped by distinct()
xx %>%
  full_join(xxx, by = c("letter", "number"))
letter number duplicated
<chr> <dbl> <lgl>
1 a 1 TRUE
2 a 1 TRUE
3 a 2 TRUE
4 a 3 TRUE
5 b 1 TRUE
6 b 2 TRUE
7 b 3 TRUE
8 c 4 FALSE
9 c 4 FALSE
10 c 6 FALSE
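An alternative that is not from the answer above, sketched here as one possible option: count the number of *different* letters per number with `n_distinct()`. Repeats inside one letter then cannot inflate the count, so the `distinct()` + join step is unnecessary. The variable name `res` is illustrative.

```r
library(dplyr)

xx <- tibble(letter = c(rep("a", 4), rep("b", 3), rep("c", 3)),
             number = c(1, 1, 2, 3, 1, 2, 3, 4, 4, 6))

# For each number, count how many distinct letters contain it;
# the two c/4 rows only contribute one letter ("c")
res <- xx %>%
  group_by(number) %>%
  mutate(duplicated = n_distinct(letter) > 1) %>%
  ungroup()

res
```

On this data it gives the same `duplicated` column as the join-based version, while keeping all ten original rows in their original order.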
A more compact version of the `distinct()` + counting step uses `add_count()`:
xxx <- xx %>%
  distinct() %>%
  add_count(number) %>%
  mutate(duplicated = n > 1) %>%
  select(-n)
xxx
xx %>%
  full_join(xxx, by = c("letter", "number"))
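The answers above flag each *number* that appears under more than one letter. If the question is meant literally, i.e. two letter groups whose complete sets of numbers are identical, partial overlaps (say a = {1, 2, 3} and b = {1, 2}) would be flagged by the per-number approach but not by a whole-group comparison. A minimal sketch of the whole-group reading, assuming that a comma-joined string of each group's sorted unique numbers is an adequate group key (the `key` column and `res` name are hypothetical helpers):

```r
library(dplyr)

xx <- tibble(letter = c(rep("a", 3), rep("b", 3), rep("c", 3)),
             number = c(1, 2, 3, 1, 2, 3, 4, 5, 6))

res <- xx %>%
  group_by(letter) %>%
  # one key per letter, built from its sorted, de-duplicated numbers
  mutate(key = paste(sort(unique(number)), collapse = ",")) %>%
  ungroup() %>%
  # a letter is flagged when at least one other letter has the same key
  group_by(key) %>%
  mutate(duplicated = n_distinct(letter) > 1) %>%
  ungroup() %>%
  select(-key)

res
```

For the question's data this reproduces the expected `duplicated` column (TRUE for "a" and "b", FALSE for "c"). The string key is a simple choice; it relies on `paste()` formatting equal numbers identically.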