If this is my data:
Number Group Length
4432 1 NA
4432 2 2.34
4564 1 5.89
4389 1 NA
6578 2 3.12
4389 2 NA
4355 1 4.11
4355 2 6.15
4689 1 6.22
4689 1 NA
I am trying to find the Numbers that appear only in Group 1 or only in Group 2, and the Numbers that appear in both Group 1 and Group 2.
Number Group Length Results
4432 1 NA Both 1 & 2
4432 2 2.34 Both 1 & 2
4564 1 5.89 1
4389 1 NA 1
6578 2 3.12 2
4389 2 NA 2
4355 1 4.11 Both 1 & 2
4355 2 6.15 Both 1 & 2
4689 1 6.22 1
4689 1 NA 1
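I can do this with a for loop and subsetting, but I am interested in dplyr or other ways of creating the Results column. Any help is appreciated. Thanks.

We can use n_distinct to check the number of unique 'Group' values and paste the unique 'Group' values together, with "Both" as a prefix. Rows are grouped by runs of adjacent equal Number values (rleid), which is why the two non-adjacent 4389 rows are labelled separately, as in the expected output.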
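library(stringr)
library(dplyr)
library(data.table)

df1 %>%
  # group consecutive rows that share the same Number
  group_by(grp = rleid(Number)) %>%
  mutate(Results = case_when(
    # more than one distinct Group in the run: "Both 1 & 2"
    n_distinct(Group) > 1 ~ str_c("Both ", str_c(unique(Group), collapse = " & ")),
    # otherwise just the single Group value, as text
    TRUE ~ as.character(unique(Group))
  )) %>%
  ungroup() %>%
  select(-grp)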
# A tibble: 10 x 4
# Number Group Length Results
# <int> <int> <dbl> <chr>
# 1 4432 1 NA Both 1 & 2
# 2 4432 2 2.34 Both 1 & 2
# 3 4564 1 5.89 1
# 4 4389 1 NA 1
# 5 6578 2 3.12 2
# 6 4389 2 NA 2
# 7 4355 1 4.11 Both 1 & 2
# 8 4355 2 6.15 Both 1 & 2
# 9 4689 1 6.22 1
#10 4689 1 NA 1
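
To see why the two 4389 rows are not merged, it may help to look at the run ids on their own. The snippet below is only an illustrative sketch: it rebuilds the Number column from the data in this post and compares `rleid()` with a plain "same value" id built with `match()` (the variable names `run_id` and `value_id` are made up for this example).

```r
library(data.table)

# Number column copied from the example data
Number <- c(4432L, 4432L, 4564L, 4389L, 6578L,
            4389L, 4355L, 4355L, 4689L, 4689L)

# rleid() starts a new id whenever the value changes from the previous row
run_id <- rleid(Number)
run_id
# 1 1 2 3 4 5 6 6 7 7

# Grouping by value instead would put both 4389 rows in the same group
value_id <- match(Number, unique(Number))
value_id
# 1 1 2 3 4 3 5 5 6 6
```

With `run_id`, rows 4 and 6 (both 4389) fall into different groups, so each keeps its own Group as the result. Grouping on Number itself would label both of them "Both 1 & 2" instead.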
If the "Both" prefix is not needed:
df1 %>%
  group_by(grp = rleid(Number)) %>%
  mutate(Results = str_c(unique(Group), collapse = " & ")) %>%
  ungroup() %>%
  select(-grp)
Data:
df1 <- structure(list(Number = c(4432L, 4432L, 4564L, 4389L, 6578L,
4389L, 4355L, 4355L, 4689L, 4689L), Group = c(1L, 2L, 1L, 1L,
2L, 2L, 1L, 2L, 1L, 1L), Length = c(NA, 2.34, 5.89, NA, 3.12,
NA, 4.11, 6.15, 6.22, NA)), class = "data.frame", row.names = c(NA,
-10L))
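
Since the question also asks about other methods, here is a rough base R sketch of the same idea without any packages. It is not part of the answer above. `rle()` stands in for `rleid()`, and the helper names `run_id` and `label_run` are invented for this example.

```r
# Same data as above, written out as a plain data.frame
df1 <- data.frame(
  Number = c(4432L, 4432L, 4564L, 4389L, 6578L,
             4389L, 4355L, 4355L, 4689L, 4689L),
  Group  = c(1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L),
  Length = c(NA, 2.34, 5.89, NA, 3.12, NA, 4.11, 6.15, 6.22, NA)
)

# Run id for consecutive equal Numbers (base R stand-in for rleid)
runs <- rle(df1$Number)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Hypothetical helper: build one label per run from its Group values
label_run <- function(g) {
  u <- unique(g)
  if (length(u) > 1) {
    paste("Both", paste(u, collapse = " & "))
  } else {
    as.character(u)
  }
}

# ave() applies the helper within each run and recycles the label over its rows
df1$Results <- ave(as.character(df1$Group), run_id, FUN = label_run)
df1
```

Because the helper returns a single label per run, this variant does not depend on the run length matching the number of distinct groups.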