R: Deleting certain rows of a data.frame using dplyr

Question

dat <- data.frame(ID = c(1, 2, 2, 2), Gender = c("Both", "Both", "Male", "Female"))
> dat
  ID Gender
1  1   Both
2  2   Both
3  2   Male
4  2 Female

For each ID, if Gender takes all three values Both, Male, and Female, I want to delete the row with Both. That is, the data I want looks like this:

  ID Gender
1  1   Both
2  2   Male
3  2 Female

I tried to do this with the code below:

library(dplyr)
> dat %>% 
  group_by(ID) %>% 
  mutate(A = ifelse(length(unique(Gender)) >= 3 & Gender == 'Both', F, T)) %>% 
  filter(A) %>% 
  select(-A)

# A tibble: 2 x 2
# Groups:   ID [1]
     ID Gender
  <dbl> <fctr>
1     2   Male
2     2 Female

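I declared a dummy variable named A, where A = F if, for a given ID, all three values of Gender are present ("Both", "Male", and "Female"; these are the only values Gender can take, no other values are possible) and the corresponding row has Gender == "Both". I would then delete that row.

However, it seems that A = F is being assigned to the first row, even though the Gender for its ID is only "Both", not "Both", "Male", and "Female"?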

Answer 1

akr*_*run 7

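After grouping by 'ID', build a logical condition that keeps a row when 'Gender' is not 'Both' and the number of distinct 'Gender' values is 3, i.e. 'Male', 'Female', 'Both' (the OP mentioned that no other values are possible), or (|) when the group has only one row.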

dat %>% 
  group_by(ID) %>% 
  filter((Gender != "Both" & n_distinct(Gender)==3)| n() ==1 )
# A tibble: 3 x 2
# Groups:   ID [2]
#    ID Gender
#  <dbl> <fct> 
#1     1 Both  
#2     2 Male  
#3     2 Female
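
To see how that condition evaluates row by row, the pieces can be written out as columns before filtering. This is only a sketch for inspection; the column names n_gender, n_rows, and keep are made up here and are not part of the answer above.

```r
library(dplyr)

dat <- data.frame(ID = c(1, 2, 2, 2),
                  Gender = c("Both", "Both", "Male", "Female"))

# Expose each part of the filter condition as its own column.
# n_gender, n_rows and keep are illustrative names only.
checked <- dat %>%
  group_by(ID) %>%
  mutate(n_gender = n_distinct(Gender),  # distinct Gender values within the ID
         n_rows   = n(),                 # number of rows within the ID
         keep     = (Gender != "Both" & n_gender == 3) | n_rows == 1) %>%
  ungroup()

print(checked)
```

Rows where keep is TRUE are the ones that filter() would retain.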

Or another option is

dat %>%
   group_by(ID) %>% 
   filter(Gender %in% c("Male", "Female")| n() == 1)
# A tibble: 3 x 2
# Groups:   ID [2]
#     ID Gender
#  <dbl> <fct> 
#1     1 Both  
#2     2 Male  
#3     2 Female
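
If the rule from the question is taken literally (drop 'Both' only when all three values occur for the ID), it can also be written as a direct negation. The sketch below assumes that reading; dat2 and its extra ID 3 (with only "Both" and "Male") are invented here to show a case the original data does not cover.

```r
library(dplyr)

# Hypothetical data: the original rows plus a made-up ID 3
# that has "Both" and "Male" but no "Female".
dat2 <- data.frame(ID = c(1, 2, 2, 2, 3, 3),
                   Gender = c("Both", "Both", "Male", "Female", "Both", "Male"))

# Drop a row only if it is "Both" AND its ID has all three Gender values.
result <- dat2 %>%
  group_by(ID) %>%
  filter(!(Gender == "Both" & n_distinct(Gender) == 3)) %>%
  ungroup()

print(result)
```

On such a two-value group the approaches differ: this version keeps both rows of ID 3, the first filter above would drop both of them, and the second would keep only "Male". Which behaviour is wanted depends on what the data can actually contain.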
