How to mutate a column based on values appearing in a specific sequence?

Sil*_*iss 5 r data-manipulation mutate

I have a data frame `df`:

df <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4,4),
  process = c("inspection", "evaluation", "result",
              "inspection", "result", "evaluation",
              "result", "inspection", "result",
              "evaluation", "result", "result", "evaluation")
)

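I need to add a column `true_process` that is TRUE if `evaluation` appears before `result` within a given `ID`. If it appears after `result`, or is missing, the column should be FALSE.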

Here is the code I tried:

library(dplyr)
df %>% 
    group_by(ID) %>% 
    mutate(true_process = case_when(
        !any(process == "evaluation") ~ "False",
        length(process == "evaluation")[[1]] > length(process == "result")[[1]] ~ "False",
        TRUE ~ "True"
    )) 
# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <fct>      <chr>       
 1     1 inspection True        
 2     1 evaluation True        
 3     1 result     True        
 4     2 inspection True        
 5     2 result     True        
 6     2 evaluation True        
 7     3 result     False       
 8     3 inspection False       
 9     3 result     False       
10     4 evaluation True        
11     4 result     True        
12     4 result     True        
13     4 evaluation True 

The expected output is as follows:

# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <fct>      <lgl>       
 1     1 inspection TRUE        
 2     1 evaluation TRUE        
 3     1 result     TRUE        
 4     2 inspection FALSE       
 5     2 result     FALSE       
 6     2 evaluation FALSE       
 7     3 result     FALSE       
 8     3 inspection FALSE       
 9     3 result     FALSE       
10     4 evaluation FALSE       
11     4 result     FALSE       
12     4 result     FALSE       
13     4 evaluation FALSE    

H 1*_*H 1 3

Based on the updated data, you can check whether the index of the last instance of `evaluation` is less than any index of `result`:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(true_process = any(tail(which(process == "evaluation"), 1) < which(process == "result")))


# A tibble: 13 x 3
# Groups:   ID [4]
      ID process    true_process
   <dbl> <chr>      <lgl>       
 1     1 inspection TRUE        
 2     1 evaluation TRUE        
 3     1 result     TRUE        
 4     2 inspection FALSE       
 5     2 result     FALSE       
 6     2 evaluation FALSE       
 7     3 result     FALSE       
 8     3 inspection FALSE       
 9     3 result     FALSE       
10     4 evaluation FALSE       
11     4 result     FALSE       
12     4 result     FALSE       
13     4 evaluation FALSE
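
To see why the comparison works, it may help to walk through one group by hand. The sketch below uses base R only and hard-codes the four `process` values of ID 4 from the question; the names `eval_idx`, `res_idx`, `last_eval`, and `id4_flag` are illustrative and not part of the answer.

```r
# Process values for ID 4, copied from the question's data
process <- c("evaluation", "result", "result", "evaluation")

eval_idx  <- which(process == "evaluation")  # positions of "evaluation": 1 4
res_idx   <- which(process == "result")      # positions of "result": 2 3
last_eval <- tail(eval_idx, 1)               # position of the last "evaluation": 4

# TRUE only if at least one "result" comes after the last "evaluation"
id4_flag <- any(last_eval < res_idx)         # FALSE, as in the expected output

# By contrast, length() of a comparison is just the group size,
# so it carries no information about position
group_size <- length(process == "evaluation")  # 4
```

This is also why the attempt in the question falls through to "True" for every group that contains an evaluation: both `length()` calls return the group size, so the `>` test can never succeed.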

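  • This does work for the test data, but if you try it with `df[-2,]`, so that there is only an inspection followed by a result with no evaluation, you get TRUE, which, going by the written description, I don't think is intended. Of course, I'm not sure whether that will ever happen. Maybe just add an `any(process == "evaluation")` check. (3 upvotes)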
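
The guard suggested in the comment could be sketched as a small helper. This is only an illustration under stated assumptions: `eval_before_result`, `flag_by_id`, and `no_eval_flag` are hypothetical names, and base R's `tapply()` stands in for the dplyr pipeline so the block runs without extra packages.

```r
# Hypothetical helper: TRUE if the last "evaluation" precedes at least one "result"
eval_before_result <- function(process) {
  eval_idx <- which(process == "evaluation")
  if (length(eval_idx) == 0) {
    return(FALSE)  # no evaluation in this group
  }
  any(tail(eval_idx, 1) < which(process == "result"))
}

# Data from the question
df <- data.frame(
  ID = c(1,1,1,2,2,2,3,3,3,4,4,4,4),
  process = c("inspection", "evaluation", "result",
              "inspection", "result", "evaluation",
              "result", "inspection", "result",
              "evaluation", "result", "result", "evaluation")
)

# Compute one flag per ID, then broadcast it back to every row of that ID
flag_by_id <- tapply(df$process, df$ID, eval_before_result)
df$true_process <- as.vector(flag_by_id[as.character(df$ID)])

# The case raised in the comment: inspection then result, no evaluation
no_eval_flag <- eval_before_result(c("inspection", "result"))
```

Inside the answer's pipeline the same helper could be called as `mutate(true_process = eval_before_result(process))` after `group_by(ID)`. Base R's `any()` returns FALSE for a zero-length input, so the early return mainly serves to make the intent explicit.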