How to filter out the rows of each group that come after a condition occurs

Bob*_*itz 6 r dplyr

I am new to R programming and am trying to remove certain rows within each group once a filter condition has been met.

Scenario: for each GROUP, if two TYPE "B" rows occur consecutively, remove all subsequent rows of that GROUP. The "Include in DataSet?" column shows what the output should be.

Here is my sample input:

GROUP   TYPE    Include in DataSet?
--------------------------------------------
1       A       yes
1       A       yes
1       B       yes
1       B       yes
1       B       no
2       A       yes
2       B       yes
2       B       yes
2       A       no
2       B       no
2       B       no

DF = structure(list(GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L), TYPE = c("A", "A", "B", "B", "B", "A", "B", "B", "A", 
"B", "B"), inc = c("yes", "yes", "yes", "yes", "no", "yes", "yes", 
"yes", "no", "no", "no")), .Names = c("GROUP", "TYPE", "inc"), row.names = c(NA, 
-11L), class = "data.frame")

Expected output:

GROUP   TYPE    Include in DataSet?
--------------------------------------------
1       A       yes
1       A       yes
1       B       yes
1       B       yes
2       A       yes
2       B       yes
2       B       yes

I tried writing some code, but had no luck because of the grouping problem.

i=1
j=2
x <- allrows
for (i in x){
  for(j in x){
    if(i==j){
      a$REMOVE=1
    }
    else{
      a$REMOVE=2
    }
  }
}

div*_*san 8

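You can do this by creating a new variable that identifies the "double B" rows, and then filtering out the rows that come after the first "double B" row in each group: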

library(dplyr)
df %>%
    group_by(GROUP) %>%
    # Create a new variable that tests whether this row and the one above it both have TYPE == 'B'
    mutate(double_B = (TYPE == 'B' & lag(TYPE) == 'B')) %>%
    # Find the first row with `double_B` in each group and keep only the rows up to and including it
    filter(row_number() <= min(which(double_B == TRUE))) %>%
    # Optionally, remove the `double_B` column when done with it
    select(-double_B)

# A tibble: 7 x 3
# Groups:   GROUP [2]
  GROUP TYPE  IncludeinDataSet
  <int> <chr> <chr>           
1     1 A     yes             
2     1 A     yes             
3     1 B     yes             
4     1 B     yes             
5     2 A     yes             
6     2 B     yes             
7     2 B     yes       
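To see what the `double_B` flag looks like before any filtering, here is a minimal sketch. It rebuilds the question's sample data under the name `df` (without the `inc` column), and the `prev_TYPE` helper column is added only for illustration; neither is part of the original answer.

```r
library(dplyr)

# Sample data mirroring the question's GROUP and TYPE columns
df <- data.frame(
  GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  TYPE  = c("A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B"),
  stringsAsFactors = FALSE
)

flagged <- df %>%
  group_by(GROUP) %>%
  mutate(
    # Hypothetical helper column: TYPE of the previous row within the group
    # (NA for the first row of each group)
    prev_TYPE = lag(TYPE),
    # TRUE on the second row of every consecutive B pair
    double_B  = TYPE == "B" & prev_TYPE == "B"
  ) %>%
  ungroup()

print(flagged)
```

Because `lag()` is evaluated per group, the first row of group 2 is compared with `NA` rather than with the last row of group 1, so a "B" run never spills across groups.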

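As @Frank pointed out in the comments, you don't need to create the `double_B` variable: you can test the "double B" condition directly inside the `which` call within `filter`: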

df %>%
    group_by(GROUP) %>%
    # Find the first "double B" row in each group and keep only the rows up to and including it
    filter(row_number() <= min(which(TYPE == 'B' & lag(TYPE) == 'B')))

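Also, if a group contains no "double B" rows, this returns a warning but still filters correctly: `which()` yields an empty vector, `min()` of an empty vector warns and returns `Inf`, and so every row of that group is kept.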
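If the warning is unwanted, one possible alternative (not from the original answer) is to replace `min(which(...))` with a running count of "double B" rows. The sketch below assumes the same `GROUP` and `TYPE` columns and adds a hypothetical third group that has no consecutive "B" rows, to show that it is kept in full.

```r
library(dplyr)

# Question's sample data plus a hypothetical group 3 with no "double B"
df <- data.frame(
  GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  TYPE  = c("A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B", "A", "B", "A"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(GROUP) %>%
  # default = "" avoids NA on the first row of each group, so cumsum() stays NA-free
  mutate(double_B = TYPE == "B" & lag(TYPE, default = "") == "B") %>%
  # Keep a row only if no "double B" row has occurred strictly before it
  filter(lag(cumsum(double_B), default = 0L) == 0L) %>%
  select(-double_B) %>%
  ungroup()

print(result)
```

On this data the sketch keeps 4, 3, and 3 rows for groups 1, 2, and 3. Since no empty vector is ever passed to `min()`, the group without a "double B" passes through without a warning.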