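I'm new to R programming and am trying to remove certain rows from each group once a filter condition is met.
Scenario: for each GROUP, if two TYPE "B" rows appear consecutively, remove all following rows of that GROUP. The "Include in DataSet?" column shows what the output should be.
Here is my sample input: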
GROUP TYPE Include in DataSet?
--------------------------------------------
1 A yes
1 A yes
1 B yes
1 B yes
1 B no
2 A yes
2 B yes
2 B yes
2 A no
2 B no
2 B no
DF = structure(list(GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L), TYPE = c("A", "A", "B", "B", "B", "A", "B", "B", "A",
"B", "B"), inc = c("yes", "yes", "yes", "yes", "no", "yes", "yes",
"yes", "no", "no", "no")), .Names = c("GROUP", "TYPE", "inc"), row.names = c(NA,
-11L), class = "data.frame")
Expected output:
GROUP TYPE Include in DataSet?
--------------------------------------------
1 A yes
1 A yes
1 B yes
1 B yes
2 A yes
2 B yes
2 B yes
I tried writing some code, but had no luck because of the grouping problem.
i=1
j=2
x <- allrows
for (i in x){
  for(j in x){
    if(i==j){
      a$REMOVE=1
    }
    else{
      a$REMOVE=2
    }
  }
}
You can do this by creating a new variable that identifies "double B" rows, then filtering out the rows that come after the first "double B" row in each group:
library(dplyr)
# DF is the data frame from the question (R is case-sensitive, so `df` would not find it)
DF %>%
  group_by(GROUP) %>%
  # Create a new variable that tests whether this row and the one above it both have TYPE == 'B'
  mutate(double_B = (TYPE == 'B' & lag(TYPE) == 'B')) %>%
  # Find the first row with `double_B` in each group, filter out rows after it
  filter(row_number() <= min(which(double_B == TRUE))) %>%
  # Optionally, remove the `double_B` column when done with it
  select(-double_B)
# A tibble: 7 x 3
# Groups: GROUP [2]
  GROUP TYPE inc
  <int> <chr> <chr>
1 1 A yes
2 1 A yes
3 1 B yes
4 1 B yes
5 2 A yes
6 2 B yes
7 2 B yes
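To make the `lag()` logic above easier to follow, here is a minimal base R sketch of the same idea on a single group (the TYPE values of GROUP 2 from the question). The names `type`, `prev`, `first_hit`, and `kept` are illustrative only, and the sketch assumes the group contains at least one "double B" pair.

```r
# TYPE values for GROUP 2 from the question
type <- c("A", "B", "B", "A", "B", "B")

# TYPE of the previous row; the first row has no predecessor, so use NA
prev <- c(NA, head(type, -1))

# TRUE where a row and the row directly above it are both "B"
double_B <- type == "B" & !is.na(prev) & prev == "B"
# FALSE FALSE TRUE FALSE FALSE TRUE

# Position of the first "double B" row (assumes at least one exists)
first_hit <- which(double_B)[1]   # 3

# Keep everything up to and including that row
kept <- type[seq_len(first_hit)]  # "A" "B" "B"
```

The dplyr pipeline applies this same computation once per GROUP, with `row_number() <= min(which(...))` playing the role of `seq_len(first_hit)`.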
As @Frank pointed out in the comments, you don't need to create the double_B variable: you can test the "double B" condition directly inside the which() call in filter():
DF %>%
  group_by(GROUP) %>%
  # Find the first "double B" row in each group, filter out rows after it
  filter(row_number() <= min(which(TYPE == 'B' & lag(TYPE) == 'B')))
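Also, if no "double B" is found in a group, this returns a warning but still filters correctly: which() returns an empty vector, min() of an empty vector is Inf (with a warning), and row_number() <= Inf keeps every row of that group.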
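If the warning is a nuisance, one possible alternative is to avoid `min(which(...))` and instead keep a row only while no "double B" row has occurred strictly before it. This is a sketch rather than part of the original answer: the extra GROUP 3 (which has no "double B" pair) is hypothetical data added to exercise that case, and `result` is an illustrative name.

```r
library(dplyr)

# Question's data plus a hypothetical GROUP 3 that never has two consecutive "B" rows
DF <- data.frame(
  GROUP = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L),
  TYPE  = c("A", "A", "B", "B", "B", "A", "B", "B", "A", "B", "B", "A", "B"),
  stringsAsFactors = FALSE
)

result <- DF %>%
  group_by(GROUP) %>%
  # TRUE where this row and the one above it are both "B";
  # default = "" makes the first row of each group compare as non-"B"
  mutate(double_B = TYPE == "B" & lag(TYPE, default = "") == "B") %>%
  # Count "double B" rows seen strictly before the current row; keep while zero
  filter(lag(cumsum(double_B), default = 0L) == 0L) %>%
  ungroup() %>%
  select(-double_B)
```

Because the running count is simply zero throughout a group with no "double B" pair, such a group is kept whole without any call to `min()` on an empty vector.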