Suppose I have a data frame like this:
day value group type id
1 1 0.1 A X 1
2 1 0.4 A Y 1
3 2 0.2 A X 3
4 2 0.5 A Y 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
8 1 0.3 B Y 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
12 3 0.3 B Y 12
13 1 0.1 C X 12
14 1 0.3 C Y 12
15 2 0.3 C X 5
16 2 0.2 C Y 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Data:
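library(dplyr)
# Build the example: 3 groups x 3 days x 2 types (X, Y)
df1 <- data.frame(
  day = rep(1:3, 6),
  value = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4, 0.3,
            0.1, 0.3, 0.2, 0.3, 0.2, 0.2),
  group = rep(LETTERS[1:3], each = 6)
) %>%
  arrange(group, day) %>%
  # after sorting, each (group, day) pair holds one X row followed by one Y row
  mutate(type = rep(LETTERS[24:25], 9),
         id = c(1, 1, 3, 3, 5, 6, 3, 3, 11, 10, 12, 12, 12, 12, 5, 5, 3, 2))
df1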
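I want to filter this data frame conditionally. I want to group_by(day, group) and, if id is the same in every row of a group, filter out the rows of type Y while keeping the rows of type X.
I can do this with a loop or by subsetting the data frame in several steps, but I wonder whether there is a one- or two-liner in dplyr or data.table that I am somehow overlooking.
This would be the desired output: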
day value group type id
1 1 0.1 A X 1
3 2 0.2 A X 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
13 1 0.1 C X 12
15 2 0.3 C X 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Here is a one-liner with data.table.
We convert the 'data.frame' to a 'data.table' (setDT(df1)) and group by 'day' and 'group'. If the number of unique 'id' values in a group (uniqueN(id)) is 1, we take the rows of the Subset of Data.table (.SD) where 'type' is 'X'; else we return .SD unchanged.
library(data.table)  # v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
# day group value type id
# 1: 1 A 0.1 X 1
# 2: 2 A 0.2 X 3
# 3: 3 A 0.3 X 5
# 4: 3 A 0.2 Y 6
# 5: 1 B 0.1 X 3
# 6: 2 B 0.1 X 11
# 7: 2 B 0.4 Y 10
# 8: 3 B 0.2 X 12
# 9: 1 C 0.1 X 12
#10: 2 C 0.3 X 5
#11: 3 C 0.2 X 3
#12: 3 C 0.2 Y 2
Or, if 'type' is already ordered within each group as in the example data (X before Y):
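Since the question also asks about dplyr, the same rule can be sketched there as well. This is not part of the original answer; it is a minimal sketch that assumes df1 has the columns shown in the question (rebuilt here so the snippet runs on its own), and the name res is only illustrative. A row is kept when its group has more than one distinct id, or when its type is X.

```r
library(dplyr)

# Example data, same values as the question's df1
df1 <- data.frame(
  day   = rep(rep(1:3, each = 2), times = 3),
  value = c(0.1, 0.4, 0.2, 0.5, 0.3, 0.2,
            0.1, 0.3, 0.1, 0.4, 0.2, 0.3,
            0.1, 0.3, 0.3, 0.2, 0.2, 0.2),
  group = rep(LETTERS[1:3], each = 6),
  type  = rep(c("X", "Y"), times = 9),
  id    = c(1, 1, 3, 3, 5, 6, 3, 3, 11, 10, 12, 12, 12, 12, 5, 5, 3, 2)
)

res <- df1 %>%
  group_by(day, group) %>%
  # n_distinct(id) > 1 is a single TRUE/FALSE per group and is recycled
  # across the group's rows; groups with one id keep only their X rows
  filter(n_distinct(id) > 1 | type == "X") %>%
  ungroup()

res
```

The single filter() condition plays the role of the if/else in the data.table call: groups with differing ids pass through whole, the others lose their Y rows.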
unique(setDT(df1), by = c('day', 'group', 'id'))
If it is not ordered,
unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))
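To see why de-duplicating on day, group and id works here, the same idea can be sketched in base R with duplicated(), which flags every row whose key has already appeared. This is an illustration rather than the answer's own code; it assumes at most one X and one Y row per (day, group) pair, as in the example, and the names key, res_sorted, shuffled, ord and res_unsorted are hypothetical.

```r
# Example data, same values as the question's df1
df1 <- data.frame(
  day   = rep(rep(1:3, each = 2), times = 3),
  value = c(0.1, 0.4, 0.2, 0.5, 0.3, 0.2,
            0.1, 0.3, 0.1, 0.4, 0.2, 0.3,
            0.1, 0.3, 0.3, 0.2, 0.2, 0.2),
  group = rep(LETTERS[1:3], each = 6),
  type  = rep(c("X", "Y"), times = 9),
  id    = c(1, 1, 3, 3, 5, 6, 3, 3, 11, 10, 12, 12, 12, 12, 5, 5, 3, 2)
)

key <- c("day", "group", "id")

# Already ordered (X before Y): keep the first row of each day/group/id key
res_sorted <- df1[!duplicated(df1[key]), ]

# Not ordered: sort so that X precedes Y within each key, then de-duplicate
shuffled <- df1[sample(nrow(df1)), ]
ord <- shuffled[order(shuffled$group, shuffled$day, shuffled$id, shuffled$type), ]
res_unsorted <- ord[!duplicated(ord[key]), ]
```

When a pair shares one id, the Y row repeats the X row's key and is dropped; when the ids differ, both keys are new and both rows survive.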
df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L,
2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5,
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2,
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y",
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X",
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L,
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value",
"group", "type", "id"), class = "data.frame",
row.names = c(NA, -18L))