I have a dataset of users and their sequential events, with non-events (NA) in between.
library(data.table)
DT = data.table(user = c("1001","1001","1001","1001","1001","1001",
"1002","1002","1002","1002"),
event = c(NA,"e1",NA,NA,NA,"e2",
"e1",NA,NA,"e2"))
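Within each user, I would like to count the rows (non-events) before each event, i.e. since that user's previous event or first row. Expected result: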
user event rows.before.event
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
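I tried rleid() but without success. Any suggestions are welcome.

One answer groups each event together with the non-event rows that precede it (within each user) and counts the group size minus one: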
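# Group each event with the non-event rows directly before it (per user);
# .N - 1 is then the number of those non-event rows
DT[, count := .N-1, by = .(user, rev(cumsum(rev(!is.na(event)))))][
  is.na(event), count := NA]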
# user event count
# 1: 1001 NA NA
# 2: 1001 e1 1
# 3: 1001 NA NA
# 4: 1001 NA NA
# 5: 1001 NA NA
# 6: 1001 e2 3
# 7: 1002 e1 0
# 8: 1002 NA NA
# 9: 1002 NA NA
#10: 1002 e2 2
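To see why this grouping works, it can help to look at the key it builds. The sketch below is a minimal illustration, assuming the same `DT` as in the question; the helper column `grp` is a name introduced here, not part of the answer. Counting events from the bottom up gives every row the same id as the next event at or below it, so each `(user, grp)` group holds exactly one event plus the non-event rows immediately before it.

```r
library(data.table)

DT <- data.table(user  = c("1001","1001","1001","1001","1001","1001",
                           "1002","1002","1002","1002"),
                 event = c(NA,"e1",NA,NA,NA,"e2",
                           "e1",NA,NA,"e2"))

# Number events from the bottom up: each row gets the id of the
# next event at or below it (hypothetical helper column `grp`)
DT[, grp := rev(cumsum(rev(!is.na(event))))]
print(DT$grp)
# [1] 4 4 3 3 3 3 2 1 1 1

# Each (user, grp) group is one event plus the non-event rows just
# before it, so the group size minus one is the count we want
DT[, count := .N - 1L, by = .(user, grp)]

# Only event rows keep a value
DT[is.na(event), count := NA]
print(DT)
```

Including `user` in `by` matters: the key is computed over the whole table, so without it trailing non-event rows of one user could fall into the same group as the next user's first event.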
Another solution, using rleid and shift:
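# 1) length of each run of NA / non-NA rows
# 2) shift it down one row within each user, so an event row gets the
#    length of the non-event run just before it (0 for a user's first row)
# 3) blank out the non-event rows
DT[, before := .N, by = .(user, rleid(is.na(event)))
][, before := shift(before, fill = 0), by = user
][is.na(event), before := NA][]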
Run Code Online (Sandbox Code Playgroud)
This gives:
user event before
1: 1001 NA NA
2: 1001 e1 1
3: 1001 NA NA
4: 1001 NA NA
5: 1001 NA NA
6: 1001 e2 3
7: 1002 e1 0
8: 1002 NA NA
9: 1002 NA NA
10: 1002 e2 2
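A caveat not raised in the thread, worth checking if your data can contain two events in consecutive rows: `rleid(is.na(event))` puts adjacent events into a single run, so after the shift step the second event inherits that run's length instead of 0. The reverse-cumsum grouping gives each event its own group. The toy table `DT2` below is hypothetical, made up only to show the difference.

```r
library(data.table)

# Hypothetical toy data: user "1" has two events in a row
DT2 <- data.table(user = c("1", "1", "1"), event = c(NA, "e1", "e2"))

# Approach 1: reverse cumulative sum of event markers
DT2[, count := .N - 1L, by = .(user, rev(cumsum(rev(!is.na(event)))))]
DT2[is.na(event), count := NA]

# Approach 2: rleid + shift
DT2[, before := .N, by = .(user, rleid(is.na(event)))]
DT2[, before := shift(before, fill = 0), by = user]
DT2[is.na(event), before := NA]

print(DT2)
# count:  NA 1 0  (e2 directly follows e1, so 0 rows before it)
# before: NA 1 2  (e1 and e2 form one rleid run of length 2)
```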