tuc*_*son 11 r data.table
I want to compute various things by id and in order (of time). For example, with:
dt = data.table( id=c(1,1,1,2,2,2,3,3,3), hour=c(1,5,5,6,7,8,23,23,23), ip=c(1,1,45,2,2,2,3,1,1), target=c(1,0,0,1,1,1,1,1,0), day=c(1,1,1,1,1,1,3,2,1))
id hour ip target day
1: 1 1 1 1 1
2: 1 5 1 0 1
3: 1 5 45 0 1
4: 2 6 2 1 1
5: 2 7 2 1 1
6: 2 8 2 1 1
7: 3 23 3 1 3
8: 3 23 1 1 2
9: 3 23 1 0 1
For each row, I would like to compute, per id, the number of active days and active hours so far. That means I want to obtain the following output:
id hour ip target day nb_active_hours_so_far
1: 1 1 1 1 1 0 (first occurrence of id when ordered by hour)
2: 1 5 1 0 1 1 (has been active in hour "1")
3: 1 5 45 0 1 2 (has been active in hour "1" and "5")
4: 2 6 2 1 1 0 (first occurrence)
5: 2 7 2 1 1 1 (has been active in hour "6")
6: 2 8 2 1 1 2 (has been active in hour "6" and "7")
7: 3 23 3 1 3 0 (first occurrence)
8: 3 23 1 1 2 1 (has been active in hour "23")
9: 3 23 1 0 1 1 (has been active in hour "23" only)
To get the total number of active hours per id, I would do:
dt[, nb_active_hours := length(unique(hour)), by=id]
But I want the "so far" part, and I don't know how to do it. Any help would be appreciated.
This seems to work (although I have not tested it on different cases):
dt[, nb_active_hours_so_far := cumsum(c(0:1, diff(hour[-.N]))>0), by = id]
# id hour ip target day temp nb_active_hours_so_far
# 1: 1 1 1 1 1 0 0
# 2: 1 5 1 0 1 1 1
# 3: 1 5 45 0 1 1 2
# 4: 2 6 2 1 1 0 0
# 5: 2 7 2 1 1 1 1
# 6: 2 8 2 1 1 2 2
# 7: 3 23 3 1 3 0 0
# 8: 3 23 1 1 2 0 1
# 9: 3 23 1 0 1 0 1
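To see why the expression appears to work, it can help to evaluate it step by step on one id's hour vector outside data.table. The following is a minimal base R sketch, not part of the original answer; the names h, n, steps, flags and single are introduced here for illustration, with n standing in for .N.

```r
h <- c(1, 5, 5)              # hours of id 1, already sorted
n <- length(h)               # stands in for .N inside a data.table group

steps <- diff(h[-n])         # hour changes between rows, ignoring the last row
flags <- c(0:1, steps) > 0   # FALSE for row 1, TRUE for row 2, then TRUE on each change
result <- cumsum(flags)      # running count: 0 1 2

# Edge case: the 0:1 prefix always contributes two elements,
# so a group with a single row yields a vector of length 2.
single <- c(7)
single_len <- length(c(0:1, diff(single[-length(single)])))  # 2
```

Because of that prefix, an id with only one row produces two values instead of one, so that case would likely need separate handling.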
Yuck. I have this ugly solution:
library(data.table)
dt[ ,nb_active_hours_so_far:=c(0,head(cumsum(c(1,diff(hour)>0)), -1)),id][]
# id hour ip target day nb_active_hours_so_far
#1: 1 1 1 1 1 0
#2: 1 5 1 0 1 1
#3: 1 5 45 0 1 2
#4: 2 6 2 1 1 0
#5: 2 7 2 1 1 1
#6: 2 8 2 1 1 2
#7: 3 23 3 1 3 0
#8: 3 23 1 1 2 1
#9: 3 23 1 0 1 1
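The diff-based expression above relies on hour being sorted within each id, since diff(hour) > 0 only detects increases. If that ordering cannot be guaranteed, a variant based on duplicated() might be used instead. This is a sketch under that assumption; distinct_so_far is a hypothetical helper name, not a data.table function.

```r
# Hypothetical helper: number of distinct values seen in earlier elements of x
distinct_so_far <- function(x) {
  seen <- cumsum(!duplicated(x))  # distinct values up to and including each element
  c(0L, head(seen, -1L))          # lag by one so the current element is excluded
}

distinct_so_far(c(1, 5, 5))     # 0 1 2
distinct_so_far(c(23, 23, 23))  # 0 1 1
distinct_so_far(c(5, 1, 5))     # 0 1 2, even though the hours are unsorted
```

With the question's table, this could be applied per group as dt[, nb_active_hours_so_far := distinct_so_far(hour), by = id].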
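Or you could use the rleid/shift functions from the devel version of data.table, i.e. v1.9.5. Instructions to install the devel version are here. (Thanks to @Frank for suggesting shift.)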
library(data.table)
dt[,nb_active_hours_so_far := shift(rleid(hour),fill=0L), id]
# id hour ip target day nb_active_hours_so_far
#1: 1 1 1 1 1 0
#2: 1 5 1 0 1 1
#3: 1 5 45 0 1 2
#4: 2 6 2 1 1 0
#5: 2 7 2 1 1 1
#6: 2 8 2 1 1 2
#7: 3 23 3 1 3 0
#8: 3 23 1 1 2 1
#9: 3 23 1 0 1 1
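For readers who want to see what the two calls contribute, the following base R sketch mimics them on a single id's hour vector. It is only an illustration of the idea, not how data.table implements rleid or shift; the names h, r, run_id and lagged are introduced here.

```r
h <- c(1, 5, 5)                                   # hours of id 1

# Like rleid(h): label each run of consecutive equal values 1, 2, 3, ...
r <- rle(h)
run_id <- rep(seq_along(r$lengths), r$lengths)    # 1 2 2

# Like shift(run_id, fill = 0L): lag by one row, padding the first row with 0
lagged <- c(0L, head(run_id, -1L))                # 0 1 2
```

The run id of the previous row equals the number of distinct hours seen before the current row, provided equal hours are adjacent within each id.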