I have the following df:
library(data.table)
df <- data.table(user = c('a', 'a', 'a', 'b', 'b')
, spend = 1:5
, shift_by = c(1,1,2,1,1)
); df
   user spend shift_by
1:    a     1        1
2:    a     2        1
3:    a     3        2
4:    b     4        1
5:    b     5        1
I want to create a lead/lag column, only this time the n argument of data.table's shift function is dynamic and takes df$shift_by as input. My expected result is:
df[, spend_shifted := c(NA, 1, 1, NA, 4)]; df
   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4
However, the following attempt gives:
df[, spend_shifted := shift(x=spend, n=shift_by, type="lag"), user]; df
   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1            NA
3:    a     3        2            NA
4:    b     4        1            NA
5:    b     5        1            NA
This is the closest example I could find. However, I need a group by, and I am looking for a data.table solution for speed reasons. Looking forward to any ideas.
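For context, here is a minimal sketch of what shift does when n is a vector, which may explain why the attempt above does not behave as a per-row shift. As far as I understand it, shift treats each element of n as a separate shift of the whole vector and returns a list with one shifted copy per element, rather than pairing n[i] with x[i]. The variable name res is only for illustration.

```r
library(data.table)

# A vector n requests several whole-vector shifts at once:
# the result is a list with one shifted copy of x per element of n.
res <- shift(1:3, n = 1:2, type = "lag")

is.list(res)   # TRUE
length(res)    # 2, one entry per value in n
res[[1]]       # x lagged by 1: NA 1 2
res[[2]]       # x lagged by 2: NA NA 1
```

So a per-row shift has to be built from indexing rather than from a single shift call.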
I believe this will work. You can drop the newindex column afterwards.
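# position within each user group to read the lagged value from
df[, newindex := rowid(user) - shift_by]
# a shift that reaches past the start of the group has no source row
df[newindex < 0, newindex := 0]
# look up spend at newindex within each user; rows with newindex 0 stay NA
df[newindex > 0, spend_shifted := df[, spend[newindex], by = .(user)]$V1]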
#    user spend shift_by newindex spend_shifted
# 1:    a     1        1        0            NA
# 2:    a     2        1        1             1
# 3:    a     3        2        1             1
# 4:    b     4        1        0            NA
# 5:    b     5        1        1             4
Maybe this helps:
> df[, spend_shifted := spend[replace(seq(.N) - shift_by, seq(.N) <= shift_by, NA)], user][]
   user spend shift_by spend_shifted
1:    a     1        1            NA
2:    a     2        1             1
3:    a     3        2             1
4:    b     4        1            NA
5:    b     5        1             4
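The indexing idea in the answers above could also be wrapped in a small helper, which may be easier to reuse and test. This is only a sketch: lag_dynamic is a hypothetical name, not a data.table function, and it assumes n holds whole numbers (a negative n would act as a lead).

```r
library(data.table)

# Hypothetical helper (not part of data.table):
# lag each element of x by its own amount n[i].
lag_dynamic <- function(x, n) {
  idx <- seq_along(x) - n                  # source position for each row
  idx[idx < 1L | idx > length(x)] <- NA    # positions outside the vector give NA
  x[idx]
}

df <- data.table(user = c('a', 'a', 'a', 'b', 'b'),
                 spend = 1:5,
                 shift_by = c(1, 1, 2, 1, 1))

# Apply per user so that shifts never cross group boundaries
df[, spend_shifted := lag_dynamic(spend, shift_by), by = user]
```

On the example data this should reproduce the expected spend_shifted column from the question (NA, 1, 1, NA, 4).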