gib*_*z00 4 r zoo dplyr data.table
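This is what my data frame looks like. The rightmost (4th) column is the one I want. For a given name, I am trying to get that person's score from 7 days earlier. If no row exists for the exact date 7 days earlier, I want the score associated with the date closest to (the row's date - 7 days).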
library(data.table)
dt <- fread('
Name Score Date ScoreAround7DaysAgo
John 9 2016-01-01 NA
John 6 2016-01-10 9
John 3 2016-01-17 6
John 5 2016-01-18 6
Tom 9 2016-01-01 NA
Tom 6 2016-01-10 9
Tom 3 2016-01-17 6
Tom 5 2016-01-18 6
')
dt[, Date := as.IDate(Date)]
I tried dt[dt, roll = 7 + nearest] to no avail. Thanks for your help.
This works:
dt[, DateLag := Date - 7L ]
w = dt[dt, which = TRUE, on = c("Name", Date = "DateLag"), roll = "nearest"]
dt[ , `:=`(ScoreLag = Score[replace(w, w == .I, NA_integer_)], DateLag = NULL)]
Name Score Date ScoreAround7DaysAgo ScoreLag
1: John 9 2016-01-01 NA NA
2: John 6 2016-01-10 9 9
3: John 3 2016-01-17 6 6
4: John 5 2016-01-18 6 6
5: Tom 9 2016-01-01 NA NA
6: Tom 6 2016-01-10 9 9
7: Tom 3 2016-01-17 6 6
8: Tom 5 2016-01-18 6 6
It finds the date nearest to Date - 7 within the same Name, but discards the match if it turns out to be the row's own Date again.
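To see why the replace() step is needed, it can help to split the one-liner into its intermediate pieces. The following is only a sketch of the same approach on the question's sample data. The name self_match is introduced here for illustration and is not part of the original answer, and the data is rebuilt with data.table() rather than fread() so the block stands alone.

```r
library(data.table)

# Same sample data as in the question, without the expected-answer column.
dt <- data.table(
  Name  = rep(c("John", "Tom"), each = 4),
  Score = rep(c(9L, 6L, 3L, 5L), times = 2),
  Date  = as.IDate(rep(c("2016-01-01", "2016-01-10",
                         "2016-01-17", "2016-01-18"), times = 2))
)

# Target date for each row: 7 days before the row's own date.
dt[, DateLag := Date - 7L]

# For each row, the row number in dt whose Date is nearest to DateLag
# within the same Name (which = TRUE returns row numbers, not data).
w <- dt[dt, which = TRUE, on = c("Name", Date = "DateLag"), roll = "nearest"]

# self_match (hypothetical helper name): TRUE where a row's nearest match
# is the row itself, i.e. there is no other observation to roll to.
self_match <- w == seq_len(nrow(dt))

# Blank out self-matches, then look up the score at the matched row.
dt[, ScoreLag := Score[replace(w, self_match, NA_integer_)]]
dt[, DateLag := NULL]
print(dt)
```

Keeping w as a plain vector of row numbers means the lookup is ordinary vector indexing, so an NA index simply yields an NA score.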
dt[, val := .SD[.(Name = Name, Date = Date - 7), on = c('Name', 'Date'), roll = 'nearest',
c(NA, tail(Score, -1)), by = Name]$V1]
dt
# Name Score Date ScoreAround7DaysAgo val
#1: John 9 2016-01-01 NA NA
#2: John 6 2016-01-10 9 9
#3: John 3 2016-01-17 6 6
#4: John 5 2016-01-18 6 6
#5: Tom 9 2016-01-01 NA NA
#6: Tom 6 2016-01-10 9 9
#7: Tom 3 2016-01-17 6 6
#8: Tom 5 2016-01-18 6 6