I have a data.table with the following column:
c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12)
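I want to fill only the two NAs immediately preceding each non-NA value in the column, by carrying that next non-NA value backward. The result should be: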
c(58,NA,NA,13,13,13,NA,12,12,12,23,12,12)
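Put another way, each position should take the first non-NA value among itself and the next two positions, and an NA with no non-NA value within two positions ahead stays NA. Below is a deliberately simple base-R reference version of that rule, useful for checking faster versions on small inputs. The helper name `fill_ahead_loop` and its window parameter `k` are hypothetical:

```r
# Reference implementation of the rule: position i takes the first non-NA
# value among x[i], x[i + 1], ..., x[i + k]; otherwise it stays NA.
# `fill_ahead_loop` and `k` are hypothetical names used for illustration.
fill_ahead_loop <- function(x, k = 2L) {
  stopifnot(k >= 1)
  n <- length(x)
  out <- x
  for (i in seq_len(n)) {
    if (is.na(x[i]) && i < n) {
      # look at most k positions ahead, clipped at the end of the vector
      ahead <- x[(i + 1L):min(i + k, n)]
      hit <- ahead[!is.na(ahead)]
      if (length(hit) > 0L) out[i] <- hit[1L]
    }
  }
  out
}

x <- c(58, NA, NA, NA, NA, 13, NA, NA, NA, 12, 23, NA, 12)
fill_ahead_loop(x)
# [1] 58 NA NA 13 13 13 NA 12 12 12 23 12 12
```

It loops in R, so it is only meant as a correctness check, not as the fast solution being asked for.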
I have managed to do it like this:
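library(data.table)

dt = data.table(V1 = c(58,NA,NA,NA,NA,13,NA,NA,NA,12,23,NA,12))
# run id: consecutive equal values (including consecutive NAs) share an id
dt[, rleid := rleid(V1)]
# position of each row within its run
dt[, num := seq(.N), rleid]

# arr: length of each run, built by walking the rows
u = 1
arr = c()
for (i in 1:(nrow(dt)-1)) {
  if (dt$rleid[i] == dt$rleid[i+1]) {
    u = u + 1
  } else {
    arr = append(arr, u)
    u = 1
  }
}
arr = append(arr, u)  # length of the last run

# v: repeat each run length once per row of that run
v = c()
for (i in 1:length(arr)) {
  for (j in 1:arr[i]) {
    v = append(v, arr[i])
  }
}
dt[, len := v]
# rows remaining after this one in its run (0 for the last row of the run)
dt[, val := len - num]
# fill an NA only when it is one of the last two rows of its NA run
dt[, V2 := fifelse(is.na(V1) & val <= 1, nafill(V1, "nocb"), V1)]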
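The two loops only rebuild, for each row, the length of the run that row belongs to (the `len` column), which data.table can compute directly with `.N` grouped by the run id. A sketch of the same logic without R-level loops, assuming a data.table version that provides `rleid()`, `fifelse()` and `nafill()`; the column name `rid` is chosen here only for illustration:

```r
library(data.table)

dt <- data.table(V1 = c(58, NA, NA, NA, NA, 13, NA, NA, NA, 12, 23, NA, 12))

# run id: consecutive equal values (including consecutive NAs) share an id
dt[, rid := rleid(V1)]
# len = length of the run, num = position within the run (1, 2, ...)
dt[, `:=`(len = .N, num = seq_len(.N)), by = rid]
# fill an NA only if it is one of the last two rows of its NA run;
# nafill(..., "nocb") carries the next non-NA value backward
dt[, V2 := fifelse(is.na(V1) & len - num <= 1, nafill(V1, "nocb"), V1)]
dt$V2
# [1] 58 NA NA 13 13 13 NA 12 12 12 23 12 12
```

This keeps the original idea (run lengths plus backward fill) but lets data.table do the grouping in C instead of growing vectors with `append()` inside loops, which is what makes the loop version slow on large tables.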
On large data tables this solution takes far too long. Is there a faster approach?
A quick-and-dirty data.table solution:
dt[, V1b := fcoalesce(c(list(V1), shift(V1, -(1:2))))]
# Or simply (as suggested by B. Christian Kamgang)
dt[, V1b := fcoalesce(shift(V1, -(0:2)))]
V1 V1b
<num> <num>
1: 58 58
2: NA NA
3: NA NA
4: NA 13
5: NA 13
6: 13 13
7: NA NA
8: NA 12
9: NA 12
10: 12 12
11: 23 23
12: NA 12
13: 12 12
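Why the one-liner works, unpacked on the bare vector (this relies on `shift()` accepting a vector of offsets, including negative ones, exactly as the answer already does). The helper `fill_ahead` and its parameter `k` are hypothetical names for illustration:

```r
library(data.table)

x <- c(58, NA, NA, NA, NA, 13, NA, NA, NA, 12, 23, NA, 12)

# With a vector n, shift() returns one vector per offset; a negative offset
# flips the default type "lag" to "lead", so -(0:2) yields x itself, x led by
# one position and x led by two positions (padded with NA at the end).
cands <- shift(x, -(0:2))
# fcoalesce() accepts a plain list and picks, element by element, the first
# non-NA value across its vectors.
fcoalesce(cands)
# [1] 58 NA NA 13 13 13 NA 12 12 12 23 12 12

# Hypothetical helper: fill each NA from the first non-NA among the next k values.
fill_ahead <- function(x, k = 2L) fcoalesce(shift(x, -(0:k)))
fill_ahead(x, k = 1L)
# [1] 58 NA NA NA 13 13 NA NA 12 12 23 12 12
```

Because the window is just the list of offsets `0:k`, filling the three NAs before each value only means passing `k = 3L` (e.g. `dt[, V1b := fill_ahead(V1, 3L)]`). The cost is one shifted copy of the column per offset, so it stays linear in the number of rows.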