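I have a data.table in R and want to apply a rolling sum separately within each group. The problem is that the groups have different lengths, and when the rollapply function reaches a group shorter than the window, it throws an error. Is there a way around this other than looping?
Here is a simple example that illustrates the problem.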
library(data.table)
library(zoo)  # provides rollapplyr()
DT <- data.table(id = c(rep("A", 6), rep("B", 2), rep("C", 8)),
                 val = c(1:6, 1:2, 1:8))
> DT
id val
1: A 1
2: A 2
3: A 3
4: A 4
5: A 5
6: A 6
7: B 1
8: B 2
9: C 1
10: C 2
11: C 3
12: C 4
13: C 5
14: C 6
15: C 7
16: C 8
Rolling sum over a window of 4 values, using rollapplyr():
DT[, cum.sum := rollapplyr(val, width = 4, FUN = sum, fill = NA), by = id]
But this gives me an error:
Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument
The resulting output is:
> DT
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 NA
13: C 5 NA
14: C 6 NA
15: C 7 NA
16: C 8 NA
Ideally, the output should be:
> DT
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 10
13: C 5 14
14: C 6 18
15: C 7 22
16: C 8 26
Sym*_*xAU 12
We can do:
DT[, cum.sum := Reduce(`+`, shift(val, 0:3)), by=id]
id val cum.sum
1: A 1 NA
2: A 2 NA
3: A 3 NA
4: A 4 10
5: A 5 14
6: A 6 18
7: B 1 NA
8: B 2 NA
9: C 1 NA
10: C 2 NA
11: C 3 NA
12: C 4 10
13: C 5 14
14: C 6 18
15: C 7 22
16: C 8 26
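To see why the shift()/Reduce() approach copes with the short group, it may help to run the two steps on plain vectors. This is only a minimal sketch: the names val, lags, roll, short and roll_short are illustrative and not part of the original answer. shift(val, 0:3) returns a list holding val lagged by 0, 1, 2 and 3 positions, padded with NA at the start, and Reduce(`+`, ...) adds those vectors element-wise, so any window that reaches back before the start of the group comes out as NA.

```r
library(data.table)

# One group with 8 values, like group C in the question
val <- 1:8

# List of 4 vectors: val lagged by 0, 1, 2 and 3 positions (NA-padded at the front)
lags <- shift(val, 0:3)

# Element-wise sum of the 4 lagged vectors = rolling sum over a window of 4
roll <- Reduce(`+`, lags)
print(roll)
# NA NA NA 10 14 18 22 26

# A group shorter than the window, like group B in the question:
# every window contains at least one NA, so the result is all NA
short <- 1:2
roll_short <- Reduce(`+`, shift(short, 0:3))
print(roll_short)
```

The window width is set by the lag range, so a width of k would use shift(val, 0:(k - 1)).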
I'm sure I've seen this before; possibly a duplicate?
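For what it's worth, newer data.table releases also ship a built-in rolling sum, frollsum(), which might be an alternative here. The sketch below is not from the answer above. It assumes a data.table version that includes frollsum() and relies on its defaults (right-aligned window, fill = NA); the column name roll.sum is just illustrative. As far as I can tell, it fills a group shorter than the window with NA instead of raising an error, but that is worth checking on your own version.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(id  = c(rep("A", 6), rep("B", 2), rep("C", 8)),
                 val = c(1:6, 1:2, 1:8))

# Rolling sum over 4 values within each id; the result is a double column
DT[, roll.sum := frollsum(val, 4), by = id]
print(DT)
```

One difference from the shift()/Reduce() version is the column type: frollsum() returns doubles, while summing integer vectors with `+` keeps the result integer.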