Cumulative vectors in a data.table

Question

I have the following data.table:

library(data.table)
dat = data.table(j = c(3,8,9,11,10,28), gr = c(9,9,9,9,10,10))
> dat
    j gr
1:  3  9
2:  8  9
3:  9  9
4: 11  9
5: 10 10
6: 28 10

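There are two groups (specified by 'gr'), and they are ordered. What I want is, for each row within each group, a cumulative vector of the values in 'j' up to and including that row. The result should be a list column, like this: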

res_dat = data.table(j = c(3,8,9,11,10,28), gr = c(9,9,9,9,10,10),
                     res = list(3, c(3,8), c(3,8,9), c(3,8,9,11),
                                10, c(10, 28)))
> res_dat
    j gr         res
1:  3  9           3
2:  8  9         3,8
3:  9  9       3,8,9
4: 11  9  3, 8, 9,11
5: 10 10          10
6: 28 10       10,28

I tried the following:

First, I created a dummy column holding the row number within each group.

dat[, tmp:= seq_len(.N), by = gr]

My plan was to use this number to subset the j vector, but I did not manage it. Neither of these works:

dat[, res := list(j[1:tmp]), by = gr]
dat[, res := list(list(j[1:tmp])), by = gr] # based on /sf/ask/1577203421/

I get the following warnings:

Warning messages:
1: In 1:tmp : numerical expression has 4 elements: only the first used
2: In 1:tmp : numerical expression has 2 elements: only the first used

This does help me understand how it fails, but I don't know how to make it work. Any ideas?

Answer 1

r2e*_*ans 7

This is Henrik's answer (if they come back, I'm happy to give them credit for it ... somehow):

dat[, res := .(Reduce(c, j, accumulate=TRUE)), by = gr]
#        j    gr         res
#    <num> <num>      <list>
# 1:     3     9           3
# 2:     8     9         3,8
# 3:     9     9       3,8,9
# 4:    11     9  3, 8, 9,11
# 5:    10    10          10
# 6:    28    10       10,28

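As a rough cross-check (a sketch, not part of Henrik's answer), the Reduce result can be compared against the expected list from the question. The second assignment shows one way the questioner's row-number idea could be made to work: inside a group, `tmp` is the whole vector `1, 2, 3, 4`, so `1:tmp` only ever uses its first element, whereas looping over `seq_len(.N)` handles one row at a time. The names `res_alt` and `expected` are introduced here for illustration only.

```r
library(data.table)

dat <- data.table(j = c(3, 8, 9, 11, 10, 28), gr = c(9, 9, 9, 9, 10, 10))

# Reduce-based approach from the answer above
dat[, res := .(Reduce(c, j, accumulate = TRUE)), by = gr]

# Alternative (an assumption, not from the original answer): for row i of a
# group, take the first i values of j
dat[, res_alt := .(lapply(seq_len(.N), function(i) j[seq_len(i)])), by = gr]

# Expected list column, copied from the question
expected <- list(3, c(3, 8), c(3, 8, 9), c(3, 8, 9, 11), 10, c(10, 28))

# Both comparisons are expected to hold
res_ok     <- isTRUE(all.equal(dat$res, expected))
res_alt_ok <- isTRUE(all.equal(dat$res_alt, expected))
```

The `lapply` version needs no temporary `tmp` column, since `seq_len(.N)` already gives the row numbers within each group.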

Reduce is similar to sapply, except that it operates on the current value and the result of the previous operation. For example, we can see

sapply(1:3, function(z) z*2)
# [1] 2 4 6

This, unrolled, is equivalent to

1*2 # 2
2*2 # 4
3*2 # 6

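That is, the calculation on one element of the vector/list is completely independent and never knows the results of the previous iterations.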

Reduce, however, is explicitly given the result of the previous calculation. By default it returns only the last calculation, analogous to tail(sapply(...), 1):

Reduce(function(prev, this) prev + this*2, 11:13)
# [1] 61

That seems a bit opaque ... let's look at all the intermediate steps, of which the answer above is the last:

Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
# [1] 11 35 61

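In this case (without specifying init=; more on that shortly), the first result is simply the first value in x=, which is not run through the function. If we unroll this, we see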

11        # 11 is the first value in x
   _________/
  /
 v
11 + 12*2 # 35
35 + 13*2 # 61

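The unrolled steps can also be written as a plain loop. This is only a sketch of the behaviour described here (no init=, left-to-right), not how base R implements Reduce; `reduce_loop` is a hypothetical helper name.

```r
# Hypothetical helper (not part of base R): mimics Reduce(f, x) and
# Reduce(f, x, accumulate = TRUE) when init= is not supplied.
reduce_loop <- function(f, x, accumulate = FALSE) {
  stopifnot(length(x) > 0)
  prev <- x[[1]]                  # first value is used as-is; f is not called on it
  steps <- list(prev)
  for (this in x[-1]) {
    prev <- f(prev, this)         # each call sees the previous result
    steps[[length(steps) + 1L]] <- prev
  }
  if (accumulate) steps else prev
}

f <- function(prev, this) prev + this * 2

final     <- reduce_loop(f, 11:13)                             # last step only
all_steps <- unlist(reduce_loop(f, 11:13, accumulate = TRUE))  # every step

# Compare with the built-in
same_final <- isTRUE(all.equal(final, Reduce(f, 11:13)))
same_steps <- isTRUE(all.equal(all_steps, Reduce(f, 11:13, accumulate = TRUE)))
```

The loop makes the two points above visible: `f` is called one time fewer than there are elements, and the only difference between the default and `accumulate = TRUE` is whether the intermediate values are kept.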

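Sometimes we need the first value in x= to be run through the function as well, together with a starting condition (the value of prev the first time, when there is no previous iteration to use). For that we can use init=. One way to think about init= is to look at two completely equivalent calls: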

Reduce(function(prev, this) prev + this*2, 11:13, accumulate = TRUE)
Reduce(function(prev, this) prev + this*2, 12:13, init = 11, accumulate = TRUE)
# [1] 11 35 61

(Without init=, Reduce takes the first element of x=, assigns it to init=, and removes it from x=.)

Now suppose we want the starting condition (the injected "previous" value) to be 0; then we would do

Reduce(function(prev, this) prev + this*2, 11:13, init = 0, accumulate = TRUE)
# [1]  0 22 46 72


### unrolled
 0        # 0 is the init= value
   ________/
  /
 v
 0 + 11*2 # 22
22 + 12*2 # 46
46 + 13*2 # 72

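A small sketch to check both init= behaviours described above. The comparison with `cumsum()` is an addition for illustration and only works because this particular function is a running sum.

```r
f <- function(prev, this) prev + this * 2
x <- 11:13

# Without init=, the first element of x plays the role of init
a <- Reduce(f, x, accumulate = TRUE)
b <- Reduce(f, x[-1], init = x[1], accumulate = TRUE)
calls_equivalent <- isTRUE(all.equal(a, b))

# With an explicit init=, every element of x goes through f, and the
# output gains one leading element (the init value itself)
with_init <- Reduce(f, x, init = 0, accumulate = TRUE)
one_longer <- length(with_init) == length(x) + 1L

# Dropping the injected start value leaves one result per element of x
per_element <- with_init[-1]

# The same running total written with cumsum()
via_cumsum <- c(0, cumsum(x * 2))
```

If the result has to line up row-for-row with the input (as in a data.table column), the extra leading element from `init=` needs to be dropped, as `per_element` does here.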

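Let's go back to the question and this data. I'll inject a browser() and change the function slightly so that we can look at all the intermediate values.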

> dat[, res := .(Reduce(function(prev, this) { browser(); c(prev, this); }, j, accumulate=TRUE)), by = gr]
Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 2
[1] 3
Browse[2]> this
[1] 8
Browse[2]> c(prev, this)
[1] 3 8
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 3
[1] 3 8
Browse[2]> this
[1] 9
Browse[2]> c(prev, this)
[1] 3 8 9
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=9`, row 4
[1] 3 8 9
Browse[2]> this
[1] 11
Browse[2]> c(prev, this)
[1]  3  8  9 11
Browse[2]> c                                       # 'c'ontinue

Browse[2]> Called from: f(init, x[[i]])
Browse[1]> debug at #1: c(prev, this)
Browse[2]> prev                                    # group `gr=10`, row 6
[1] 10
Browse[2]> this
[1] 28
Browse[2]> c(prev, this)
[1] 10 28
Browse[2]> c                                       # 'c'ontinue

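browser() needs an interactive session. As a non-interactive sketch of the same inspection (an alternative, not what the transcript above does), each call's `prev` and `this` can be recorded in a list. The name `steps` is introduced here for illustration, and only the `gr = 9` values of `j` are used.

```r
j <- c(3, 8, 9, 11)   # the values of j for group gr = 9

steps <- list()       # one entry per call of the anonymous function
res <- Reduce(function(prev, this) {
  # record what this call received, then do the same work as c()
  steps[[length(steps) + 1L]] <<- list(prev = prev, this = this)
  c(prev, this)
}, j, accumulate = TRUE)

n_calls <- length(steps)   # number of times the function was actually called
first_call <- steps[[1]]   # what the first call saw
```

For four values the function is called three times, and the first recorded call already has `prev` equal to the first value of `j`, which matches the rows shown in the transcript.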

Note how we never "see" rows 1 or 5, because they are the init= conditions of the reduction (the first prev value seen in each group).

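Reduce can be a difficult function to visualize and work with. When I use it, I almost always start by inserting browser() into the anon-function and walking through the first three steps: the first to make sure init= is correct, the second to make sure the anon-function does what I think I want with the init and next values, and the third to make sure it continues correctly. This is similar to a "proof by induction": the nth calculation will be correct because we know the (n-1)th calculation is correct.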

Glad you posted it, @r2evans! I was busy with coffee. Cheers (2 upvotes)
*highfive* for the neutral language. (2 upvotes)
