Recurrence across periods by group in data.table

phe*_*nss 6 r data.table

I have a dataset with names, dates, and several categorical columns. Let's say:

data <- data.table(name = c('Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Cal', 'Anne', 'Ben', 'Ben', 'Ben', 'Cal'),
               period = c(1,1,1,1,1,1,2,2,2,3,3), 
               category = c("A","A","A","B","B","B","A","B","A","B","A"))

Which looks like this:

  name  period  category
  Anne       1         A
   Ben       1         A
   Cal       1         A
  Anne       1         B
   Ben       1         B
   Cal       1         B
  Anne       2         A
   Ben       2         B
   Ben       2         A
   Ben       3         A
   Cal       3         B

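I want to calculate, for each period and for each group of my categorical variables, how many names were also present in the previous period. The output should look like this: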

period  category  recurrence_count
    2         A                 2   # due to Anne and Ben being on A, period 1
    2         B                 1   # due to Ben being on B, period 1
    3         A                 1   # due to Ben being on A, period 2 
    3         B                 0   # no match from B, period 2

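I know about the .I and .GRP operators in data.table, but I don't know how to express the concept of "the next group" in the j entry of the statement. I imagine something like this might be a reasonable path, but I can't figure out the correct syntax: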

data[, .(recurrence_count = length(intersect(name, name[last(.GRP)]))), by = .(category, period)]

Hen*_*rik 2

Another data.table alternative. For rows that can have a previous period (period != 1), create such a variable (prev_period := period - 1).

Join the original data with the subset that has a value in prev_period (data[data[!is.na(prev_period)], on = ...]). Join on "category", "period = prev_period", and "name".

In the resulting dataset, for each "period" and "category" (by = .(period = i.period, category)), count the number of names in the original data (x.name) that match the previous period (length(na.omit(x.name))).

data[period != 1, prev_period := period - 1]

data[data[!is.na(prev_period)], on = c("category", period = "prev_period", "name"),
     .(category, i.period, x.name)][
       , .(n = length(na.omit(x.name))), by = .(period = i.period, category)]

#    period category n
# 1:      2        A 2
# 2:      2        B 1
# 3:      3        B 1
# 4:      3        A 0
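
For comparison, here is a minimal sketch of the same join idea that leaves data unmodified and does not hard-code the first period as 1. It is not part of the original answer: prev, cur, recurred and res are illustrative names, and it assumes, as the answer does, that periods are consecutive integers, so "previous period" means period - 1. It uses the data exactly as constructed in the question's code.

```r
library(data.table)

data <- data.table(
  name     = c("Anne", "Ben", "Cal", "Anne", "Ben", "Cal", "Anne", "Ben", "Ben", "Ben", "Cal"),
  period   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  category = c("A", "A", "A", "B", "B", "B", "A", "B", "A", "B", "A")
)

# Shift every row one period forward: a (name, category) pair seen in
# period p is a candidate for recurrence in period p + 1.
# unique() guarantees at most one match per row in the join below.
prev <- unique(data[, .(name, category, period = period + 1)])
prev[, recurred := TRUE]

# Look up each row that has a possible previous period in the shifted table.
# Rows without a match keep recurred = NA.
cur <- prev[data[period > min(period)], on = .(name, category, period)]

# Count the matched names per period and category.
res <- cur[, .(recurrence_count = sum(!is.na(recurred))), by = .(period, category)]
res
# recurrence_count per (period, category), in order of first appearance:
# (2, A) = 2, (2, B) = 1, (3, B) = 1, (3, A) = 0
```

The counts agree with the answer's output above. Shifting the lookup table rather than adding prev_period to data avoids a by-reference change to the original table.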