Counting the number of elements between occurrences of each distinct element in a vector

Question


ric*_*rey 9 grouping r vector difference

Suppose I have a vector of values, such as:

A C A B A C C B B C C A A A B B B B C A


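I want to create a new vector that holds, for each element, the number of positions since that element last appeared. So, for the vector above, the result would be: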

NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6


(where NA means that this is the first time the element has been seen).

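For example, the first and second A are at positions 1 and 3, a difference of 2; the third and fourth A are at positions 5 and 12, a difference of 7; and so on.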
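To make the arithmetic concrete, here is a minimal base R sketch that recovers the positions and gaps for "A" only. The vector is typed out literally from the example above, and the names `pos_a` and `gaps_a` are illustrative, not part of the original question.

```r
# The example vector from the question, typed out literally
x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")

# Positions at which "A" occurs
pos_a <- which(x == "A")
pos_a
#> [1]  1  3  5 12 13 14 20

# Gaps between consecutive occurrences of "A"
gaps_a <- diff(pos_a)
gaps_a
#> [1] 2 2 7 1 1 6
```

The first occurrence has no predecessor, which is why the desired output puts NA at position 1.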

Is there a pre-built, pipe-compatible function that does this?

I hacked together this function to demonstrate:

# For reproducibility
set.seed(1)

# Example vector
x = sample(LETTERS[1:3], size = 20, replace = TRUE)


compute_lag_counts = function(x, first_time = NA){
  # return vector to fill
  lag_counts = rep(-1, length(x))
  # values to match
  vals = unique(x)
  # find all positions of all elements in the target vector
  match_list = grr::matches(vals, x, list = TRUE)
  # compute the lags, then put them in the appropriate place in the return vector
  for(i in seq_along(match_list))
    lag_counts[x == vals[i]] = c(first_time, diff(sort(match_list[[i]])))
  
  # return vector
  return(lag_counts)
}

compute_lag_counts(x)


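Although it seems to do what it is supposed to, I would rather use someone else's efficient, well-tested solution! My searches turned up nothing, which surprises me, since this seems like a common task.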

Answer 1

mar*_*kus 8

One option in base R:

ave(seq_along(x), x, FUN = function(i) c(NA, diff(i)))
#  [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6


We compute the first difference (`diff`) of the indices within each group of `x`.
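For readers who want to see the grouping step spelled out, the following sketch does the same thing with `split` and `unsplit`. The helper name `lag_counts` and its `first_time` argument are hypothetical, chosen to mirror the question's function; this is one way to write it, not a tested library routine.

```r
# The example vector from the question, typed out literally
x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")

# Hypothetical helper: gap since the previous occurrence of each value
lag_counts <- function(x, first_time = NA) {
  # Positions of every distinct value, as a named list
  idx <- split(seq_along(x), x)
  # Within each group: marker for the first occurrence, then the gaps
  gaps <- lapply(idx, function(i) c(first_time, diff(i)))
  # Put the per-group results back in the original order
  unsplit(gaps, x)
}

res <- lag_counts(x)
res
#>  [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6
```

Because the helper takes the vector as its first argument, it should also work at the end of a pipe, for example `x |> lag_counts()` with the native pipe (assuming R 4.1 or later) or `x %>% lag_counts()` with magrittr.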

A `data.table` option, thanks to @Henrik:

library(data.table)
dt = data.table(x)
# .I holds the row numbers of the current group; shift(.I) lags them by one
dt[ , d := .I - shift(.I), by = x]
dt

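The expression `.I - shift(.I)` subtracts each row number from the one after it within a group. The sketch below imitates that step in base R for the "A" group only, using `c(NA, head(i, -1))` as a stand-in for `shift`; it is meant to illustrate the idea, not to reproduce data.table's internals.

```r
# Row numbers at which "A" occurs in the example vector
i <- c(1L, 3L, 5L, 12L, 13L, 14L, 20L)

# Base R stand-in for a one-step lag: drop the last value, pad the front with NA
lagged <- c(NA, head(i, -1))

# Current row number minus the previous one
d <- i - lagged
d
#> [1] NA  2  2  7  1  1  6
```

This gives the same values as `c(NA, diff(i))`, which is why the `ave` and `data.table` answers agree.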

I was writing a `data.table` alternative along similar lines, but you were faster: `dt = data.table(x)`; `dt[ , d := .I - shift(.I), x]`. (4 upvotes)
