Identify and count spells (distinctive events within each group)

Question

Tho*_*del 8 grouping r time-series dataframe dplyr

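I'm looking for an efficient way to identify spells/runs in a time series. The first three columns (time, group, is.5) are what I have; the fourth column, spell, is what I want to compute. I tried dplyr's lead and lag, but that quickly became too complicated. I also tried rle but got nowhere.

Reproducible example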

df <- structure(list(time = structure(c(1538876340, 1538876400, 
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), 
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))

I'd prefer a tidyverse solution.

Assumptions

The data is sorted by group and then by time
There are no gaps in time within each group
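
Both assumptions can be checked before computing spells. The following is a minimal base R sketch; the helper name check_assumptions and the toy data are made up for illustration, and "no gaps" is interpreted as "all consecutive time steps within a group are equal".

```r
# Hypothetical helper, not part of the original post.
check_assumptions <- function(time, group) {
  # Seconds since the epoch, split per group; row order is kept inside each group
  secs <- split(as.numeric(time), group)
  data.frame(
    group  = names(secs),
    # TRUE if time is non-decreasing within the group
    sorted = unname(vapply(secs, function(x) !is.unsorted(x), logical(1))),
    # TRUE if every consecutive time step in the group has the same size
    no_gap = unname(vapply(secs, function(x) length(unique(diff(x))) <= 1, logical(1)))
  )
}

# Toy data: group B skips one minute
time  <- as.POSIXct("2018-10-07 01:39:00", tz = "UTC") + 60 * c(0, 1, 2, 3, 0, 1, 3)
group <- c("A", "A", "A", "A", "B", "B", "B")
res <- check_assumptions(time, group)
res
```

In this toy data, no_gap is TRUE for group A and FALSE for group B. The sketch does not check that the rows of each group are contiguous; !is.unsorted(group) would cover that part.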

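Update

Thanks for the contributions. I have timed some of the proposed approaches on the full data (n = 2,583,360):

The rle approach by @markus took 0.53 seconds
The cumsum approach by @MM took 2.85 seconds
The function approach by @MrFlick took 0.66 seconds
The rle and dense_rank approach by @tmfmnk took 0.89 seconds

I ended up choosing (1) by @markus because it is fast and still somewhat intuitive (subjective). (2) by @MM best satisfies my wish for a dplyr solution, although it is computationally inefficient.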

Answer 1

mar*_*kus 7

One option is rle

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = {
      r <- rle(is.5)
      r$values <- cumsum(r$values) * r$values
      inverse.rle(r) 
      }
  )
# A tibble: 14 x 4
# Groups:   group [2]
#   time                group  is.5 spell
#   <dttm>              <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A         0     0
# 2 2018-10-07 01:40:00 A         1     1
# 3 2018-10-07 01:41:00 A         1     1
# 4 2018-10-07 01:42:00 A         0     0
# 5 2018-10-07 01:43:00 A         1     2
# 6 2018-10-07 01:44:00 A         0     0
# 7 2018-10-07 01:45:00 A         0     0
# 8 2018-10-07 01:46:00 A         1     3
# 9 2018-05-20 14:00:00 B         0     0
#10 2018-05-20 14:01:00 B         0     0
#11 2018-05-20 14:02:00 B         1     1
#12 2018-05-20 14:03:00 B         1     1
#13 2018-05-20 14:04:00 B         0     0
#14 2018-05-20 14:05:00 B         1     2

You asked for a tidyverse solution, but if you are concerned about speed you could use data.table. The syntax is very similar

library(data.table)
setDT(df)[, spell := {
  r <- rle(is.5)
  r$values <- cumsum(r$values) * r$values
  inverse.rle(r) 
  }, by = group][] # the [] at the end prints the data.table

Explanation

When we call

r <- rle(df$is.5)

we get the result

r
#Run Length Encoding
#  lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
#  values : num [1:10] 0 1 0 1 0 1 0 1 0 1

We need to replace values with its cumulative sum where values == 1, while values should remain zero otherwise.

We can achieve this by multiplying cumsum(r$values) by r$values, where the latter is a vector of 0s and 1s.

r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5

Finally, we call inverse.rle to get back a vector of the same length as is.5.

inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5

We do this for every group.
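
The three steps above can be wrapped in a small helper and applied per group. The sketch below uses base R's ave() instead of dplyr or data.table; the name rle_spell is made up for illustration, and the two vectors simply repeat is.5 and group from the question.

```r
# Hypothetical helper name; the body is the rle logic explained above.
rle_spell <- function(x) {
  r <- rle(x)                               # runs of identical values
  r$values <- cumsum(r$values) * r$values   # number the runs of 1s, keep 0s at 0
  inverse.rle(r)                            # expand back to the original length
}

is.5  <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)
group <- rep(c("A", "B"), times = c(8, 6))

# ave() applies the function within each group and returns values in row order
spell <- ave(is.5, group, FUN = rle_spell)
spell
# [1] 0 1 1 0 2 0 0 3 0 0 1 1 0 2
```

Because the counter restarts in every group, group B begins again at spell 1, which matches the spell column in the grouped output above.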

Answer 2

MrF*_*ick 6

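Here is a helper function that returns what you are after

spell_index <- function(time, flag) {
  # dplyr::lag() shifts the vector by one row; stats::lag() would not
  # A spell starts where the flag is 1, the previous flag is not 1,
  # and the row is one time step after the previous row
  change <- time - dplyr::lag(time) == 1 & flag == 1 & dplyr::lag(flag) != 1
  # Running count of spell starts, zeroed on rows outside a spell
  cumsum(change) * (flag == 1) + 0
}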

You can use it with your data like this

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(
    spell = spell_index(time, is.5)
  )

Basically, the helper function uses lag() to look for changes. We use cumsum() to increment the count of changes. Then we multiply by a boolean so that the values you want zeroed out become zero.
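
One caveat: lag() returns NA for the first row, so if a group begins with flag == 1 the comparison yields NA and cumsum() propagates it through the rest of the group. The following is a sketch of a variant that treats the value before the first row as 0. It is not the code from this answer: the name spell_index_safe is hypothetical, it relies on the question's no-gaps assumption and therefore drops the time check, and it is written in base R so it does not depend on which lag() is attached.

```r
# Hypothetical variant, not the answerer's code.
# Assumes flag contains only 0/1 with no NAs, and that time has no gaps.
spell_index_safe <- function(flag) {
  prev  <- c(0, flag[-length(flag)])   # previous flag; 0 before the first row
  start <- flag == 1 & prev != 1       # TRUE where a new spell begins
  cumsum(start) * (flag == 1)          # running spell count, zeroed outside spells
}

spell_index_safe(c(1, 1, 0, 1, 0, 0, 1))
# [1] 1 1 0 2 0 0 3
```

Inside a grouped pipeline it can be used like the original helper, for example mutate(spell = spell_index_safe(is.5)) after group_by(group).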
