R: calculating a rolling average over a window defined by values (not by row count or a date/time variable)

Rmy*_*loR 5 r filter smoothing rolling-computation rolling-average

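I'm fairly new to all the packages for computing rolling averages in R, and I hope you can point me in the right direction.

I have the following data as an example: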

ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
    314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
    328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
    337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
         1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
         1, 0, 0, 1, 0, 0, 1, 1, 0, 0)
df <- data.frame(ms, correct)

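ms is the time point (in milliseconds), and correct indicates whether a specific action was performed correctly
(1 = correct, 0 = incorrect).

My goal is to compute the percentage correct (that is, the mean of correct) over windows spanning a fixed number of milliseconds. As you can see, some time points are missing and some occur multiple times, so I don't want to filter based on row number. I looked into some packages such as "tidyquant", but it seems to me that such packages need a time/date variable, rather than a numeric variable, to determine the window over which values are averaged. Is there a way to specify the window in terms of the numeric values of df$ms?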

ngh*_*ran 2

Try:

library(dplyr)

# count the number of values per ms
df <- df %>%
        group_by(ms) %>%
        mutate(Nb.values = n())

# bin by 1 ms (unobserved ms values are kept as factor levels) and count the correct responses per ms
df2 <- setNames(aggregate(correct ~ factor(df$ms, levels = as.character(seq(min(df$ms), max(df$ms), 1))),
                          df, sum),
                c("ms", "Count.correct"))

# complete data frame (including unused levels)
df2 <- tidyr::complete(df2, ms)
df2$ms <- as.numeric(levels(df2$ms))[df2$ms]
df2 <- df2 %>% left_join(distinct(df[, c(1, 3)]), "ms")

# rolling proportion correct over 5 ms windows (ms to ms + 4):
# rolling sum of correct counts divided by rolling sum of observations
df2 %>%
        mutate(Window = paste(ms, ms+4, sep = "-"), # add windows
               Rolling.correct = zoo::rollapply(Count.correct, 5, sum, na.rm = TRUE,
                                                partial = TRUE, fill = NA, align = "left") /
                       zoo::rollapply(Nb.values, 5, sum, na.rm = TRUE, partial = TRUE,
                                      fill = NA, align = "left")) # add rolling proportion

# A tibble: 43 x 5
      ms Count.correct Nb.values  Window Rolling.correct
   <dbl>         <dbl>     <int>   <chr>           <dbl>
 1   300             2         3 300-304            0.40
 2   301             0         1 301-305            0.00
 3   302            NA        NA 302-306            0.25
 4   303             0         1 303-307            0.25
 5   304            NA        NA 304-308            0.25
 6   305             0         2 305-309            0.25
 7   306             1         1 306-310            0.25
 8   307            NA        NA 307-311            0.00
 9   308             0         1 308-312            0.20
10   309            NA        NA 309-313            0.25
# ... with 33 more rows
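For comparison, the same value-based windows can be sketched in base R, without building the per-ms grid or loading any package. This is only an illustration under stated assumptions, not the method from the answer above: `window_mean` is a hypothetical helper name, windows are taken as left-aligned and inclusive (`start` to `start + width - 1`), and a window with no observations is given `NA`.

```r
# Example data from the question
ms <- c(300, 300, 300, 301, 303, 305, 305, 306, 308, 310, 310, 311, 312,
        314, 315, 315, 316, 316, 316, 317, 318, 320, 320, 321, 322, 324,
        328, 329, 330, 330, 330, 332, 332, 334, 334, 335, 335, 336, 336,
        337, 338, 338, 338, 340, 340, 341, 342, 342, 342, 342)
correct <- c(1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0,
             1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 0, 1, 0, 0, 1, 1, 0, 0)

# Hypothetical helper (an assumption, not part of the original answer):
# for each window start s, average `correct` over all rows whose ms
# falls in [s, s + width - 1], regardless of how many rows that is.
window_mean <- function(ms, correct, width = 5,
                        starts = seq(min(ms), max(ms), by = 1)) {
  # number of observations in each window
  n <- vapply(starts,
              function(s) sum(ms >= s & ms <= s + width - 1),
              integer(1))
  # proportion correct in each window; NA when the window is empty
  rate <- vapply(starts, function(s) {
    in_window <- ms >= s & ms <= s + width - 1
    if (any(in_window)) mean(correct[in_window]) else NA_real_
  }, numeric(1))
  data.frame(start = starts, end = starts + width - 1, n = n, rate = rate)
}

res <- window_mean(ms, correct, width = 5)
head(res)
```

For the first three windows (300-304, 301-305, 302-306) this gives 0.40, 0.00 and 0.25, matching the Rolling.correct column above. Passing a different `starts` vector, for example `seq(300, 340, by = 5)`, would give non-overlapping windows instead of a window starting at every millisecond.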