Using cummean with group_by and ignoring NAs

Joh*_*n F 1 r dplyr

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

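I want to add a new column to the data frame above that holds the cumulative mean of the value column, computed without taking the NAs into account. Is it possible to do this with dplyr? I tried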

df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))

but cummean just doesn't know how to handle NAs.

Edit: I do not want to count NAs as 0.

Jac*_*kes 5

You can use ifelse to treat NAs as 0 in the cummean call:

library(dplyr)

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

df %>%
  group_by(category) %>%
  mutate(new_col = cummean(ifelse(is.na(value), 0, value)))

Output:

# A tibble: 8 x 3
# Groups:   category [2]
  category value new_col
  <fct>    <dbl>   <dbl>
1 cat1       NA     0.  
2 cat1        2.    1.00
3 cat2        3.    3.00
4 cat1        4.    2.00
5 cat2        5.    4.00
6 cat2       NA     2.67
7 cat1        7.    3.25
8 cat2        8.    4.00

Edit: Now I see that this is not the same as ignoring NAs.
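To make the difference concrete, here is a minimal base R sketch using the cat2 values from the example. The variable names are only illustrative. Zero-filling still counts the NA row in the denominator, which pulls the running mean down.

```r
# cat2 values from the example data, in row order
x <- c(3, 5, NA, 8)

# Running mean with NA replaced by 0: the NA row still counts as an observation
zero_filled <- cumsum(ifelse(is.na(x), 0, x)) / seq_along(x)
print(zero_filled)

# Mean of the first three values when the NA is actually ignored
ignored <- mean(x[1:3], na.rm = TRUE)
print(ignored)
```

At the third position the zero-filled version divides 8 by 3, while ignoring the NA divides 8 by 2.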

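Try this instead. I group by an extra column that indicates whether the value is NA, which means cummean can run on the non-NA rows without ever encountering an NA: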

library(dplyr)

df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value=c(NA,2,3,4,5,NA,7,8))

df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(value)))

Output:

# A tibble: 8 x 4
# Groups:   category, isna [4]
  category value isna  new_col
  <fct>    <dbl> <lgl>   <dbl>
1 cat1       NA  TRUE    NA   
2 cat1        2. FALSE    2.00
3 cat2        3. FALSE    3.00
4 cat1        4. FALSE    3.00
5 cat2        5. FALSE    4.00
6 cat2       NA  TRUE    NA   
7 cat1        7. FALSE    4.33
8 cat2        8. FALSE    5.33
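
As an alternative sketch that avoids the extra grouping column, the running mean can be computed as a running sum of the non-NA values divided by a running count of them. The helper name cummean_na and its keep_na argument are hypothetical, not part of dplyr. The example uses base R's ave for the per-category step so it runs without any packages.

```r
# Hypothetical helper (not a dplyr function): cumulative mean that skips NAs.
cummean_na <- function(x, keep_na = TRUE) {
  seen  <- cumsum(!is.na(x))               # how many non-NA values so far
  total <- cumsum(ifelse(is.na(x), 0, x))  # running sum; NAs add nothing
  out <- total / seen
  out[seen == 0] <- NA                     # 0/0 would otherwise give NaN
  if (keep_na) out[is.na(x)] <- NA         # leave NA rows as NA, as in the output above
  out
}

df <- data.frame(category = c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value = c(NA, 2, 3, 4, 5, NA, 7, 8))

# Apply the helper within each category, keeping the original row order
df$new_col <- ave(df$value, df$category, FUN = cummean_na)
print(df)
```

With dplyr loaded, the same helper should also work inside the pipeline from the question, as in `df %>% group_by(category) %>% mutate(new_col = cummean_na(value))`. Passing keep_na = FALSE carries the last running mean forward onto the NA rows instead of leaving them NA.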