df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
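I want to add a new column to the data frame above that holds the cumulative mean of the value column, ignoring NAs. Is it possible to do this with dplyr? I tried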
df <- df %>% group_by(category) %>% mutate(new_col=cummean(value))
but cummean just doesn't know what to do with NAs.
EDIT: I do not want to count NAs as 0.
You could use ifelse to treat NAs as 0 in the cummean call:
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
  group_by(category) %>%
  mutate(new_col = cummean(ifelse(is.na(value), 0, value)))
Output:
# A tibble: 8 x 3
# Groups:   category [2]
  category value new_col
  <fct>    <dbl>   <dbl>
1 cat1       NA     0.
2 cat1        2.    1.00
3 cat2        3.    3.00
4 cat1        4.    2.00
5 cat2        5.    4.00
6 cat2       NA     2.67
7 cat1        7.    3.25
8 cat2        8.    4.00
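EDIT: I see now that this is not the same as ignoring NAs.
Try this one instead. I group by an additional column that indicates whether the value is NA, which means cummean can run without encountering any NAs: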
library(dplyr)
df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
value=c(NA,2,3,4,5,NA,7,8))
df %>%
  group_by(category, isna = is.na(value)) %>%
  mutate(new_col = ifelse(isna, NA, cummean(value)))
Output:
# A tibble: 8 x 4
# Groups:   category, isna [4]
  category value isna  new_col
  <fct>    <dbl> <lgl>   <dbl>
1 cat1       NA  TRUE    NA
2 cat1        2. FALSE    2.00
3 cat2        3. FALSE    3.00
4 cat1        4. FALSE    3.00
5 cat2        5. FALSE    4.00
6 cat2       NA  TRUE    NA
7 cat1        7. FALSE    4.33
8 cat2        8. FALSE    5.33
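For comparison, the same numbers can probably be obtained without the extra grouping column by dividing a running sum of the non-NA values by a running count of them. This is only a sketch in base R, not something from the answers above: cummean_skip_na is a hypothetical helper name, and ave() is used here simply to apply it per category.

```r
# Example data from the question.
df <- data.frame(
  category = c("cat1", "cat1", "cat2", "cat1", "cat2", "cat2", "cat1", "cat2"),
  value    = c(NA, 2, 3, 4, 5, NA, 7, 8)
)

# Hypothetical helper (not part of dplyr): cumulative mean that skips NAs.
cummean_skip_na <- function(x) {
  running_sum   <- cumsum(ifelse(is.na(x), 0, x))  # NAs add nothing to the sum
  running_count <- cumsum(!is.na(x))               # NAs are not counted
  out <- running_sum / running_count               # 0/0 is NaN before the first non-NA value
  out[is.na(x)] <- NA                              # keep NA wherever the input is NA
  out
}

# ave() applies the helper within each category and keeps the original row order.
df$new_col <- ave(df$value, df$category, FUN = cummean_skip_na)
print(df)
```

The helper should also work inside the dplyr pipeline, e.g. group_by(category) followed by mutate(new_col = cummean_skip_na(value)). Dropping the line that resets NA positions would instead carry the previous running mean forward on NA rows, which may or may not be what you want.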