I have the following dataset:
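I want to fill the NAs in the n_j column, from the second row onward, using the formula n_j - (d_j + c_j) applied to the previous row. In other words, each n_j is the previous row's n_j minus the previous row's d_j and c_j.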
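Written row by row (taking i as the row index and n_i, d_i, c_i as the values of the n_j, d_j and c_j columns in row i, with the known first value 48 from the data below), the rule is a recurrence on the previous row. Unrolling it expresses every value in terms of the first one and a running sum:

```latex
n_1 = 48, \qquad n_i = n_{i-1} - (d_{i-1} + c_{i-1}), \quad i = 2, \dots, 8

n_i = n_1 - \sum_{k=1}^{i-1} (d_k + c_k)
```

For example, n_3 = 48 - (16 + 4) - (10 + 4) = 14. The unrolled form is what allows a cumulative sum to replace the loop.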
Data to reproduce the example:
df = structure(list(time_intervals = structure(1:8, levels = c("[0,12)",
"[12,24)", "[24,36)", "[36,48)", "[48,60)", "[60,72)", "[72,84)",
"[84,96]"), class = "factor"), d_j = c(16L, 10L, 1L, 3L, 2L,
2L, 0L, 2L), c_j = c(4L, 4L, 0L, 1L, 2L, 0L, 1L, 0L), n_j = c(48L,
NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
I managed to do it with a for loop:
library(dplyr)

# Fill n_j from the previous row's n_j, d_j and c_j, repeated once per row
for (i in 1:nrow(df)) {
  df <- df |>
    mutate(
      n_j = ifelse(is.na(n_j), lag(n_j) - (lag(d_j) + lag(c_j)), n_j)
    )
}
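The loop needs repeated passes because each pass can only fill the row directly below the last known value: `lag(n_j)` is still `NA` for every row further down, so `ifelse()` produces `NA` there again. A single pass on a minimal copy of the data illustrates this. The copy is an assumption for illustration: it keeps only the `d_j`, `c_j` and `n_j` columns and rebuilds them with `data.frame()` instead of the `dput()` output above.

```r
library(dplyr)

# Minimal copy of the data: only the columns the calculation uses
df <- data.frame(
  d_j = c(16L, 10L, 1L, 3L, 2L, 2L, 0L, 2L),
  c_j = c(4L, 4L, 0L, 1L, 2L, 0L, 1L, 0L),
  n_j = c(48L, rep(NA_integer_, 7))
)

# One pass of the loop body: only row 2 has a non-NA value above it
one_pass <- df |>
  mutate(n_j = ifelse(is.na(n_j), lag(n_j) - (lag(d_j) + lag(c_j)), n_j))

one_pass$n_j
#> [1] 48 28 NA NA NA NA NA NA
```

So the loop needs nrow(df) - 1 passes to reach the last row, and each pass recomputes the whole column, which makes the total work grow quadratically with the number of rows.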
Is there a way to do this with purrr::map or some other tidyverse function?
Gue*_*sBF 13
A dplyr one-liner
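We can fill the NAs in the original column with coalesce(). The replacement values come from subtracting the lagged d_j and c_j (that is, -lag(d_j) - lag(c_j)). Finally, cumsum() turns these into the desired output. It should be fairly efficient, since it relies only on vectorised subtraction and the very fast cumsum().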
df |>
  mutate(n_j = coalesce(n_j, -lag(d_j) - lag(c_j)) |> cumsum())
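To see why the one-liner works, it helps to look at the vector just before `cumsum()`. A minimal sketch on plain vectors, with the values copied from the data above:

```r
library(dplyr)

d_j <- c(16L, 10L, 1L, 3L, 2L, 2L, 0L, 2L)
c_j <- c(4L, 4L, 0L, 1L, 2L, 0L, 1L, 0L)
n_j <- c(48L, rep(NA_integer_, 7))

# Row 1 keeps its known value; every other row gets the change from the
# row above, i.e. minus that row's d_j + c_j
steps <- coalesce(n_j, -lag(d_j) - lag(c_j))
steps
#> [1]  48 -20 -14  -1  -4  -4  -2  -1

# The running total of the changes is the filled-in column
cumsum(steps)
#> [1] 48 28 14 13  9  5  3  2
```

As far as I can tell, this relies on only the first n_j being known: if a later row already had a non-NA n_j, coalesce() would keep it and cumsum() would add it on top of the running total instead of restarting from it.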
Ony*_*mbu 10
Using base R:
start <- df$n_j[1]
transform(df, n_j = c(start, start - cumsum(d_j + c_j)[-nrow(df)]))
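The base R answer uses the unrolled form directly: `cumsum(d_j + c_j)` is the running total subtracted so far, and dropping its last element shifts that total down by one row. A minimal sketch on plain vectors, with the values copied from the data above:

```r
d_j <- c(16L, 10L, 1L, 3L, 2L, 2L, 0L, 2L)
c_j <- c(4L, 4L, 0L, 1L, 2L, 0L, 1L, 0L)
start <- 48L

# Running total of d_j + c_j up to and including each row
removed <- cumsum(d_j + c_j)
removed
#> [1] 20 34 35 39 43 45 46 48

# Row i needs the total up to row i - 1, so drop the last running total
# and put the starting value in front
n_j <- c(start, start - removed[-length(removed)])
n_j
#> [1] 48 28 14 13  9  5  3  2
```

One difference from the dplyr versions: transform() is base R and, to my understanding, returns a plain data.frame, so the tibble class is lost. Wrapping the result in tibble::as_tibble() restores it if that matters.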
One solution using accumulate2() from purrr:
library(tidyverse)

df %>%
  mutate(
    n_j = accumulate2(
      d_j,
      c_j,
      ~ ..1 - (..2 + ..3),  # ..1 = running n_j, ..2 = d_j, ..3 = c_j
      .init = first(n_j)
    )[-n() - 1]  # .init adds one element; drop the last one
  )
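The `[-n()-1]` at the end of the accumulate2() call is easy to misread. A minimal sketch on plain vectors shows what is being trimmed. It uses a named function instead of the `~ ..1 - (..2 + ..3)` formula, which should be equivalent (`..1` is the accumulated value, `..2` and `..3` the current `d_j` and `c_j`); the names `n_prev`, `d` and `cens` are only illustrative.

```r
library(purrr)

d_j <- c(16L, 10L, 1L, 3L, 2L, 2L, 0L, 2L)
c_j <- c(4L, 4L, 0L, 1L, 2L, 0L, 1L, 0L)

# accumulate2() calls .f(acc, d_j[i], c_j[i]); with .init the result
# starts with .init and has length(d_j) + 1 elements
res <- accumulate2(
  d_j, c_j,
  function(n_prev, d, cens) n_prev - (d + cens),
  .init = 48L
)
length(res)
#> [1] 9

# The last element is the count after the final interval, which has no row
# of its own, so it is dropped; this is what [-n()-1] does inside mutate().
# unlist() leaves the result unchanged if purrr already returned a vector.
unlist(res)[-length(res)]
#> [1] 48 28 14 13  9  5  3  2
```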