My data frame looks like the first three columns of this example:
id obs value newCol
a 1 uncool NA
a 2 cool 1
a 3 uncool NA
a 4 uncool NA
a 5 cool 2
a 6 uncool NA
a 7 cool 1
a 8 uncool NA
b 1 cool 0
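What I need is a column (newCol above) that, for each row whose value is "cool", counts the number of "uncool" rows since the previous "cool" row, or since the first row of the group if there is no earlier "cool" (grouped by id).
How can I do this (ideally using dplyr)?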
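In addition to id, you need a second grouping variable, grp = cumsum(value == "cool") - (value == "cool"). The cumulative sum increases at every "cool" row; subtracting the indicator keeps each "cool" row in the same group as the "uncool" rows that precede it.
Then you can use mutate to assign, within each group, sum(value == "uncool") to the rows where value == "cool" and NA everywhere else.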
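library(dplyr)
dat %>%
  # grp closes a run at each "cool" row, so that row stays with the "uncool" rows before it
  group_by(id, grp = cumsum(value == "cool") - (value == "cool")) %>%
  # write the count of "uncool" rows only on the "cool" row of each group
  mutate(newCool = if_else(value == "cool", sum(value == "uncool"), NA_integer_))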
# A tibble: 9 x 6
# Groups: id, grp [5]
id obs value newCol grp newCool
<chr> <int> <chr> <int> <int> <int>
1 a 1 uncool NA 0 NA
2 a 2 cool 1 0 1
3 a 3 uncool NA 1 NA
4 a 4 uncool NA 1 NA
5 a 5 cool 2 1 2
6 a 6 uncool NA 2 NA
7 a 7 cool 1 2 1
8 a 8 uncool NA 3 NA
9 b 1 cool 0 3 0
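To see why the grp key works, here is a minimal base R sketch that builds it step by step on the same values and reproduces the counts without dplyr. The names is_cool, key, n_uncool and new_cool are illustrative only and not part of the answer above; using ave() with a pasted id/grp key is just one possible way to do the per-group sum.

```r
# Same id and value columns as in the example data
id    <- c("a", "a", "a", "a", "a", "a", "a", "a", "b")
value <- c("uncool", "cool", "uncool", "uncool", "cool",
           "uncool", "cool", "uncool", "cool")

is_cool <- value == "cool"

# cumsum() steps up on every "cool" row; subtracting the indicator
# pulls each "cool" row back into the run of "uncool" rows before it
grp <- cumsum(is_cool) - is_cool

# Combine id and grp into one key so that runs never cross ids
key <- paste(id, grp)

# Per-key count of "uncool" rows, repeated on every row of the key
n_uncool <- ave(as.integer(value == "uncool"), key, FUN = sum)

# Keep the count only on "cool" rows
new_cool <- ifelse(is_cool, n_uncool, NA_integer_)

print(data.frame(id, value, grp, new_cool))
```

The grp and new_cool columns printed here should line up with the grp and newCool columns in the tibble above.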
Data
dat <- structure(list(id = c("a", "a", "a", "a", "a", "a", "a", "a",
"b"), obs = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L), value = c("uncool",
"cool", "uncool", "uncool", "cool", "uncool", "cool", "uncool",
"cool"), newCol = c(NA, 1L, NA, NA, 2L, NA, 1L, NA, 0L)), .Names = c("id",
"obs", "value", "newCol"), class = "data.frame", row.names = c(NA,
-9L))