Dal*_*n K 9 r duplicates dataframe dplyr
I have a data frame that looks like this:
df <- data.frame("id" = c(111,111,111,222,222,222,222,333,333,333,333),
                 "Location" = c("A","B","A","A","C","B","A","B","A","A","A"),
                 "Encounter" = c(1,2,3,1,2,3,4,1,2,3,4))
    id Location Encounter
1  111        A         1
2  111        B         2
3  111        A         3
4  222        A         1
5  222        C         2
6  222        B         3
7  222        A         4
8  333        B         1
9  333        A         2
10 333        A         3
11 333        A         4
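I am basically trying to create a binary flag, within each id group, that shows whether a Location already appeared in a previous Encounter. So it would look like: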
    id Location Encounter Flag
1  111        A         1    0
2  111        B         2    0
3  111        A         3    1
4  222        A         1    0
5  222        C         2    0
6  222        B         3    0
7  222        A         4    1
8  333        B         1    0
9  333        A         2    0
10 333        A         3    1
11 333        A         4    1
I was trying to figure out how to do this with an if statement, such as:
library(dplyr)
df$Flag <- case_when((df$id - lag(df$id)) == 0 ~
                       case_when(df$Location == lag(df$Location, 1) |
                                   df$Location == lag(df$Location, 2) |
                                   df$Location == lag(df$Location, 3) ~ 1, T ~ 0), T ~ 0)
    id Location Flag
1  111        A    0
2  111        B    0
3  111        A    1
4  222        A    0
5  222        C    0
6  222        B    0
7  222        A    1
8  333        B    0
9  333        A    1
10 333        A    1
11 333        A    1
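But this has a problem: row 9 is incorrectly assigned a 1. The real data also has cases with 15 or more encounters, so chaining lags this way gets very cumbersome. I was hoping to find a way to do something like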
lag(df$Location, 1:df$Encounter)
but I know that lag() needs a single integer for k, so that specific command won't work.
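What lag(df$Location, 1:df$Encounter) is reaching for is a comparison of each row against every earlier row of the same id. A minimal base R sketch of that idea follows. It assumes the rows are already ordered by Encounter within each id, and seen_before is a hypothetical helper name introduced here for illustration, not a dplyr or base R function.

```r
df <- data.frame(
  id = c(111, 111, 111, 222, 222, 222, 222, 333, 333, 333, 333),
  Location = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "A", "A"),
  Encounter = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4)
)

# Hypothetical helper: for each position i, is x[i] among x[1], ..., x[i - 1]?
seen_before <- function(x) {
  vapply(seq_along(x), function(i) x[i] %in% x[seq_len(i - 1)], logical(1))
}

# Apply the helper to the row indices of each id, so that comparisons never
# cross a group boundary (assumes rows are sorted by Encounter within id).
# ave() writes the logical results back into an integer vector, giving 0/1.
df$Flag <- ave(
  seq_len(nrow(df)), df$id,
  FUN = function(rows) seen_before(as.character(df$Location[rows]))
)
```

Because the comparison is restricted to rows of the same id, row 9 is no longer matched against the A in row 7, which belongs to id 222. That cross-group match is what lag(df$Location, 2) picks up in the attempt above, since the id check there only looks one row back.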
One option is to use duplicated():
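library(dplyr)
df %>%
  group_by(id) %>%   # handle each id separately
  # duplicated() is TRUE when a Location has already appeared within the group;
  # the unary + converts the logical vector to 0/1
  mutate(Flag = +(duplicated(Location)))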
# A tibble: 11 x 4
# Groups:   id [3]
#       id Location Encounter  Flag
#    <dbl> <fct>        <dbl> <int>
#  1   111 A                1     0
#  2   111 B                2     0
#  3   111 A                3     1
#  4   222 A                1     0
#  5   222 C                2     0
#  6   222 B                3     0
#  7   222 A                4     1
#  8   333 B                1     0
#  9   333 A                2     0
# 10   333 A                3     1
# 11   333 A                4     1
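The same flag can also be computed without dplyr. The sketch below relies on duplicated() accepting a data frame, in which case it marks every row whose combination of values has already occurred in an earlier row. The explicit sort is an assumption added here so that "earlier row" means "earlier Encounter"; the example data is already in that order.

```r
df <- data.frame(
  id = c(111, 111, 111, 222, 222, 222, 222, 333, 333, 333, 333),
  Location = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "A", "A"),
  Encounter = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4)
)

# Sort so that "previous" means an earlier Encounter within the same id.
df <- df[order(df$id, df$Encounter), ]

# A row is flagged when its (id, Location) pair has been seen in an earlier row.
df$Flag <- as.integer(duplicated(df[c("id", "Location")]))
```

Including id in the columns passed to duplicated() plays the role of group_by(id): a Location only counts as repeated when it repeats for the same id.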