And*_*d_R 0 r duplicates dataframe
I have a large data.frame that looks similar to the example below:
ID date sex grade location
1 1 2000 m 1 x
2 1 2001 m 2 y
3 2 1999 f 3 z
4 2 2000 f 4 f
5 3 2000 m 5 k
6 3 2001 m 6 l
To reproduce it, run:
df <- data.frame(ID=c(1,1,2,2,3,3),
date=c(2000,2001,1999,2000,2000,2001),
sex = c("m", "m", "f", "f", "m", "m"),
grade =c(1,2,3,4,5,6),
location =c("x","y","z", "f","k","l") )
I would like to reshape my data.frame so that every ID has a row for every date, giving the following structure:
ID date sex grade location
1 1 1999 m 0 0
2 1 2000 m 1 x
3 1 2001 m 2 y
4 2 1999 f 3 z
5 2 2000 f 4 f
6 2 2001 f 0 0
7 3 1999 m 0 0
8 3 2000 m 5 k
9 3 2001 m 6 l
This can be done with data.table like this:
library(data.table)
setDT(df, key = c("ID", "date"))
> df[CJ(ID, date, unique = TRUE)]
ID date sex grade location
1: 1 1999 NA NA NA
2: 1 2000 m 1 x
3: 1 2001 m 2 y
4: 2 1999 f 3 z
5: 2 2000 f 4 f
6: 2 2001 NA NA NA
7: 3 1999 NA NA NA
8: 3 2000 m 5 k
9: 3 2001 m 6 l
If you want sex to be consistent within each ID:
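To see why the join fills in the missing rows, it can help to look at the cross join on its own. The following is a minimal sketch, assuming the same IDs and dates as in the question; the names `grid` and `full` are only illustrative, and `on = .(ID, date)` is used here in place of setting a key.

```r
library(data.table)

# Reduced version of the example data (ID, date and grade only)
df <- data.table(ID    = c(1, 1, 2, 2, 3, 3),
                 date  = c(2000, 2001, 1999, 2000, 2000, 2001),
                 grade = c(1, 2, 3, 4, 5, 6))

# Every unique ID paired with every unique date: 3 IDs x 3 dates
grid <- CJ(ID = df$ID, date = df$date, unique = TRUE)

# Joining df onto the grid keeps all grid rows; combinations that
# are absent from df get NA in the remaining columns
full <- df[grid, on = .(ID, date)]
```

Here `grid` has nine rows, and the three ID/date combinations missing from `df` show up in `full` with `grade` set to NA.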
df <- df[CJ(ID, date, unique = TRUE)]
df[ , sex := unique(na.omit(sex)), by = ID]
If you really want 0s instead of NA for grade and location (you should reconsider this, since it is most likely better to leave them as NA):
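Note that `unique(na.omit(sex))` relies on each ID having exactly one distinct non-missing value of sex, which holds for the example data. If that is not guaranteed, one possible variant is to take the first non-missing value per ID. This is only a sketch on a small made-up table (`dt` is a hypothetical name, not part of the original answer):

```r
library(data.table)

# Small illustrative table with gaps in sex
dt <- data.table(ID  = c(1, 1, 1, 2, 2, 2),
                 sex = c(NA, "m", "m", "f", "f", NA))

# Always yields a single value per group, which is recycled
# across all rows of that ID
dt[, sex := sex[!is.na(sex)][1L], by = ID]
```

Whether silently picking the first value is acceptable depends on the data; conflicting values within an ID may be worth checking for separately.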
df[is.na(grade), grade := 0]
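# Applies only if location is a factor: append "0" as a new level (prepending it would relabel the existing values)
if (is.factor(df$location)) df[, location := factor(location, levels = c(levels(location), "0"))]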
df[is.na(location), location := "0"]
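For comparison, the same result can be approximated in base R without data.table. This is a rough sketch rather than the approach from the answer above; it assumes location is stored as character (hence `stringsAsFactors = FALSE`), and the names `grid` and `full` are illustrative.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 date = c(2000, 2001, 1999, 2000, 2000, 2001),
                 sex = c("m", "m", "f", "f", "m", "m"),
                 grade = c(1, 2, 3, 4, 5, 6),
                 location = c("x", "y", "z", "f", "k", "l"),
                 stringsAsFactors = FALSE)

# All ID/date combinations
grid <- expand.grid(ID = unique(df$ID), date = unique(df$date))

# Left join: keep every grid row, NA where df has no match
full <- merge(grid, df, by = c("ID", "date"), all.x = TRUE)
full <- full[order(full$ID, full$date), ]

# Fill sex within each ID with its first non-missing value
full$sex <- ave(full$sex, full$ID, FUN = function(s) s[!is.na(s)][1L])

# Replace the remaining NAs with 0 / "0" (leaving NA is usually preferable)
full$grade[is.na(full$grade)] <- 0
full$location[is.na(full$location)] <- "0"
```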