Duplicating rows in a data.frame in R

And*_*d_R 0 r duplicates dataframe

I have a large data.frame that looks similar to the example below:

  ID date sex grade location
1  1 2000   m     1        x
2  1 2001   m     2        y
3  2 1999   f     3        z
4  2 2000   f     4        f
5  3 2000   m     5        k
6  3 2001   m     6        l

To reproduce it, run:

df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 date = c(2000, 2001, 1999, 2000, 2000, 2001),
                 sex = c("m", "m", "f", "f", "m", "m"),
                 grade = c(1, 2, 3, 4, 5, 6),
                 location = c("x", "y", "z", "f", "k", "l"))

I would like to reshape my data.frame so that every ID has a row for every date, giving the following structure:

      ID date sex grade location
    1  1 1999   m     0        0
    2  1 2000   m     1        x
    3  1 2001   m     2        y
    4  2 1999   f     3        z
    5  2 2000   f     4        f
    6  2 2001   f     0        0
    7  3 1999   m     0        0
    8  3 2000   m     5        k
    9  3 2001   m     6        l

Mic*_*ico 5

This can be done with data.table like this:

library(data.table)
setDT(df, key = c("ID", "date"))

> df[CJ(ID, date, unique = TRUE)]
   ID date sex grade location
1:  1 1999  NA    NA       NA
2:  1 2000   m     1        x
3:  1 2001   m     2        y
4:  2 1999   f     3        z
5:  2 2000   f     4        f
6:  2 2001  NA    NA       NA
7:  3 1999  NA    NA       NA
8:  3 2000   m     5        k
9:  3 2001   m     6        l
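
For readers who do not have data.table installed, the same completion of ID/date pairs can be sketched in base R. This is an alternative technique, not the join shown above: expand.grid builds every ID/date combination, and merge with all.x = TRUE keeps the combinations that have no match as NA rows. The names full_grid and completed are illustrative, and stringsAsFactors = FALSE is an assumption so that sex and location stay character.

```r
df <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                 date = c(2000, 2001, 1999, 2000, 2000, 2001),
                 sex = c("m", "m", "f", "f", "m", "m"),
                 grade = c(1, 2, 3, 4, 5, 6),
                 location = c("x", "y", "z", "f", "k", "l"),
                 stringsAsFactors = FALSE)

# Every combination of the observed IDs and dates (3 x 3 = 9 rows)
full_grid <- expand.grid(ID = unique(df$ID), date = unique(df$date))

# Left join: combinations absent from df get NA in sex, grade and location
completed <- merge(full_grid, df, by = c("ID", "date"), all.x = TRUE)

# Make the row order explicit: by ID, then by date
completed <- completed[order(completed$ID, completed$date), ]
```

The three ID/date pairs missing from the input (1/1999, 2/2001, 3/1999) end up as rows whose other columns are NA, which mirrors the join result above.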

If you want sex to be uniform within each ID:

df <- df[CJ(ID, date, unique = TRUE)]

df[ , sex := unique(na.omit(sex)), by = ID]

If you really want 0s instead of NA for grade and location (you should reconsider this, since it is most likely better to leave them as NA):

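df[is.na(grade), grade := 0]
# If location is a factor, append "0" as a new level first
# (prepending it would relabel the existing levels)
if (is.factor(df$location)) levels(df$location) <- c(levels(df$location), "0")
df[is.na(location), location := "0"]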
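
The same two clean-up steps (filling sex within each ID, then replacing NA with 0) can also be sketched in base R. This is a minimal sketch, not the data.table code above: the data frame completed is a hypothetical name, built by hand here so the block runs on its own, and its sex and location columns are assumed to be character rather than factor.

```r
# Completed data with NA rows for the ID/date pairs that were missing
completed <- data.frame(
  ID = rep(1:3, each = 3),
  date = rep(1999:2001, times = 3),
  sex = c(NA, "m", "m", "f", "f", NA, NA, "m", "m"),
  grade = c(NA, 1, 2, 3, 4, NA, NA, 5, 6),
  location = c(NA, "x", "y", "z", "f", NA, NA, "k", "l"),
  stringsAsFactors = FALSE
)

# Fill sex within each ID from its first non-missing value
completed$sex <- ave(completed$sex, completed$ID,
                     FUN = function(s) s[!is.na(s)][1])

# Replace the remaining NAs; grade stays numeric, location stays character
completed$grade[is.na(completed$grade)] <- 0
completed$location[is.na(completed$location)] <- "0"
```

Because location is character here, no factor levels need to be adjusted before assigning "0".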