df <- data.frame(id = c(1, 1, 1, 2, 2),
                 gender = c("Female", "Female", "Male", "Female", "Male"),
                 variant = c("a", "b", "c", "d", "e"))
> df
  id gender variant
1  1 Female       a
2  1 Female       b
3  1   Male       c
4  2 Female       d
5  2   Male       e
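I want to remove duplicate rows from a data.frame based on the gender column. I know there is a similar question (here), but the difference is that I want to remove the duplicates within each subset of the data, where each subset is defined by a unique id.
The result I want looks like this: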
  id gender variant
1  1 Female       a
3  1   Male       c
4  2 Female       d
5  2   Male       e
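For contrast, a quick sketch of what happens when the deduplication ignores id and looks at gender alone: only the first Female row and the first Male row of the whole data frame survive, so both rows for id 2 are lost. The name global is just an illustrative variable.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 gender = c("Female", "Female", "Male", "Female", "Male"),
                 variant = c("a", "b", "c", "d", "e"))

# duplicated() on the gender column alone has no notion of id,
# so the second "Female" and "Male" values (rows 2, 4, 5) all count as duplicates
global <- df[!duplicated(df$gender), ]
global  # keeps only rows 1 and 3
```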
I have tried the following approach and it works, but I wonder whether there is a cleaner, more efficient way.
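out <- list()
ids <- unique(df$id)                          # loop over the ids actually present
for (i in seq_along(ids)) {
  df2 <- subset(df, id == ids[i])             # rows belonging to one id
  out[[i]] <- df2[!duplicated(df2$gender), ]  # keep the first row per gender
}
do.call(rbind.data.frame, out)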
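The same per-id loop can be written without an explicit for loop by splitting the data frame by id. This is a base R sketch, not a method from the original post; pieces and out are illustrative names.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 gender = c("Female", "Female", "Male", "Female", "Male"),
                 variant = c("a", "b", "c", "d", "e"))

# split() gives one data frame per id, so no id values are hard-coded
pieces <- split(df, df$id)

# deduplicate gender inside each piece
pieces <- lapply(pieces, function(d) d[!duplicated(d$gender), ])

# stack the pieces again; unname() stops rbind() from prefixing
# the row names with the list names ("1.1", "1.3", ...)
out <- do.call(rbind, unname(pieces))
out
```

split() returns the groups in sorted id order, so the result is ordered by id rather than by original row position. For this data the two orders coincide.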
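Pass both columns to duplicated(), so a row only counts as a duplicate when its combination of id and gender has already appeared:
df[!duplicated(df[c("id", "gender")]), ]
#   id gender variant
# 1  1 Female       a
# 3  1   Male       c
# 4  2 Female       d
# 5  2   Male       e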
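duplicated() keeps the first occurrence of each (id, gender) combination. If the last occurrence should be kept instead, its fromLast argument reverses the scan direction. A short sketch on the same data; keep_last is an illustrative name.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 gender = c("Female", "Female", "Male", "Female", "Male"),
                 variant = c("a", "b", "c", "d", "e"))

# fromLast = TRUE scans from the bottom, so the last row of each
# (id, gender) combination is the one that is kept
keep_last <- df[!duplicated(df[c("id", "gender")], fromLast = TRUE), ]
keep_last  # id 1 / Female is now represented by variant "b" instead of "a"
```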
Another way to do this uses subset:
subset(df, !duplicated(subset(df, select = c(id, gender))))
#   id gender variant
# 1  1 Female       a
# 3  1   Male       c
# 4  2 Female       d
# 5  2   Male       e
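A further base R alternative, not taken from the answers above, is to number the rows within each (id, gender) group with ave() and keep the rows whose counter is 1. It is more verbose than duplicated(), but the same counter can keep the first n rows per group. n_in_group and first_rows are illustrative names.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 gender = c("Female", "Female", "Male", "Female", "Male"),
                 variant = c("a", "b", "c", "d", "e"))

# ave() applies FUN within each (id, gender) group; seq_along turns
# each group's row indices into a running counter 1, 2, ...
n_in_group <- ave(seq_len(nrow(df)), df$id, df$gender, FUN = seq_along)

# counter == 1 marks the first row of each group
first_rows <- df[n_in_group == 1, ]
first_rows
```

Changing the condition to n_in_group <= 2 would keep up to two rows per group.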