I have the following three data frames:
df1 <- data.frame(name=c("John", "Anne", "Christine", "Andy"),
age=c(31, 26, 54, 48),
height=c(180, 175, 160, 168),
group=c("Student",3,5,"Employer"), stringsAsFactors=FALSE)
df2 <- data.frame(name=c("Anne", "Christine"),
age=c(26, 54),
height=c(175, 160),
group=c(3,5),
group2=c("Teacher",6), stringsAsFactors=FALSE)
df3 <- data.frame(name=c("Christine"),
age=c(54),
height=c(160),
group=c(5),
group2=c(6),
group3=c("Scientist"), stringsAsFactors=FALSE)
I would like to combine them so that I get the following result:
df.all <- data.frame(name=c("John", "Anne", "Christine", "Andy"),
age=c(31, 26, 54, 48),
height=c(180, 175, 160, 168),
group=c("Student", "Teacher", "Scientist", "Employer"))
This is how I currently do it:
df.all <- merge(merge(df1[,c(1,4)], df2[,c(1,5)], all=TRUE, by="name"),
df3[,c(1,6)], all=TRUE, by="name")
row.ind <- which(df.all$group %in% c(6,5))
df.all[row.ind, c("group")] <- df.all[row.ind, c("group2")]
row.ind2 <- which(df.all$group2 %in% c(6))
df.all[row.ind2, c("group")] <- df.all[row.ind2, c("group3")]
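This is not general, and it is very messy. Perhaps there is a way to use merge_all or merge_recurse for the merging step (especially since there may be more than two data frames to merge), but I have not figured out how. Neither of these produces the right result: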
df.all <- merge_all(list(df1, df2, df3))
df.all <- merge_recurse(list(df1, df2, df3), by=c("name"))
Is there a more general and elegant way to solve this problem?
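If I understand what you are ultimately after, here is another approach that might work. (It is not clear what the numeric values in the "group" column represent, so I am not sure this is exactly what you are looking for.)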
Use Reduce() to merge multiple data.frames.
temp <- Reduce(function(x, y) merge(x, y, all=TRUE), list(df1, df2, df3))
names(temp)[4] <- "group1" # Rename "group" to "group1" for reshaping
temp
# name age height group1 group2 group3
# 1 Andy 48 168 Employer <NA> <NA>
# 2 Anne 26 175 3 Teacher <NA>
# 3 Christine 54 160 5 6 Scientist
# 4 John 31 180 Student <NA> <NA>
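To make the folding step easier to see in isolation, here is a minimal sketch on toy data. The helper name `merge_frames` and the frames `a`, `b`, and `c3` are made up for illustration. `Reduce()` merges the first two frames, then merges that result with the third, and so on, so the same call works for a list of any length. With no `by` argument, `merge()` joins on all columns the two frames have in common.

```r
# Hypothetical helper (not part of base R): full outer join of a list of data frames.
merge_frames <- function(frames) {
  # Reduce() applies merge() pairwise, left to right, carrying the result forward.
  Reduce(function(x, y) merge(x, y, all = TRUE), frames)
}

# Toy data, invented for this example.
a  <- data.frame(id = c("x", "y", "z"), v1 = 1:3, stringsAsFactors = FALSE)
b  <- data.frame(id = c("y", "z"), v2 = c("p", "q"), stringsAsFactors = FALSE)
c3 <- data.frame(id = "z", v3 = TRUE, stringsAsFactors = FALSE)

merged <- merge_frames(list(a, b, c3))
merged  # one row per id; cells with no match are filled with NA
```

Because `all = TRUE` keeps unmatched rows from both sides, no `id` is lost even when it appears in only one of the frames.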
Use reshape() to reshape your data from wide to long.
df.all <- reshape(temp, direction = "long", idvar="name", varying=4:6, sep="")
df.all
# name age height time group
# Andy.1 Andy 48 168 1 Employer
# Anne.1 Anne 26 175 1 3
# Christine.1 Christine 54 160 1 5
# John.1 John 31 180 1 Student
# Andy.2 Andy 48 168 2 <NA>
# Anne.2 Anne 26 175 2 Teacher
# Christine.2 Christine 54 160 2 6
# John.2 John 31 180 2 <NA>
# Andy.3 Andy 48 168 3 <NA>
# Anne.3 Anne 26 175 3 <NA>
# Christine.3 Christine 54 160 3 Scientist
# John.3 John 31 180 3 <NA>
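Take advantage of the fact that as.numeric() coerces non-numeric character strings to NA (with a warning), and use na.omit() to drop all rows that contain NA values.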
na.omit(df.all[is.na(as.numeric(df.all$group)), ])
# name age height time group
# Andy.1 Andy 48 168 1 Employer
# John.1 John 31 180 1 Student
# Anne.2 Anne 26 175 2 Teacher
# Christine.3 Christine 54 160 3 Scientist
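The filter relies on two different kinds of NA, which is easy to miss. A small sketch on a made-up vector `x`: strings that are not numbers become NA under `as.numeric()`, but values that were already NA stay NA too, so `is.na(as.numeric(...))` alone also selects the empty cells. That is why the extra `na.omit()` step (or an explicit `!is.na(x)` check, as below) is needed.

```r
# Made-up values mimicking the "group" column after reshaping.
x <- c("3", "Teacher", NA, "6", "Scientist")

# suppressWarnings() silences the "NAs introduced by coercion" warning.
num <- suppressWarnings(as.numeric(x))

is_label <- is.na(num)         # TRUE for non-numeric strings and for values that were already NA
keep <- is_label & !is.na(x)   # TRUE only for real text labels

x[keep]
# [1] "Teacher"   "Scientist"
```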
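Again, this might be overgeneralizing your problem (for example, there might be NA values in other columns), but it might help point you towards a solution.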
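If NA values in other columns are a concern, one possible variation skips reshape() and na.omit() and instead scans only the group columns. This is a rough sketch, not a tested general solution: the helper name `combine_groups` and its `prefix` argument are made up, and it assumes that every group column name starts with "group" and that a numeric value always means "look in the next data frame".

```r
df1 <- data.frame(name = c("John", "Anne", "Christine", "Andy"),
                  age = c(31, 26, 54, 48),
                  height = c(180, 175, 160, 168),
                  group = c("Student", 3, 5, "Employer"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Anne", "Christine"),
                  age = c(26, 54),
                  height = c(175, 160),
                  group = c(3, 5),
                  group2 = c("Teacher", 6), stringsAsFactors = FALSE)
df3 <- data.frame(name = "Christine", age = 54, height = 160,
                  group = 5, group2 = 6,
                  group3 = "Scientist", stringsAsFactors = FALSE)

# Hypothetical helper: merge all frames, then keep one text label per row.
combine_groups <- function(frames, prefix = "group") {
  merged <- Reduce(function(x, y) merge(x, y, all = TRUE), frames)
  group_cols <- grep(paste0("^", prefix), names(merged), value = TRUE)

  label <- rep(NA_character_, nrow(merged))
  for (col in group_cols) {
    v <- as.character(merged[[col]])
    # A real label is present (not NA) and does not parse as a number.
    is_label <- !is.na(v) & is.na(suppressWarnings(as.numeric(v)))
    label[is_label] <- v[is_label]  # later columns overwrite earlier ones
  }

  out <- merged[setdiff(names(merged), group_cols)]
  out[[prefix]] <- label
  out
}

df.all <- combine_groups(list(df1, df2, df3))
df.all
```

Rows are kept even when another column such as age is NA, because only the group columns are inspected. A row with no text label in any group column ends up with NA in `group`.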