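I have two data frames, df1 and df2. I want to create a new data frame, df3, that is simply the sum of df1 and df2 for the columns whose names match. A column that appears in only one of them (x4 here) should be carried over unchanged.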
df1 <- data.frame(x1=c(1,4,5),x2=c(5,6,7),x3=c(9,9,10))
df2 <- data.frame(x1=c(1,6,3),x2=c(4,3,1),x3=c(5,4,6),x4=c(7,6,7))
df1
  x1 x2 x3
1  1  5  9
2  4  6  9
3  5  7 10
df2
  x1 x2 x3 x4
1  1  4  5  7
2  6  3  4  6
3  3  1  6  7
# desired result
df3
  x1 x2 x3 x4
1  2  9 14  7
2 10  9 13  6
3  8  8 16  7
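We find the column names common to 'df1' and 'df2' ('nm1') and create a copy of 'df2' ('df3'). We then add the two subsets (df1[nm1] + df2[nm1]) and assign the result to the corresponding columns of 'df3'. This assumes both data frames have the same number of rows and that every column of 'df1' also appears in 'df2'; a column found only in 'df1' would be dropped.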
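nm1 <- intersect(names(df1), names(df2))  # column names present in both
df3 <- df2                                # start from a copy of df2
df3[nm1] <- df1[nm1] + df2[nm1]           # replace the shared columns with their sums
df3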
#   x1 x2 x3 x4
# 1  2  9 14  7
# 2 10  9 13  6
# 3  8  8 16  7
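The steps above can be wrapped in a small function. This is only a sketch: the name `add_common_cols` is hypothetical, and the `stopifnot` row-count check is an addition that is not part of the original answer. It also makes explicit that columns found only in the first argument are dropped, which is the situation the data.table approach below is meant to handle.

```r
# Same example data as in the question
df1 <- data.frame(x1 = c(1, 4, 5), x2 = c(5, 6, 7), x3 = c(9, 9, 10))
df2 <- data.frame(x1 = c(1, 6, 3), x2 = c(4, 3, 1), x3 = c(5, 4, 6), x4 = c(7, 6, 7))

# Hypothetical helper: sums the columns that `a` and `b` share and keeps
# every other column of `b` unchanged. Columns that exist only in `a` are dropped.
add_common_cols <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))        # element-wise sum needs equal row counts
  nm <- intersect(names(a), names(b))  # shared column names
  out <- b                             # start from a copy of `b`
  out[nm] <- a[nm] + b[nm]             # overwrite the shared columns with their sums
  out
}

df3 <- add_common_cols(df1, df2)
df3
```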
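If 'df1' also has columns that are not in 'df2' (or vice versa), one option is to put the datasets in a list and stack them with rbindlist (from data.table) using fill=TRUE, create a sequence column ('N') that numbers the rows within each dataset, and then use lapply to get the sum of each column by 'N'. Here .SDcols=x1:x4 selects the columns to sum; adjust it to your own column names.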
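library(data.table)
rbindlist(list(df1, df2), fill = TRUE, idcol = TRUE)[  # stack; '.id' marks the source, missing columns become NA
  , N := 1:.N, by = .id][                              # number the rows within each source
  , lapply(.SD, sum, na.rm = TRUE),                    # column sums that ignore the NA padding
  by = N, .SDcols = x1:x4][, N := NULL][]              # drop the helper column and print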
#    x1 x2 x3 x4
# 1:  2  9 14  7
# 2: 10  9 13  6
# 3:  8  8 16  7
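For the same case without data.table, a base-R sketch is to loop over the union of the column names and treat a column missing from one data frame as zero. The function name `add_all_cols` is hypothetical. Unlike the data.table version, it assumes both data frames have the same number of rows (rows are matched by position), and a genuine NA in a column propagates into the sum instead of being treated as 0 the way `sum(..., na.rm = TRUE)` treats it.

```r
# Same example data as in the question
df1 <- data.frame(x1 = c(1, 4, 5), x2 = c(5, 6, 7), x3 = c(9, 9, 10))
df2 <- data.frame(x1 = c(1, 6, 3), x2 = c(4, 3, 1), x3 = c(5, 4, 6), x4 = c(7, 6, 7))

# Hypothetical helper: sums matching columns and keeps columns unique to either input.
add_all_cols <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))         # rows are matched by position
  cols <- union(names(a), names(b))     # every column name, in order of first appearance
  out <- lapply(cols, function(nm) {
    # a column missing from one data frame contributes zeros
    va <- if (nm %in% names(a)) a[[nm]] else 0
    vb <- if (nm %in% names(b)) b[[nm]] else 0
    va + vb
  })
  names(out) <- cols
  as.data.frame(out)
}

df3 <- add_all_cols(df1, df2)
df3
```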