Merge two similar dataframes that have the same columns

blu*_*e13 5 python merge dataframe pandas

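I want to merge df_1 and df_2 to create df_merge, but I want the columns the two frames have in common to be combined into one, instead of getting columns like A_x and A_y.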

import numpy as np
import pandas as pd

# First frame: rows labelled foo/qux x one/two, columns A and B
index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['one', 'two', 'one', 'two'])]
columns = ["A", "B"]
df_1 = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)

# Second frame: same columns, different row labels (bar/baz x one/two)
index = [np.array(['bar', 'bar', 'baz', 'baz']),
         np.array(['one', 'two', 'one', 'two'])]
columns = ["A", "B"]
df_2 = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)

# Outer join on the index; the shared column names end up suffixed
df_merge = pd.merge(df_1, df_2, left_index=True, right_index=True, how='outer')

print(df_1)
print(df_2)
print(df_merge)

df_1

                A         B
foo one  2.082229  1.575985
    two -0.805592  0.444195
qux one  0.368874  0.253556
    two  1.017632 -0.471978

df_2

                A         B
bar one  0.134571  0.415209
    two -1.288889 -0.144284
baz one -0.117345 -0.095292
    two -0.256708 -0.682542

df_merge - current output

              A_x       B_x       A_y       B_y
bar one       NaN       NaN  0.134571  0.415209
    two       NaN       NaN -1.288889 -0.144284
baz one       NaN       NaN -0.117345 -0.095292
    two       NaN       NaN -0.256708 -0.682542
foo one  2.082229  1.575985       NaN       NaN
    two -0.805592  0.444195       NaN       NaN
qux one  0.368874  0.253556       NaN       NaN
    two  1.017632 -0.471978       NaN       NaN
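The A_x/B_x/A_y/B_y columns are expected merge behaviour: merge joins frames side by side, and when both frames contain a column with the same name it keeps both copies and tells them apart with its default `suffixes=("_x", "_y")`. A minimal sketch reproducing this, assuming pandas 1.x or later; the frames are rebuilt with `pd.MultiIndex.from_arrays` (the names `idx_1` and `idx_2` are illustrative), which gives the same row layout as the list-of-arrays index in the question.

```python
import numpy as np
import pandas as pd

# Same layout as the question: identical column names, disjoint row labels.
idx_1 = pd.MultiIndex.from_arrays([["foo", "foo", "qux", "qux"],
                                   ["one", "two", "one", "two"]])
idx_2 = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz"],
                                   ["one", "two", "one", "two"]])
df_1 = pd.DataFrame(np.random.randn(4, 2), index=idx_1, columns=["A", "B"])
df_2 = pd.DataFrame(np.random.randn(4, 2), index=idx_2, columns=["A", "B"])

# merge places the frames side by side on the index. Because A and B exist in
# both frames, pandas keeps both copies and renames them with the default
# suffixes "_x" (left frame) and "_y" (right frame).
df_merge = pd.merge(df_1, df_2, left_index=True, right_index=True, how="outer")

# No row label appears in both frames, so every row is half NaN.
print(df_merge.isna().sum(axis=1))
```

Since the row labels never overlap, what the question describes is stacking rows rather than joining columns, which is why a column-wise merge is the wrong tool here.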

df_merge - desired

                A         B
bar one  0.134571  0.415209
    two -1.288889 -0.144284
baz one -0.117345 -0.095292
    two -0.256708 -0.682542
foo one  2.082229  1.575985
    two -0.805592  0.444195
qux one  0.368874  0.253556
    two  1.017632 -0.471978

jez*_*ael 5

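The simplest approach is to use concat, which concatenates pandas objects along a particular axis (here axis=0, the default) and by default uses an 'outer' join on the other axes: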

print(pd.concat([df_1, df_2]))

                A         B
foo one -0.329887 -0.966898
    two  0.552272 -1.964264
qux one -0.629764 -0.765578
    two -0.148118  0.904920
bar one  0.305685 -1.269400
    two  1.256213 -0.686447
baz one -2.194461  0.529666
    two -1.487217 -0.520045
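A minimal sketch of the concat approach under the same setup as the question (pandas 1.x or later assumed; `idx_1`, `idx_2` and `checked` are illustrative names). It also shows two real `pd.concat` options: `verify_integrity=True`, which raises `ValueError` if the stacked index contains duplicate labels, and `join`, which only matters when the column sets differ. The frame `df_3` is hypothetical, added only to show what happens with a mismatched column.

```python
import numpy as np
import pandas as pd

idx_1 = pd.MultiIndex.from_arrays([["foo", "foo", "qux", "qux"],
                                   ["one", "two", "one", "two"]])
idx_2 = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz"],
                                   ["one", "two", "one", "two"]])
df_1 = pd.DataFrame(np.random.randn(4, 2), index=idx_1, columns=["A", "B"])
df_2 = pd.DataFrame(np.random.randn(4, 2), index=idx_2, columns=["A", "B"])

# Stack the rows: columns with the same name line up, no _x/_y suffixes.
df_merge = pd.concat([df_1, df_2])

# Guard against accidentally overlapping row labels (raises ValueError if any).
checked = pd.concat([df_1, df_2], verify_integrity=True)

# Hypothetical third frame with a different column set.
df_3 = pd.DataFrame({"A": [1.0], "C": [2.0]},
                    index=pd.MultiIndex.from_arrays([["zzz"], ["one"]]))
outer = pd.concat([df_1, df_3])                # union of columns, gaps become NaN
inner = pd.concat([df_1, df_3], join="inner")  # only the columns both frames share
```

Rows keep the order in which the frames are passed (all of df_1, then all of df_2), which is why the output above starts with foo and qux.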

Then call sort_index if necessary:

print(pd.concat([df_1, df_2]).sort_index())

                A         B
bar one -0.463547 -0.002780
    two -0.421346 -1.730840
baz one -0.086068  1.179000
    two  0.756876 -0.492985
foo one -0.223900 -0.302643
    two  0.460265  0.216632
qux one -0.296815  0.799978
    two -0.420700  1.147312
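A short sketch of what sort_index does here, under the same assumptions as before (pandas 1.x or later, illustrative names `idx_1`, `idx_2`, `df_sorted`): it orders the MultiIndex lexicographically, first by the outer level and then by the inner one, and only reorders rows. Each value stays attached to its (outer, inner) label, so the result matches the desired layout from the question.

```python
import numpy as np
import pandas as pd

idx_1 = pd.MultiIndex.from_arrays([["foo", "foo", "qux", "qux"],
                                   ["one", "two", "one", "two"]])
idx_2 = pd.MultiIndex.from_arrays([["bar", "bar", "baz", "baz"],
                                   ["one", "two", "one", "two"]])
df_1 = pd.DataFrame(np.random.randn(4, 2), index=idx_1, columns=["A", "B"])
df_2 = pd.DataFrame(np.random.randn(4, 2), index=idx_2, columns=["A", "B"])

# Stack first, then sort by the outer level (bar < baz < foo < qux),
# breaking ties with the inner level (one < two).
df_sorted = pd.concat([df_1, df_2]).sort_index()
print(df_sorted)
```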