Fun*_*der 5 python merge concat dataframe pandas
I have three DataFrames. They all share a common column, and I need to merge them on that column without losing any data.
Input
>>> df1
0  Col1   Col2  Col3
1  data1  3     4
2  data2  4     3
3  data3  2     3
4  data4  2     4
5  data5  1     4
>>> df2
0  Col1   Col4  Col5
1  data1  7     4
2  data2  6     9
3  data3  1     4
>>> df3
0  Col1   Col6  Col7
1  data2  5     8
2  data3  2     7
3  data5  5     3
Expected output
>>> df
0  Col1   Col2  Col3  Col4  Col5  Col6  Col7
1  data1  3     4     7     4
2  data2  4     3     6     9     5     8
3  data3  2     3     1     4     2     7
4  data4  2     4
5  data5  1     4                 5     3
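For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the three inputs. It assumes the first row shown above is a header and that a default 0-based index is acceptable. The name `all_keys` is introduced here purely for illustration: it lists every `Col1` value that a lossless merge has to keep.

```python
import pandas as pd

# Rebuild the three input frames from the question (default 0-based index).
df1 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3", "data4", "data5"],
    "Col2": [3, 4, 2, 2, 1],
    "Col3": [4, 3, 3, 4, 4],
})
df2 = pd.DataFrame({
    "Col1": ["data1", "data2", "data3"],
    "Col4": [7, 6, 1],
    "Col5": [4, 9, 4],
})
df3 = pd.DataFrame({
    "Col1": ["data2", "data3", "data5"],
    "Col6": [5, 2, 5],
    "Col7": [8, 7, 3],
})

# Every key that must survive the merge: the union of Col1 across all frames.
all_keys = sorted(set(df1["Col1"]) | set(df2["Col1"]) | set(df3["Col1"]))
print(all_keys)
```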
San*_*apa 12
Use pd.concat:
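import pandas as pd

# Index each frame by the shared key so concat aligns rows on it (this modifies df1, df2, df3 in place)
df1.set_index('Col1', inplace=True)
df2.set_index('Col1', inplace=True)
df3.set_index('Col1', inplace=True)
# axis=1 places the frames side by side; the default outer join keeps every key
df = pd.concat([df1, df2, df3], axis=1, sort=False).reset_index()
# Restore the key column name if reset_index() produced a column called 'index'
df.rename(columns={'index': 'Col1'})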
    Col1  Col2  Col3  Col4  Col5  Col6  Col7
0  data1     3     4   7.0   4.0   NaN   NaN
1  data2     4     3   6.0   9.0   5.0   8.0
2  data3     2     3   1.0   4.0   2.0   7.0
3  data4     2     4   NaN   NaN   NaN   NaN
4  data5     1     4   NaN   NaN   5.0   3.0
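The snippet above changes df1, df2 and df3 in place. If the original frames are still needed afterwards, a non-mutating variant might look like the sketch below. It assumes `Col1` values are unique within each frame. The names `frames`, `indexed` and `merged` are illustrative only, and `rename_axis` is used so the key column is called `Col1` whatever name the concatenated index ends up with.

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1], "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1], "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5], "Col7": [8, 7, 3]})

frames = [df1, df2, df3]
# Without inplace=True, set_index returns a new frame and leaves the original untouched.
indexed = [frame.set_index("Col1") for frame in frames]
# axis=1 aligns rows on the index; the default join='outer' keeps every key.
merged = pd.concat(indexed, axis=1, sort=False)
# Name the index explicitly, then turn it back into an ordinary column.
merged = merged.rename_axis("Col1").reset_index()
print(merged)
```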
Zer*_*ero 10
Use merge with reduce:
In [86]: from functools import reduce
In [87]: reduce(lambda x,y: pd.merge(x,y, on='Col1', how='outer'), [df1, df2, df3])
Out[87]:
    Col1  Col2  Col3  Col4  Col5  Col6  Col7
0  data1     3     4   7.0   4.0   NaN   NaN
1  data2     4     3   6.0   9.0   5.0   8.0
2  data3     2     3   1.0   4.0   2.0   7.0
3  data4     2     4   NaN   NaN   NaN   NaN
4  data5     1     4   NaN   NaN   5.0   3.0
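The same idea can be wrapped in a small helper and followed by a check that no key was dropped. This is only a sketch: `merge_all` and `expected_keys` are hypothetical names, not part of pandas.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, key):
    """Outer-merge a list of frames on `key`, pairwise from left to right."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        frames,
    )


df1 = pd.DataFrame({"Col1": ["data1", "data2", "data3", "data4", "data5"],
                    "Col2": [3, 4, 2, 2, 1], "Col3": [4, 3, 3, 4, 4]})
df2 = pd.DataFrame({"Col1": ["data1", "data2", "data3"],
                    "Col4": [7, 6, 1], "Col5": [4, 9, 4]})
df3 = pd.DataFrame({"Col1": ["data2", "data3", "data5"],
                    "Col6": [5, 2, 5], "Col7": [8, 7, 3]})

merged = merge_all([df1, df2, df3], "Col1")

# An outer merge should keep the union of keys from all inputs.
expected_keys = set(df1["Col1"]) | set(df2["Col1"]) | set(df3["Col1"])
assert set(merged["Col1"]) == expected_keys
print(merged)
```

Because `reduce` folds the list pairwise, the helper works for any number of frames that share the key column.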
Details
In [88]: df1
Out[88]:
    Col1  Col2  Col3
0  data1     3     4
1  data2     4     3
2  data3     2     3
3  data4     2     4
4  data5     1     4
In [89]: df2
Out[89]:
    Col1  Col4  Col5
0  data1     7     4
1  data2     6     9
2  data3     1     4
In [90]: df3
Out[90]:
    Col1  Col6  Col7
0  data2     5     8
1  data3     2     7
2  data5     5     3
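You can also chain two left merges. This keeps every row of df1, which is enough here because df1 already contains every Col1 value; a key that appeared only in df2 or df3 would be dropped: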
df1.merge(df2, how='left', on='Col1').merge(df3, how='left', on='Col1')
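To see how the choice of `how` affects which keys survive, here is a small sketch on made-up data. The frames `left` and `right` and the key `data6` are hypothetical and do not come from the question; `data6` exists only in the right-hand frame.

```python
import pandas as pd

left = pd.DataFrame({"Col1": ["data1", "data2"], "Col2": [3, 4]})
# 'data6' is a made-up key that exists only in the right-hand frame.
right = pd.DataFrame({"Col1": ["data2", "data6"], "Col4": [6, 0]})

# A left join keeps only the keys present in `left`.
left_join = left.merge(right, on="Col1", how="left")
# An outer join keeps the union of keys from both frames.
outer_join = left.merge(right, on="Col1", how="outer")

print(sorted(left_join["Col1"]))   # ['data1', 'data2']
print(sorted(outer_join["Col1"]))  # ['data1', 'data2', 'data6']
```

If the inputs are not guaranteed to share the same keys as the first frame, `how='outer'` is the safer choice for a merge that must not lose rows.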