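I have two pandas dataframes, df1 and df2. I want to create a dataframe df3 that contains every combination of one row from df1 and one row from df2. Pseudocode for doing this inefficiently would look like this: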
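import pandas as pd

df3 = []
for _, i in df1.iterrows():              # each row of df1
    for _, j in df2.iterrows():          # each row of df2
        df3.append(pd.concat([i, j]))    # i + j: one row with the combined cols of df1 and df2
df3 = pd.DataFrame(df3).reset_index(drop=True)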
This is the format of df1:
df1_id other_data_1 other_data_2
1 0 1
2 1 5
And df2:
df2_id other_data_3 other_data_4
1 0 1
3 2 2
The goal is to get this output for df3:
df1_id df2_id other_data_1 other_data_2 other_data_3 other_data_4
1 1 0 1 0 1
1 3 0 1 2 2
2 1 1 5 0 1
2 3 1 5 2 2
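For comparison, here is a key-free sketch of the same cross product. It is one alternative, not the method from the answer below, and it assumes df1 and df2 have a default RangeIndex as in the example. Each df1 row is repeated len(df2) times, all of df2 is tiled len(df1) times, and the two are placed side by side. The final column selection reproduces the layout requested above.

```python
import numpy as np
import pandas as pd

# Example frames from the question
df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# Repeat each df1 row len(df2) times: rows 0, 0, 1, 1
left = df1.loc[df1.index.repeat(len(df2))].reset_index(drop=True)
# Tile the whole of df2 len(df1) times: rows 0, 1, 0, 1
right = df2.iloc[np.tile(np.arange(len(df2)), len(df1))].reset_index(drop=True)

# Side-by-side concatenation pairs every df1 row with every df2 row
df3 = pd.concat([left, right], axis=1)

# Reorder the columns to the layout asked for in the question
df3 = df3[["df1_id", "df2_id", "other_data_1", "other_data_2", "other_data_3", "other_data_4"]]
print(df3)
```

This avoids Python-level loops; the result always has len(df1) * len(df2) rows.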
Answer by Sco*_*ton (10 votes)
Add a common key column to both dataframes and use pd.merge:
import pandas as pd

# give every row of both frames the same constant key
df1['key'] = 1
df2['key'] = 1
Merge on the key, then drop the key column:
# every df1 row matches every df2 row on key == 1, which yields the cross product
df3 = pd.merge(df1, df2, on='key').drop('key', axis=1)
df3
Output:
df1_id other_data_1 other_data_2 df2_id other_data_3 other_data_4
0 1 0 1 1 0 1
1 1 0 1 3 2 2
2 2 1 5 1 0 1
3 2 1 5 3 2 2
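Two variations on the answer's approach, given as a sketch. Since pandas 1.2, merge accepts how="cross", which builds the cross product without any helper column. On older versions, the same key trick can be written with assign() so that df1 and df2 themselves do not gain a key column (the code above adds one to both in place). Both variants below produce the output shown above, with the df1 columns first.

```python
import pandas as pd

df1 = pd.DataFrame({"df1_id": [1, 2], "other_data_1": [0, 1], "other_data_2": [1, 5]})
df2 = pd.DataFrame({"df2_id": [1, 3], "other_data_3": [0, 2], "other_data_4": [1, 2]})

# pandas >= 1.2: a cross join needs no helper column
df3 = df1.merge(df2, how="cross")

# Older pandas: the constant-key trick, using assign() so the
# original frames are left unchanged
df3_old = (
    df1.assign(key=1)
    .merge(df2.assign(key=1), on="key")
    .drop(columns="key")
)
print(df3)
```

If the column order from the question is needed, select the columns afterwards, e.g. df3[["df1_id", "df2_id", "other_data_1", "other_data_2", "other_data_3", "other_data_4"]].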