Join pandas DataFrames based on column values

Question

fre*_*rie 8 python mysql sql dataframe pandas

I'm very new to pandas DataFrames, and I'm having trouble joining two tables.

The first df has only 3 columns:

DF1:
item_id    position    document_id
336        1           10
337        2           10
338        3           10
1001       1           11
1002       2           11
1003       3           11
38         10          146
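
For a reproducible example, DF1 can be built directly from the table above. The variable name df1 is an assumption that matches the name used in the answer below.

```python
import pandas as pd

# DF1: one row per (document_id, item_id) pair, with the item's position in the document
df1 = pd.DataFrame({
    "item_id":     [336, 337, 338, 1001, 1002, 1003, 38],
    "position":    [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
print(df1)
```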

and the second has the same two key columns, item_id and document_id (plus many other columns):

DF2
item_id    document_id    col1    col2   col3    ...
337        10             ...     ...    ...
1002       11             ...     ...    ...
1003       11             ...     ...    ...
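
DF2 can be built the same way. The question elides the values of the extra columns, so this sketch borrows the sample values that the answer prints for col1, col2 and col3; they are illustrative only.

```python
import pandas as pd

# DF2: the same two key columns plus extra data columns
# (col1..col3 values are the sample data printed in the answer, not the asker's real data)
df2 = pd.DataFrame({
    "item_id":     [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1":        ["s", "d", "f"],
    "col2":        [4, 5, 7],
    "col3":        [7, 8, 0],
})
print(df2)
```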

What I need is to perform an operation that would look like this in SQL:

SELECT DF2.*, DF1.position
FROM DF1
JOIN DF2 ON DF1.document_id = DF2.document_id
        AND DF1.item_id = DF2.item_id
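
To see what such an SQL join returns, one option (purely a cross-check, not needed for the pandas solution) is to load both frames into an in-memory SQLite database with DataFrame.to_sql and run the query with pd.read_sql_query. The sample values are the ones printed in the answer, the explicit column list and ORDER BY are additions for a stable comparison, and the table names DF1/DF2 simply mirror the query.

```python
import sqlite3

import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
# Sample values for col1..col3 borrowed from the answer (illustrative only)
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# Copy both frames into a throwaway in-memory SQLite database
con = sqlite3.connect(":memory:")
df1.to_sql("DF1", con, index=False)
df2.to_sql("DF2", con, index=False)

query = """
SELECT DF2.item_id, DF2.document_id, DF1.position, DF2.col1, DF2.col2, DF2.col3
FROM DF1
JOIN DF2
  ON DF1.document_id = DF2.document_id
 AND DF1.item_id = DF2.item_id
ORDER BY DF2.item_id
"""
sql_result = pd.read_sql_query(query, con)
con.close()
print(sql_result)
```

Only the three DF2 rows with a matching (document_id, item_id) pair in DF1 come back, each carrying its position.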

So I'd like to get DF2 with the additional column "position":

item_id    document_id    position    col1   col2   col3   ...

What is a good way to do this in pandas?

Thanks!

Answer 1

jez*_*ael 17

I think you need merge with the default inner join. This assumes there are no duplicate combinations of values in the two key columns:

import pandas as pd

print(df2)
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

# inner join on both key columns (how='inner' is the default)
df = pd.merge(df1, df2, on=['document_id', 'item_id'])
print(df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0
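
The caveat about duplicates matters because merge emits one output row per matching pair of rows: if a (document_id, item_id) pair occurs twice in one frame, the matching row from the other frame is repeated. The minimal sketch below uses deliberately duplicated, hypothetical data to show this, and shows that passing validate="one_to_one" makes pandas raise pandas.errors.MergeError instead of silently multiplying rows.

```python
import pandas as pd

# Hypothetical data: the key pair (10, 337) appears twice on the left
left = pd.DataFrame({
    "item_id": [337, 337],
    "document_id": [10, 10],
    "position": [2, 5],
})
right = pd.DataFrame({
    "item_id": [337],
    "document_id": [10],
    "col1": ["s"],
})

# Default inner join: both left rows match the single right row
dup = pd.merge(left, right, on=["document_id", "item_id"])
print(len(dup))

# validate= checks key uniqueness before merging
raised = False
try:
    pd.merge(left, right, on=["document_id", "item_id"], validate="one_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)
```

With the question's sample data, both frames are unique on the key pair, so the same call with validate="one_to_one" would succeed.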

But if the position column needs to be the third column:

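# put df2 on the left so its columns come first; 'position' from df1 ends up last
df = pd.merge(df2, df1, on=['document_id', 'item_id'])
cols = df.columns.tolist()
# move the last column ('position') to index 2
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print(df)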
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0
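
Slicing the column list relies on position being the last column after the merge. One alternative, sketched here with the same sample data, is to pop the column by name and re-insert it at index 2 with DataFrame.insert, which does not depend on where merge placed it.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
# Sample values for col1..col3 borrowed from the answer (illustrative only)
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

df = pd.merge(df2, df1, on=["document_id", "item_id"])
# Remove 'position' from wherever it is and re-insert it as the third column
df.insert(2, "position", df.pop("position"))
print(df)
```

DataFrame.insert modifies the frame in place and returns None, so its result should not be assigned back to df.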

Archived: 8 years, 6 months ago
Views: 14055
Last activity: 8 years, 6 months ago