Jar*_*rek 3 python merge dataframe pandas
I have two dataframes:
df1 = pd.DataFrame(data =
{'Invoice' : [1, 2, 3, 4, 5], 'Value' : [10, 25, 40, 10, 15]})
df2 = pd.DataFrame(data =
{'Invoice' : [2, 3, 5, 2], 'Value' : [25, 11, 15,25], 'TestData':["A",'B','C','D']})
I merged them and got df3:
df3=pd.merge(df1,df2, left_on=["Invoice","Value"], right_on=["Invoice","Value"])
df3 output:
Invoice Value TestData
0 2 25 A
1 2 25 D
2 5 15 C
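The extra row appears because `merge` pairs every matching row on the left with every matching row on the right, so the single `(2, 25)` row in df1 meets both `(2, 25)` rows in df2. A minimal check, assuming pandas 0.21 or later (for the `validate` argument): asking `merge` to validate a one-to-one relationship makes it raise `MergeError` instead of silently duplicating rows.

```python
import pandas as pd

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

# Each (Invoice, Value) key on the left is paired with every matching key on the right.
df3 = pd.merge(df1, df2, on=['Invoice', 'Value'])
print(len(df3))  # 3 rows: the (2, 25) key appears twice

# validate='one_to_one' checks that the merge keys are unique in BOTH frames.
try:
    pd.merge(df1, df2, on=['Invoice', 'Value'], validate='one_to_one')
    raised = False
except pd.errors.MergeError as err:
    raised = True
    print('MergeError:', err)
```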
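My question is how to merge the dataframes in a "one-to-one" way. I mean: when invoice 2 appears only once in df1 (or, in general, fewer times than in df2), the merged dataframe should not get an extra row for invoice 2. I would like to get something like this: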
Invoice Value TestData
0 2 25 A
1 5 15 C
or this:
Invoice Value TestData
0 2 25 D
1 5 15 C
I tried left and right merges, but that doesn't work: there are always two rows with invoice number 2.
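Left and right merges don't help because `how` only controls which unmatched keys are kept, not how many times a matched key is repeated. A minimal sketch with the same data, counting the invoice-2 rows for each join type:

```python
import pandas as pd

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

results = {}
for how in ['inner', 'left', 'right']:
    merged = pd.merge(df1, df2, on=['Invoice', 'Value'], how=how)
    # Invoice 2 still appears twice every time, because df2 has two (2, 25) rows.
    n_invoice_2 = int((merged['Invoice'] == 2).sum())
    results[how] = (len(merged), n_invoice_2)
    print(how, len(merged), n_invoice_2)
```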
Thank you,
Jarek
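Use drop_duplicates with the key column names specified. The default keep='first' keeps the first of each set of duplicate rows, and keep='last' keeps the last one. Assign the result to a new name rather than overwriting df2, so that the keep='last' example and the EDIT below still start from the original four-row df2:
df2_first = df2.drop_duplicates(["Invoice","Value"])
#same as
#df2_first = df2.drop_duplicates(["Invoice","Value"], keep='first')
df3=pd.merge(df1,df2_first, on=["Invoice","Value"])
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
df2_last = df2.drop_duplicates(["Invoice","Value"], keep='last')
df3=pd.merge(df1,df2_last, on=["Invoice","Value"])
print (df3)
Invoice Value TestData
0 2 25 D
1 5 15 C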
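The same idea wrapped in a small function, as a sketch only: `merge_dedup` is a hypothetical helper name, and `validate='many_to_one'` (pandas 0.21+) is an optional extra safeguard that raises if the right side somehow still has duplicate keys. df2 itself is never modified.

```python
import pandas as pd

def merge_dedup(left, right, keys, keep='first'):
    """Hypothetical helper: keep one right-hand row per key, then merge.

    `keep` is passed straight to drop_duplicates ('first' or 'last').
    """
    right_unique = right.drop_duplicates(subset=keys, keep=keep)
    # many_to_one: left keys may repeat, right keys must be unique.
    return pd.merge(left, right_unique, on=keys, validate='many_to_one')

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

first = merge_dedup(df1, df2, ['Invoice', 'Value'])
last = merge_dedup(df1, df2, ['Invoice', 'Value'], keep='last')
print(first)
print(last)
```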
EDIT:
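If duplicates must be matched one-to-one across all rows (the first occurrence of a key in df1 with the first occurrence in df2, the second with the second, and so on), add a helper column that makes the keys unique: number each occurrence with groupby + cumcount, then merge on that column too: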
df1['g'] = df1.groupby(['Invoice','Value']).cumcount()
df2['g'] = df2.groupby(['Invoice','Value']).cumcount()
print (df1)
Invoice Value g
0 1 10 0
1 2 25 0
2 3 40 0
3 4 10 0
4 5 15 0
print (df2)
Invoice Value TestData g
0 2 25 A 0
1 3 11 B 0
2 5 15 C 0
3 2 25 D 1
df3=pd.merge(df1,df2, on=["Invoice","Value", "g"]).drop('g', axis=1)
print (df3)
Invoice Value TestData
0 2 25 A
1 5 15 C
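A sketch of the cumcount approach as a reusable function that does not add a column to the input frames. `merge_one_to_one` and the temporary column name `_occurrence` are hypothetical (the name is assumed not to clash with real columns); `assign`, `groupby(...).cumcount()` and `drop(columns=...)` are standard pandas. The second call uses a made-up df1b with invoice 2 twice, to show the n-th left occurrence pairing with the n-th right occurrence.

```python
import pandas as pd

def merge_one_to_one(left, right, keys):
    """Hypothetical helper: pair the n-th occurrence of a key in `left`
    with the n-th occurrence of the same key in `right`."""
    occ = '_occurrence'  # temporary column; assumed not to clash with real columns
    left_n = left.assign(**{occ: left.groupby(keys).cumcount()})
    right_n = right.assign(**{occ: right.groupby(keys).cumcount()})
    # keys + occurrence number are unique on both sides by construction
    merged = pd.merge(left_n, right_n, on=keys + [occ], validate='one_to_one')
    return merged.drop(columns=occ)

df1 = pd.DataFrame({'Invoice': [1, 2, 3, 4, 5], 'Value': [10, 25, 40, 10, 15]})
df2 = pd.DataFrame({'Invoice': [2, 3, 5, 2], 'Value': [25, 11, 15, 25],
                    'TestData': ['A', 'B', 'C', 'D']})

result = merge_one_to_one(df1, df2, ['Invoice', 'Value'])
print(result)

# Illustrative data: two (2, 25) rows on the left now pair with A and D.
df1b = pd.DataFrame({'Invoice': [1, 2, 2, 5], 'Value': [10, 25, 25, 15]})
paired = merge_one_to_one(df1b, df2, ['Invoice', 'Value'])
print(paired)
```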