sacuL · 8 · Tags: python, merge, dataframe, pandas
I have a situation with 2 dataframes:
import pandas as pd

test1 = pd.DataFrame({'id_A': ['Ben', 'Julie', 'Jack', 'Jack'],
                      'id_B': ['Julie', 'Ben', 'Nina', 'Julie']})
test2 = pd.DataFrame({'id_a': ['Ben', 'Ben', 'Ben', 'Julie', 'Julie', 'Nina'],
                      'id_b': ['Julie', 'Nina', 'Jack', 'Nina', 'Jack', 'Jack'],
                      'value': [1, 1, 0, 0, 1, 0]})
>>> test1
    id_A   id_B
0    Ben  Julie
1  Julie    Ben
2   Jack   Nina
3   Jack  Julie
>>> test2
    id_a   id_b  value
0    Ben  Julie      1
1    Ben   Nina      1
2    Ben   Jack      0
3  Julie   Nina      0
4  Julie   Jack      1
5   Nina   Jack      0
What I want to do is merge test2 with test1 where id_A == id_a and id_B == id_b, OR where id_A == id_b and id_B == id_a, resulting in the following dataframe:
>>> final_df
    id_A   id_B  value
0    Ben  Julie      1
1  Julie    Ben      1
2   Jack   Nina      0
3   Jack  Julie      1
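Before looking for something shorter, it can help to write the OR condition out literally and use it as a correctness reference for any cleverer version. The sketch below assumes pandas 1.2 or later (for how="cross"); it pairs every row of test1 with every row of test2 and then filters, so it materializes len(test1) * len(test2) rows and is only sensible for small frames.

```python
import pandas as pd

test1 = pd.DataFrame({"id_A": ["Ben", "Julie", "Jack", "Jack"],
                      "id_B": ["Julie", "Ben", "Nina", "Julie"]})
test2 = pd.DataFrame({"id_a": ["Ben", "Ben", "Ben", "Julie", "Julie", "Nina"],
                      "id_b": ["Julie", "Nina", "Jack", "Nina", "Jack", "Jack"],
                      "value": [1, 1, 0, 0, 1, 0]})

# Pair every test1 row with every test2 row (how="cross" needs pandas >= 1.2);
# reset_index() keeps test1's original row labels in a column named "index"
pairs = test1.reset_index().merge(test2, how="cross")

# The two halves of the OR condition, written out literally
same_order = (pairs["id_A"] == pairs["id_a"]) & (pairs["id_B"] == pairs["id_b"])
swapped = (pairs["id_A"] == pairs["id_b"]) & (pairs["id_B"] == pairs["id_a"])

final_df = (pairs[same_order | swapped]
            .set_index("index")   # restore test1's row labels
            .rename_axis(None)    # drop the leftover "index" name
            [["id_A", "id_B", "value"]])
print(final_df)
```

Because this is a plain filter, a test1 row that matches nothing in test2 simply does not appear, the same as the inner merge in the question.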
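My solution works, but it looks messy, and I'd like to know whether I'm overlooking a smarter way of doing this. It involves concatenating test2 with itself, but with the 2 columns of interest swapped (id_a becomes id_b and vice versa), and then merging from there.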
test3 = pd.concat([test2, test2.rename(columns={'id_a': 'id_b', 'id_b': 'id_a'})])
final_df = (test1.merge(test3, left_on=['id_A', 'id_B'],
                        right_on=['id_a', 'id_b'])
                 .drop(['id_a', 'id_b'], axis=1))
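A self-contained version of the concat approach, with two adjustments that are not in the original and are offered only as a hedged suggestion. First, how="left", so a test1 row with no partner in test2 keeps its place with value NaN instead of silently disappearing (the original merge is an inner join). Second, drop_duplicates on test3: a pair stored in both orders in test2, or a pair whose two ids are equal, would otherwise appear twice in test3 and duplicate the matching test1 rows.

```python
import pandas as pd

test1 = pd.DataFrame({"id_A": ["Ben", "Julie", "Jack", "Jack"],
                      "id_B": ["Julie", "Ben", "Nina", "Julie"]})
test2 = pd.DataFrame({"id_a": ["Ben", "Ben", "Ben", "Julie", "Julie", "Nina"],
                      "id_b": ["Julie", "Nina", "Jack", "Nina", "Jack", "Jack"],
                      "value": [1, 1, 0, 0, 1, 0]})

# Stack test2 on top of a copy with id_a/id_b swapped, so every pair
# is present in both orientations
test3 = pd.concat([test2,
                   test2.rename(columns={"id_a": "id_b", "id_b": "id_a"})],
                  ignore_index=True)

# A pair stored in both orders in test2, or a self-pair such as ("Ben", "Ben"),
# would now appear twice; keep the first copy of each orientation
test3 = test3.drop_duplicates(subset=["id_a", "id_b"])

# Left join keeps every test1 row, in order; unmatched rows get NaN
final_df = (test1.merge(test3, left_on=["id_A", "id_B"],
                        right_on=["id_a", "id_b"], how="left")
                 .drop(columns=["id_a", "id_b"]))
print(final_df)
```

Note that drop_duplicates keeps the first copy, so if the two orders of a pair carry different values in test2, one of them is discarded without warning.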
Does anyone know a cleaner way to do this? I feel like I might be overlooking some surprisingly elegant way of doing it.
Thanks for your help!
Answer: with frozenset
test1.assign(
    value=test1.apply(frozenset, axis=1).map({frozenset(a): b for *a, b in test2.values}))
    id_A   id_B  value
0    Ben  Julie      1
1  Julie    Ben      1
2   Jack   Nina      0
3   Jack  Julie      1
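The frozenset trick works because a frozenset ignores element order, so both orientations of a pair are the same dictionary key, and Series.map then looks each test1 key up in that dictionary. The sketch below spells out the same lookup step by step. It names the three test2 columns explicitly rather than unpacking with *a, b, which quietly assumes that value is the last column of test2 and that every other column is an id.

```python
import pandas as pd

test1 = pd.DataFrame({"id_A": ["Ben", "Julie", "Jack", "Jack"],
                      "id_B": ["Julie", "Ben", "Nina", "Julie"]})
test2 = pd.DataFrame({"id_a": ["Ben", "Ben", "Ben", "Julie", "Julie", "Nina"],
                      "id_b": ["Julie", "Nina", "Jack", "Nina", "Jack", "Jack"],
                      "value": [1, 1, 0, 0, 1, 0]})

# A frozenset ignores element order, so both orientations give the same key
assert frozenset(("Ben", "Julie")) == frozenset(("Julie", "Ben"))

# Lookup table {frozenset({id_a, id_b}): value}, built from named columns
lookup = {frozenset((a, b)): v
          for a, b, v in zip(test2["id_a"], test2["id_b"], test2["value"])}

# One frozenset key per test1 row
keys = pd.Series([frozenset(pair) for pair in zip(test1["id_A"], test1["id_B"])],
                 index=test1.index)

# Series.map with a dict returns NaN for any key absent from lookup
final_df = test1.assign(value=keys.map(lookup))
print(final_df)
```

Since Series.map yields NaN for a pair missing from test2, unmatched test1 rows are kept rather than dropped, which differs from the inner merge in the question.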
Less cute, more robust. Drop whatever columns you don't need afterwards.
t1 = test1.assign(ref=list(map(frozenset, zip(test1.id_A, test1.id_B))))
t2 = test2.assign(ref=list(map(frozenset, zip(test2.id_a, test2.id_b))))
t1.merge(t2, on='ref')
    id_A   id_B            ref   id_a   id_b  value
0    Ben  Julie   (Julie, Ben)    Ben  Julie      1
1  Julie    Ben   (Julie, Ben)    Ben  Julie      1
2   Jack   Nina   (Jack, Nina)   Nina   Jack      0
3   Jack  Julie  (Jack, Julie)  Julie   Jack      1
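One side effect of a frozenset key is that its printed order depends on Python's string hashing, which is randomized per process by default, so the ref column may show (Julie, Ben) in one run and (Ben, Julie) in another. A variant of the same order-insensitive key, sketched below under the assumption that the ids are mutually comparable (here, all strings), sorts each pair with numpy.sort and merges on the two sorted columns. The helper add_sorted_key and the column names key_lo and key_hi are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame({"id_A": ["Ben", "Julie", "Jack", "Jack"],
                      "id_B": ["Julie", "Ben", "Nina", "Julie"]})
test2 = pd.DataFrame({"id_a": ["Ben", "Ben", "Ben", "Julie", "Julie", "Nina"],
                      "id_b": ["Julie", "Nina", "Jack", "Nina", "Jack", "Jack"],
                      "value": [1, 1, 0, 0, 1, 0]})


def add_sorted_key(df, cols):
    """Hypothetical helper: add key_lo/key_hi holding each pair in sorted order,
    so ("Julie", "Ben") and ("Ben", "Julie") both become ("Ben", "Julie")."""
    ordered = np.sort(df[cols].to_numpy(), axis=1)  # sort within each row
    return df.assign(key_lo=ordered[:, 0], key_hi=ordered[:, 1])


final_df = (add_sorted_key(test1, ["id_A", "id_B"])
            .merge(add_sorted_key(test2, ["id_a", "id_b"]),
                   on=["key_lo", "key_hi"], how="left")
            [["id_A", "id_B", "value"]])
print(final_df)
```

The sorted columns are ordinary strings, so the key is deterministic and prints the same way on every run; the trade-off is the extra comparability assumption, which frozenset does not need.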