New*_*bie 7 python intersection pandas
我有以下三个数据帧:
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3],
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})
df_A
id_A Animal_A
0 1 cat
1 1 dog
2 1 fish
3 1 bird
4 2 cat
5 2 fish
6 3 bird
7 3 cat
df_B
id_B Animal_B
0 1 dog
1 2 cat
2 2 fish
3 3 dog
4 4 fish
5 4 cat
6 5 cat
df_P
id_A id_B
0 1 2
1 1 3
2 2 4
3 3 5
Run Code Online (Sandbox Code Playgroud)
我想获得一个额外的列到df_P,它告诉id_A和id_B之间共享的动物数量.我在做的是:
df_P["n_common"] = np.nan
for i in df_P.index.tolist():
id_A = df_P["id_A"][i]
id_B = df_P["id_B"][i]
df_P.iloc[i,df_P.columns.get_loc('n_common')] = len(set(df_A['Animal_A'][df_A['id_A']==id_A]).intersection(df_B['Animal_B'][df_B['id_B']==id_B]))
Run Code Online (Sandbox Code Playgroud)
结果是:
df_P
id_A id_B n_common
0 1 2 2.0
1 1 3 1.0
2 2 4 2.0
3 3 5 1.0
Run Code Online (Sandbox Code Playgroud)
是否有更快,更pythonic的方式来做到这一点?有没有办法避免for循环?
不确定它是否更快或更Pythonic,但它避免了for循环:)
import pandas as pd
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3],
'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5],
'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3],
'id_B': [2, 3, 4, 5]})
df = pd.merge(df_A, df_P, on='id_A')
df = pd.merge(df_B, df, on='id_B')
df = df[df['Animal_A'] == df['Animal_B']].groupby(['id_A', 'id_B'])['Animal_A'].count().reset_index()
df.rename({'Animal_A': 'n_common'},inplace=True,axis=1)
Run Code Online (Sandbox Code Playgroud)