use*_*916 5 python dataframe pandas
Suppose we have two Pandas DataFrames, as follows:
import pandas as pd
df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
id
0 a
1 b
2 c
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})
df2
ids info
0 [b, c] asdf
1 [a, b] zxcv
2 [a, z] sdfg
How can I join/merge the rows of df1 with the rows of df2 where df1.id is contained in df2.ids?
In other words, how can I achieve the following:
df3
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
3 b [a, b] zxcv
4 c [b, c] asdf
And also an aggregated version of the above, grouped by id, like this:
df3
id ids info
0 a [[a, b], [a, z]] [zxcv, sdfg]
1 b [[a, b], [b, c]] [zxcv, asdf]
2 c [[b, c]] [asdf]
I tried the following:
df1.merge(df2, how='left', left_on='id', right_on='ids')
# TypeError: unhashable type: 'list'
df1.id.isin(df2.ids)
# TypeError: unhashable type: 'list'
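Use:
# Split each two-element list into its own column (assumes every list has exactly two items)
df2[['id1', 'id2']] = pd.DataFrame(df2.ids.values.tolist(), index=df2.index)
# Merge once per position in the list, then stack the two results
new_df1 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id1'])
new_df2 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id2'])
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
new_df = pd.concat([new_df1, new_df2])[['id', 'ids', 'info']]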
Output
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
0 b [a, b] zxcv
1 c [b, c] asdf
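The approach above relies on every list in ids having exactly two elements. As a possible alternative (not part of the answer above), here is a minimal sketch that uses DataFrame.explode, available from pandas 0.25 onward, which should also cope with lists of varying length. The names exploded and df3 are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Copy the list column into a new 'id' column, then expand it to one row
# per list element. The original 'ids' lists stay intact next to the key.
exploded = df2.assign(id=df2['ids']).explode('id')

# 'id' now holds plain strings, which are hashable, so a normal merge works.
df3 = df1.merge(exploded, on='id', how='inner')[['id', 'ids', 'info']]
print(df3)
```

Because the merge key is a scalar column, nothing here depends on how many items each list holds; ids that do not appear in df1 (such as 'z') simply drop out of the inner join.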
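Aggregation part
# Select multiple columns with a list; the tuple form ['ids', 'info'] no longer works in pandas 2.0
new_df.groupby('id')[['ids', 'info']].agg(list)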
Output
ids info
id
a [[a, b], [a, z]] [zxcv, sdfg]
b [[b, c], [a, b]] [asdf, zxcv]
c [[b, c]] [asdf]
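For reference, the aggregated table can also be produced in a single chain by exploding the list column first. This is only a sketch under the same sample data, not the method used in the answer above; the name agg_df is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

agg_df = (
    df2.assign(id=df2['ids'])        # duplicate the lists into a key column
       .explode('id')                # one row per (list element, original row)
       .merge(df1, on='id')          # inner join keeps only ids present in df1
       .groupby('id')[['ids', 'info']]
       .agg(list)                    # collect the matches for each id into lists
)
print(agg_df)
```

The result is indexed by id; call reset_index() on it if id should be an ordinary column instead.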