Pandas: join if the value in a df1 column is in a list in a df2 column

use*_*916 5 python dataframe pandas

Suppose we have two Pandas DataFrames, as follows:

import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
    id
0   a
1   b
2   c

df2 = pd.DataFrame({'ids': [['b','c'], ['a', 'b'], ['a', 'z']], 
                    'info': ['asdf', 'zxcv', 'sdfg']})
df2
    ids     info
0   [b, c]  asdf
1   [a, b]  zxcv
2   [a, z]  sdfg

How can I join/merge the rows of df1 and df2 where df1.id is in df2.ids?

In other words, how can I achieve the following:

df3
   id   ids     info
0  a    [a, b]  zxcv
1  a    [a, z]  sdfg
2  b    [b, c]  asdf
3  b    [a, b]  zxcv
4  c    [b, c]  asdf

And also an aggregated version of the above, grouped by id, like this:

df3
   id   ids               info
0  a    [[a, b], [a, z]]  [zxcv, sdfg]
1  b    [[a, b], [b, c]]  [zxcv, asdf]
2  c    [[b, c]]          [asdf]

I tried the following:

df1.merge(df2, how = 'left', left_on = 'id', right_on = 'ids')
TypeError: unhashable type: 'list'

df1.id.isin(df2.ids)
TypeError: unhashable type: 'list'

Viv*_*gan 0

Use:

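# Split each list into columns id1 and id2 (assumes every list has exactly two items)
df2[['id1', 'id2']] = pd.DataFrame(df2.ids.values.tolist(), index=df2.index)
# Match df1.id against each list position in turn
new_df1 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id1'])
new_df2 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on=['id2'])
# Stack both partial results and keep only the columns of interest
new_df = pd.concat([new_df1, new_df2])[['id', 'ids', 'info']]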

Output

   id     ids  info
0   a  [a, b]  zxcv
1   a  [a, z]  sdfg
2   b  [b, c]  asdf
0   b  [a, b]  zxcv
1   c  [b, c]  asdf
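
The split into id1 and id2 above relies on every list holding exactly two items. For lists of varying length, one possible alternative is to explode a copy of the list column into a scalar id column and merge on that. This is only a sketch, assuming pandas 0.25 or later (where DataFrame.explode is available); the names exploded and joined are illustrative and not from the original post:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# Copy the list column into 'id', then explode it so that each list element
# gets its own row; the original list stays available in 'ids'.
exploded = df2.assign(id=df2['ids']).explode('id')

# 'id' now holds plain strings (hashable), so an ordinary merge works.
joined = df1.merge(exploded, on='id', how='inner')[['id', 'ids', 'info']]
print(joined)
```

An id such as 'z' that appears only in df2 is dropped by the inner merge, and an id in df1 with no match would be dropped as well; how='left' would keep the latter with NaN values.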

Aggregation part:

new_df.groupby('id')[['ids', 'info']].agg(list)

Output

                 ids          info
id
a   [[a, b], [a, z]]  [zxcv, sdfg]
b   [[b, c], [a, b]]  [asdf, zxcv]
c           [[b, c]]        [asdf]
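
If only the aggregated result is needed, it may be simpler to aggregate first and join afterwards, so that the merge sees one row per id. The following is a self-contained sketch under the assumption that DataFrame.explode is available (pandas 0.25+); per_id and aggregated are hypothetical names chosen for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df2 = pd.DataFrame({'ids': [['b', 'c'], ['a', 'b'], ['a', 'z']],
                    'info': ['asdf', 'zxcv', 'sdfg']})

# One row per (list element, original df2 row); 'ids' keeps the full list.
exploded = df2.assign(id=df2['ids']).explode('id')

# Collect, for every distinct element, the lists and info values it occurs with.
per_id = exploded.groupby('id')[['ids', 'info']].agg(list).reset_index()

# Keep only the ids that are present in df1.
aggregated = df1.merge(per_id, on='id', how='inner')
print(aggregated)
```

Within each group the values keep the row order of df2, so the lists line up position by position between ids and info.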