Pandas: how to get the unique values of a column that contains lists of values?

ℕʘʘ*_*ḆḽḘ 5 python pandas

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'name' : [['one two','three four'], ['one'],[], [],['one two'],['three']],
                   'col' : ['A','B','A','B','A','B']})
df.sort_values(by='col',inplace=True)

df
Out[62]: 
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]

I want to get a column that keeps track of all the unique strings contained in name for each value of col.

That is, the expected output is

df
Out[62]: 
  col                   name    unique_list
0   A  [one two, three four]    [one two, three four]
2   A                     []    [one two, three four]
4   A              [one two]    [one two, three four]
1   B                  [one]    [one, three]
3   B                     []    [one, three]
5   B                [three]    [one, three]

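Indeed, taking group A as an example, you can see that the unique set of strings contained in [one two, three four], [] and [one two] is [one two, three four].

I can get the corresponding number of unique values using the approach from "Pandas: how to get the number of unique values in cells when the cells contain lists?":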

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))


df
Out[65]: 
  col                   name count_unique
0   A  [one two, three four]            2
2   A                     []            2
4   A              [one two]            2
1   B                  [one]            2
3   B                     []            2
5   B                [three]            2

But replacing nunique with unique in the code above fails.

Any ideas? Thanks!

piR*_*red 4

Here is a solution:

import numpy as np

# Summing the lists within each group concatenates them; np.unique then de-duplicates
df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df
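For comparison, here is a minimal sketch of the same idea written with `DataFrame.explode`, assuming pandas 0.25 or newer. It is not part of the original answer. The intermediate names `exploded` and `unique_per_group` are only illustrative, and sorting the unique strings is my own choice so that the result is a plain, deterministic Python list rather than a NumPy array.

```python
import pandas as pd

df = pd.DataFrame({
    'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
    'col': ['A', 'B', 'A', 'B', 'A', 'B'],
})

# One string per row; rows holding an empty list become NaN and are dropped.
exploded = df.explode('name').dropna(subset=['name'])

# Sorted unique strings per group, stored as a plain Python list.
unique_per_group = exploded.groupby('col')['name'].apply(lambda s: sorted(s.unique()))

# Broadcast each group's list back onto the original rows.
df['unique_list'] = df['col'].map(unique_per_group)
```

Note that a group whose lists are all empty would disappear from `exploded`, so its rows would get NaN in `unique_list` rather than an empty list.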
