Pandas: how to get the unique values of a column that contains lists of values?

Question

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)

df
Out[62]: 
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]

I would like to get a column that keeps track of all the unique strings contained in name for each group of col.

That is, the expected output is:

df
Out[62]: 
  col                   name    unique_list
0   A  [one two, three four]    [one two, three four]
2   A                     []    [one two, three four]
4   A              [one two]    [one two, three four]
1   B                  [one]    [one, three]
3   B                     []    [one, three]
5   B                [three]    [one, three]

Indeed, taking group A as an example, you can see that the unique set of strings contained in [one two, three four], [] and [one two] is [one two, three four].
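To make the expected result concrete, here is a minimal pure-Python sketch of the same computation: for each value of col, take the union of the strings in that group's lists. The row data is copied from the DataFrame above; the helper name unique_per_group is hypothetical, and sorting each result is an assumption added only to make the output deterministic.

```python
from collections import defaultdict

# Rows of the example DataFrame as (col, name) pairs
rows = [
    ("A", ["one two", "three four"]),
    ("B", ["one"]),
    ("A", []),
    ("B", []),
    ("A", ["one two"]),
    ("B", ["three"]),
]


def unique_per_group(pairs):
    """Hypothetical helper: union the lists of each group into a sorted list."""
    seen = defaultdict(set)
    for key, values in pairs:
        seen[key].update(values)  # an empty list adds nothing to the set
    return {key: sorted(vals) for key, vals in seen.items()}


result = unique_per_group(rows)
print(result)  # {'A': ['one two', 'three four'], 'B': ['one', 'three']}
```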

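I can get the corresponding number of unique values using the approach from "Pandas: when cells contain lists, how do I get the number of unique values in the cells?":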

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))


df
Out[65]: 
  col                   name count_unique
0   A  [one two, three four]            2
2   A                     []            2
4   A              [one two]            2
1   B                  [one]            2
3   B                     []            2
5   B                [three]            2
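As an alternative to the apply(pd.Series).stack() chain, the same count can be sketched with DataFrame.explode, assuming pandas 0.25 or later (the version that added explode). This is a different technique from the one in the linked question, not a fix of the code above.

```python
import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)

# explode() yields one row per list element; an empty list becomes a single NaN row
exploded = df.explode('name')

# nunique() skips NaN by default, so empty lists do not add to the count
counts = exploded.groupby('col')['name'].nunique()

# Broadcast each group's count back onto the original rows
df['count_unique'] = df['col'].map(counts)
print(df)
```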

But replacing nunique with unique in the code above fails.

Any ideas? Thanks!

Answer 1

piR*_*red 4

Here is a solution:

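import numpy as np
# sum() concatenates each group's lists; np.unique returns the sorted distinct strings as an array
df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df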
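A sketch of an alternative that avoids summing lists, again assuming pandas 0.25 or later for explode: flatten the lists, drop the NaN rows that empty lists produce, collect each group's distinct strings, and map them back. Unlike the answer above, which stores NumPy arrays, this stores plain Python lists; sorting them is an assumption chosen to match the order np.unique would give.

```python
import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)

# One row per string; empty lists become NaN, which dropna() removes
exploded = df.explode('name').dropna(subset=['name'])

# Sorted list of distinct strings for each value of col
unique_lists = exploded.groupby('col')['name'].agg(lambda s: sorted(s.unique()))

# Broadcast each group's list back to every row of that group
df['unique_list'] = df['col'].map(unique_lists)
print(df)
```

Dropping the NaN rows matters here: without it, sorted() would be asked to compare a float NaN with strings and raise a TypeError.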

Archived: 9 years, 1 month ago
Views: 1834
Last activity: 9 years, 1 month ago