Pandas: group by a column and merge rows of lists into a single list per group?

n0r*_*0ro 6 python pandas pandas-groupby

I have a Pandas DataFrame that looks like this:

import pandas as pd

f1 = [['abc', 'def'], ['ghi', 'jkl'], ['mno', 'pqr'], ['stu', 'vwx'], ['yz', 'xx'], ['yx', 'zx'], ['text', 'more'], ['stuff', 'here'], ['last', 'one']]

f2 = ['1', '1', '1', '2', '2', '2', '3', '3', '3']

groups = ['GROUP A', 'GROUP A', 'GROUP A', 'GROUP B', 'GROUP B', 'GROUP B', 'GROUP C', 'GROUP C', 'GROUP C']


df = pd.DataFrame({'Groups': groups, 'Feature 1': f1, 'Feature 2': f2})
df


# DataFrame print:
    Groups    Feature 1       Feature 2
0   GROUP A   [abc, def]      1
1   GROUP A   [ghi, jkl]      1
2   GROUP A   [mno, pqr]      1
3   GROUP B   [stu, vwx]      2
4   GROUP B   [yz, xx]        2
5   GROUP B   [yx, zx]        2
6   GROUP C   [text, more]    3
7   GROUP C   [stuff, here]   3
8   GROUP C   [last, one]     3

I am trying to group the data by the "Groups" column so that I can produce a DataFrame like this:

Groups      Feature 1                                Feature 2
GROUP A     [abc, def, ghi, jkl, mno, pqr]           1
GROUP B     [stu, vwx, yz, xx, yx, zx]               2
GROUP C     [text, more, stuff, here, last, one]     3

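In other words, each group name is repeated in the "Groups" column, and each repetition corresponds to one individual list that belongs to that group.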

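I want to eliminate the repeated group names in the "Groups" column and combine all the individual lists associated with each group into one merged list that holds all of their elements in a single row.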

I have tried groupby() a bit and searched around, but I am struggling to implement this.

Thanks!

yat*_*atu 4

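You can use GroupBy and aggregate the column that contains the lists with sum, which concatenates the lists within each group, and take Feature 2 with first: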

df.groupby('Groups').agg({'Feature 1':'sum', 'Feature 2':'first'}).reset_index()

   Groups                        Feature 1          Feature 2
0  GROUP A        [abc, def, ghi, jkl, mno, pqr]         1
1  GROUP B            [stu, vwx, yz, xx, yx, zx]         2
2  GROUP C  [text, more, stuff, here, last, one]         3
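As a possible variation on the aggregation above (not part of the original answer): summing lists works by repeated `+`, which copies the growing list at each step, so it may get slow for groups with many rows. The sketch below swaps in `itertools.chain.from_iterable` to flatten each group's lists in one pass. The helper name `flatten` and the shortened sample data are assumptions made for illustration only.

```python
from itertools import chain

import pandas as pd

# Shortened sample data in the same shape as the question's DataFrame.
df = pd.DataFrame({
    'Groups': ['GROUP A', 'GROUP A', 'GROUP B', 'GROUP B'],
    'Feature 1': [['abc', 'def'], ['ghi', 'jkl'], ['stu', 'vwx'], ['yz', 'xx']],
    'Feature 2': ['1', '1', '2', '2'],
})


def flatten(lists):
    """Hypothetical helper: join a group's lists into one flat list."""
    return list(chain.from_iterable(lists))


result = (
    df.groupby('Groups')
      .agg({'Feature 1': flatten, 'Feature 2': 'first'})
      .reset_index()
)
print(result)
```

For small groups like the ones in the question, the `sum` version is shorter and should behave the same; the difference only starts to matter as the number of rows per group grows.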