pandas groupby: excluding the grouper column from the output DataFrame

Question


LIB*_*LIB 4 python pandas pandas-groupby

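I am trying to group a pandas DataFrame so that the keys are kept as the index but are not included in each group.

Here is an example of what I mean.

Original DataFrame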

ungrouped_df = pd.DataFrame({'col1':['A','A','B','C','C','C'], 'col2':[8,5,1,4,1,2], 'col3':[7,4,2,1,2,1],'col4':[1,8,0,2,0,0]})

Out:

| index | col1 | col2 | col3 | col4 |
|-------|------|------|------|------|
| 1     |    A |    8 |    7 |    1 |
| 2     |    A |    5 |    4 |    8 |
| 3     |    B |    1 |    2 |    0 |
| 4     |    C |    4 |    1 |    2 |
| 5     |    C |    1 |    2 |    0 |
| 6     |    C |    2 |    1 |    0 |


Now I want to create a NumPy array for each group of the grouped DataFrame

grouped_df = ungrouped_df.groupby(by='col1', group_keys=False).apply(np.asarray)

This is what I get

| index | col1                                      | 
|-------|-------------------------------------------|
| A     | [[A, 8, 7, 1], [A, 5, 4, 8]]              |
| B     | [[B, 1, 2, 0]]                            |
| C     | [[C, 4, 1, 2], [C, 1, 2, 0], [C, 2, 1, 0]]|


This is what I want

Out:

| index | col1                             | 
|-------|----------------------------------|
| A     | [[8, 7, 1], [5, 4, 8]]           |
| B     | [[1, 2, 0]]                      |
| C     | [[4, 1, 2], [1, 2, 0], [2, 1, 0]]|


I could use some advice here, as I'm a bit lost. I thought "group_keys=False" would do the trick, but it doesn't. Any help is much appreciated.

Thanks

Answer 1

cs9*_*s95 5

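I generally wouldn't recommend storing lists in a column, but the most obvious way to solve this is to make sure the unwanted column is not among the columns being grouped.

You can do this by

setting "col1" as the index before grouping, or
dropping "col1" before grouping, or
selecting the columns you want to group.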

df.set_index('col1').groupby(level=0).apply(np.array)

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]


Or,

df.drop(columns='col1').groupby(df['col1']).apply(np.array)

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]


Or,

(df.groupby('col1')[df.columns.difference(['col1'])]
   .apply(lambda x: x.values.tolist()))

col1
A               [[8, 7, 1], [5, 4, 8]]
B                          [[1, 2, 0]]
C    [[4, 1, 2], [1, 2, 0], [2, 1, 0]]
dtype: object
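
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the column-selection variant using the question's data. The names `value_cols`, `nested` and `arrays_by_key` are made up for illustration and are not part of the original answer. The dict of arrays at the end is just one possible alternative to storing lists in a Series, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
ungrouped_df = pd.DataFrame({
    'col1': ['A', 'A', 'B', 'C', 'C', 'C'],
    'col2': [8, 5, 1, 4, 1, 2],
    'col3': [7, 4, 2, 1, 2, 1],
    'col4': [1, 8, 0, 2, 0, 0],
})

# Every column except the grouping key (illustrative helper name).
value_cols = [c for c in ungrouped_df.columns if c != 'col1']

# Select the value columns before applying, so each group passed to the
# function no longer contains 'col1'.
nested = (ungrouped_df.groupby('col1')[value_cols]
          .apply(lambda g: g.to_numpy().tolist()))

# Possible alternative container: a plain dict mapping each key to a
# 2-D NumPy array, which avoids keeping nested lists inside a Series.
arrays_by_key = {key: group[value_cols].to_numpy()
                 for key, group in ungrouped_df.groupby('col1')}

print(nested)
print(arrays_by_key['C'])
```

With the dict, each group stays a regular 2-D array, so something like `arrays_by_key['A'].shape` can be used directly.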

