Python: Pandas wrongly excludes a column in groupby

Vik*_*h B 6 python pandas

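I have seen that pandas silently excludes nuisance columns, as described here: Pandas Nuisance columns

It states that if an aggregation function cannot be applied to a column, that column is silently excluded.

Consider the following example:

I have a data frame: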

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': {0: 'foo', 1: 'bar', 2: 'foo', 3: 'bar', 4: 'foo', 5: 'bar', 6: 'foo', 7: 'foo'},
    'B': {0: -1.131345, 1: -0.089328999999999992, 2: 0.33786300000000002, 3: -0.94586700000000001,
          4: -0.93213199999999996, 5: 1.9560299999999999, 6: 0.017587000000000002, 7: -0.016691999999999999},
    'C': {0: -0.91985400000000006, 1: -0.042379, 2: 1.2476419999999999, 3: -0.00992,
          4: 0.290213, 5: 0.49576700000000001, 6: 0.36294899999999997, 7: 1.548106},
})

df:
     A         B         C
0  foo -1.131345 -0.919854
1  bar -0.089329 -0.042379
2  foo  0.337863  1.247642
3  bar -0.945867 -0.009920
4  foo -0.932132  0.290213
5  bar  1.956030  0.495767
6  foo  0.017587  0.362949
7  foo -0.016692  1.548106

Let me combine columns B and C into a single column D and convert each entry to a numpy ndarray:

df = df.assign(D=df[['B', 'C']].values.tolist())
df['D'] = df['D'].apply(np.array)

df:

     A         B         C  D
0  foo -1.131345 -0.919854  [-1.131345, -0.9198540000000001]
1  bar -0.089329 -0.042379  [-0.08932899999999999, -0.042379]
2  foo  0.337863  1.247642  [0.337863, 1.247642]
3  bar -0.945867 -0.009920  [-0.945867, -0.00992]
4  foo -0.932132  0.290213  [-0.932132, 0.290213]
5  bar  1.956030  0.495767  [1.95603, 0.495767]
6  foo  0.017587  0.362949  [0.017587000000000002, 0.36294899999999997]
7  foo -0.016692  1.548106  [-0.016692, 1.548106]

Now I can apply mean to column D:

print(df['D'].mean())
print(df['B'].mean())
print(df['C'].mean())

[-0.10048563  0.3715655 ]
-0.100485625
0.3715655
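One way to sanity-check what the mean of an array-valued column amounts to is to stack the per-row arrays into a 2-D matrix and average down the rows. The sketch below is only illustrative: it uses made-up stand-in values rather than the data frame above, and the names stacked and column_wise_mean are introduced here for the example. Under those assumptions, the stacked mean should agree with the separate means of B and C.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the values are made up for illustration.
df = pd.DataFrame({'B': [-1.0, 0.5, 2.0], 'C': [3.0, -0.5, 1.0]})

# One 1-D array per row, stored in an object-dtype column.
df['D'] = pd.Series([np.array(row) for row in df[['B', 'C']].to_numpy()],
                    index=df.index)

# Stack the per-row arrays into an (n_rows, 2) matrix and average down the rows.
stacked = np.vstack(df['D'].tolist())
column_wise_mean = stacked.mean(axis=0)

print(df['D'].dtype)        # the column holds Python objects, not floats
print(column_wise_mean)     # element-wise mean across all rows
```

Because D is stored with object dtype, pandas sees it as a column of arbitrary Python objects rather than as numeric data, which is worth keeping in mind for the groupby step that follows.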

But when I try to group by A and take the mean, column D is dropped:

df.groupby('A').mean()

            B         C
A
bar  0.306945  0.147823
foo -0.344944  0.505811

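My question is: why is column D excluded even though the aggregation function can be applied to it successfully?

And, more generally, how can I use aggregation functions such as mean or sum when the column of interest holds numpy arrays?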

jez*_*ael 2

It is possible, but it needs a custom function with an if-else:

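def f(x):
    # x is the Series of values for one column within one group.
    a = x.mean()
    # Scalar means pass through unchanged; an array-valued mean is returned as a list.
    return a if isinstance(a, (float, int)) else list(a)

df = df.groupby('A').agg(f)
print(df)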
            B         C                                 D
A                                                        
bar  0.306945  0.147823  [0.306944666667, 0.147822666667]
foo -0.344944  0.505811           [-0.3449438, 0.5058112]
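
By contrast, returning the result of x.mean() directly leaves column D as NaN:

# Run this against the original df, not the aggregated result from above.
df = df.groupby('A').agg(lambda x: x.mean())
print(df)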
            B         C   D
A                          
bar  0.306945  0.147823 NaN
foo -0.344944  0.505811 NaN
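
An alternative that avoids aggregating an object column altogether, offered as a sketch rather than a definitive fix, is to expand D into one plain float column per array element, group those, and fold the result back into one list per group if needed. The sample values and the column names D0 and D1 below are assumptions made up for illustration; they are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the values are made up for illustration.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'B': [1.0, 2.0, 3.0, 4.0],
    'C': [10.0, 20.0, 30.0, 40.0],
})
df['D'] = pd.Series([np.array(row) for row in df[['B', 'C']].to_numpy()],
                    index=df.index)

# Expand D into one float column per array element (D0, D1 are made-up names).
expanded = pd.DataFrame(np.vstack(df['D'].tolist()),
                        index=df.index, columns=['D0', 'D1'])

# Group the numeric columns by the labels in A and take the mean of each.
group_means = expanded.groupby(df['A']).mean()

# Optionally fold the per-element means back into one list per group.
d_mean = pd.Series(group_means.to_numpy().tolist(),
                   index=group_means.index, name='D')

print(group_means)
print(d_mean)
```

Since every grouped column is an ordinary float column here, the result does not depend on how a given pandas version treats object columns during aggregation. The same pattern should carry over to sum or other reductions by swapping the final method call.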