Pandas: aggregate with a filter based on another column

Tuu*_*nas 5 python aggregate pandas

I have a dataframe that looks like this:

Month   Fruit   Sales
1       Apple   45
1       Bananas 12
3       Apple   6
1       Kiwi    34
12      Melon   12

I am trying to get a dataframe like this:

Fruit         Sales (month=1)     Sales (month=2)
Apple         55                  65
Bananas       12                  102
Kiwi          54                  78
Melon         132                 43

Right now I have:

df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()

There must be some way to filter the arguments passed to agg() based on the "Month" variable, but I cannot find it in the documentation. Any help?

Edit: Thanks for the solutions. To complicate things, I would also like to sum another column. Example:

Month    Fruit    Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
3       Apple    6      6
1       Kiwi     34     34
12      Melon    12     12

The preferred output would be something like:

            Sales      Revenue
     Fruit   1  3  12  1   3  12
0    Apple  61  6   0  61  6  0
1  Bananas  12  6   0  12  6  0
2     Kiwi  34  0   0  34  0  0
3    Melon   0  0  12  0   0  12

I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc=np.sum).reset_index(), so my problem is solved.

I tried the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this throws a TypeError. Can the operation above also be done with groupby?

dot*_*tcs 4

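To answer the updated question, you need to do things slightly differently. First group by the two keys that define the output layout, Month and Fruit. Then compute the sum for each group and unstack the Month level, which moves the months into the columns and leaves Fruit as the index.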

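import pandas as pd
from io import StringIO

data = '''
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
# Columns are whitespace-separated; the raw string avoids an invalid-escape warning
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Sum Sales and Revenue per (Month, Fruit), then move Month into the columns
df.groupby(['Month', 'Fruit'])\
    .sum()\
    .unstack(level=0)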

Result:

        Sales            Revenue           
Month      1    3     12      1    3     12
Fruit                                      
Apple    61.0  6.0   NaN    61.0  6.0   NaN
Bananas  12.0  6.0   NaN    12.0  6.0   NaN
Kiwi     34.0  NaN   NaN    34.0  NaN   NaN
Melon     NaN  NaN  12.0     NaN  NaN  12.0
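
The groupby attempt from the question can probably be made to run on recent pandas versions with two small changes. This is only a sketch: it assumes the TypeError came from the tuple-style column selection `['Sales','Revenue']` or from the positional `rename_axis(None, 1)` call, which has not been checked against the asker's pandas version. Here the columns are selected with a list, and the `rename_axis` step is simply left out. Passing `fill_value=0` to `unstack` replaces the NaN cells shown above with 0.

```python
from io import StringIO

import pandas as pd

data = '''
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

wide = (
    # Select the value columns with a list rather than a bare tuple of names
    df.groupby(['Fruit', 'Month'])[['Sales', 'Revenue']]
      .sum()
      # Move Month into the columns; absent (Fruit, Month) pairs become 0
      .unstack('Month', fill_value=0)
      # Turn the Fruit index into an ordinary column
      .reset_index()
)
print(wide)
```

The result keeps two-level column labels, so a single month is addressed with a tuple such as `wide[('Sales', 1)]`.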

Old answer

Use the pivot_table method:

import pandas as pd
from io import StringIO

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')

Result:

Month      1    3     12
Fruit                   
Apple    61.0  6.0   NaN
Bananas  12.0  6.0   NaN
Kiwi     34.0  NaN   NaN
Melon     NaN  NaN  12.0
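
If zeros are preferred over NaN, or only certain months are wanted (the "filter" in the original question), the pivot can be adjusted as in the sketch below. The `fill_value` parameter is part of `pivot_table`; the `months` list is a hypothetical choice made up for illustration, and filtering the rows before pivoting is just one possible way to restrict the months.

```python
from io import StringIO

import pandas as pd

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')

# Same pivot as above, but missing (Fruit, Month) combinations become 0
filled = df.pivot_table(values='Sales', index='Fruit', columns='Month',
                        aggfunc='sum', fill_value=0)

# Restrict to selected months by filtering the rows before pivoting
months = [1, 3]  # hypothetical selection, not taken from the question
subset = (
    df[df['Month'].isin(months)]
      .pivot_table(values='Sales', index='Fruit', columns='Month',
                   aggfunc='sum', fill_value=0)
)

print(filled)
print(subset)
```

Note that a fruit with no rows in the selected months (Melon here) drops out of `subset` entirely; reindexing on the full list of fruits would be needed to keep it as a row of zeros.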