Pandas sum by groupby, but exclude certain columns

Question

use*_*827 76 python group-by aggregate pandas

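What is the best way to do a groupby on a Pandas dataframe while excluding some columns from that groupby? For example, I have the following dataframe: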

Code   Country      Item_Code   Item    Ele_Code    Unit    Y1961    Y1962   Y1963
2      Afghanistan  15          Wheat   5312        Ha      10       20      30
2      Afghanistan  25          Maize   5312        Ha      10       20      30
4      Angola       15          Wheat   7312        Ha      30       40      50
4      Angola       25          Maize   7312        Ha      30       40      50

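I want to group by the columns Country and Item_Code and compute the sum only for the rows falling under the columns Y1961, Y1962 and Y1963. The resulting dataframe should look like this: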

Code   Country      Item_Code   Item    Ele_Code    Unit    Y1961    Y1962   Y1963
2      Afghanistan  15          C3      5312        Ha      20       40       60
4      Angola       25          C4      7312        Ha      60       80      100

Right now I am doing this:

df.groupby('Country').sum()

However, this also sums the values in the Item_Code column. Is there a way to specify which columns to include in the sum() operation and which to exclude?

Answer 1

And*_*den 104

You can select the columns of a groupby:

In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Note that the list passed must be a subset of the columns; otherwise you will see a KeyError.
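
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the column selection. The variable names (`year_cols`, `result`, `by_country`) are my own, and `as_index=False` is an optional extra that keeps the group keys as ordinary columns; it is not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Item": ["Wheat", "Maize", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

year_cols = ["Y1961", "Y1962", "Y1963"]

# Only the selected columns are summed; Item_Code is a key, not a summed value
result = df.groupby(["Country", "Item_Code"], as_index=False)[year_cols].sum()

# Grouping by Country alone adds up the Wheat and Maize rows per country
by_country = df.groupby("Country")[year_cols].sum()

# Selecting a column that does not exist raises KeyError
try:
    df.groupby("Country")[["Y1961", "Y1999"]].sum()
    raised_key_error = False
except KeyError:
    raised_key_error = True
```

Grouping by Country alone is what produces the 20 / 40 / 60 totals shown in the question's expected output.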

Answer 2

ler*_*yJr 34

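The agg function will do this for you. Pass the columns and functions as a dict that maps each input column to the function(s) to apply: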

df.groupby(['Country', 'Item_Code']).agg({'Y1961': 'sum', 'Y1962': ['sum', 'mean']})  # two output columns from a single input column; string names avoid needing numpy

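This will display only the group-by columns and the specified aggregate columns. In this example I included two agg functions applied to 'Y1962'.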

To get exactly what you hoped to see, include the other columns in the group by, and apply sum to the Y variables in the frame:

df.groupby(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit']).agg({'Y1961': 'sum', 'Y1962': 'sum', 'Y1963': 'sum'})

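A possible variation, offered only as a sketch: if the descriptive columns should be carried along rather than used as group keys, a dict passed to agg can mix 'first' for those columns with 'sum' for the year columns. Taking 'first' is my own assumption about what the asker wants for Code, Ele_Code and Unit. The second call uses named aggregation (pandas 0.25 or later), which gives flat output column names instead of a MultiIndex; the output names `Y1962_sum` and `Y1962_mean` are made up for illustration.

```python
import pandas as pd

# Trimmed copy of the question's sample data
df = pd.DataFrame({
    "Code": [2, 2, 4, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Ele_Code": [5312, 5312, 7312, 7312],
    "Unit": ["Ha", "Ha", "Ha", "Ha"],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Assumption: the first value per group is good enough for descriptive columns
agg_spec = {
    "Code": "first",
    "Ele_Code": "first",
    "Unit": "first",
    "Y1961": "sum",
    "Y1962": "sum",
    "Y1963": "sum",
}
per_country = df.groupby("Country", as_index=False).agg(agg_spec)

# Named aggregation: keyword = (input column, function)
stats = df.groupby(["Country", "Item_Code"]).agg(
    Y1962_sum=("Y1962", "sum"),
    Y1962_mean=("Y1962", "mean"),
)
```

Item_Code is simply left out of `agg_spec`, so it is neither summed nor shown in `per_country`.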

Answer 3

Sup*_*tar 11

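If you are looking for a more general way to apply this to many columns, you can build a list of column names and pass it as the index of the grouped dataframe. In your case, for example: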

columns = ['Y' + str(year) for year in range(1967, 2011)]  # every name listed here must exist in df

df.groupby('Country')[columns].agg('sum')

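Since a name that is missing from the frame raises KeyError, one alternative is to derive the list from the columns that are actually present. This is only a sketch: the "Y followed by digits" test is my guess at the naming scheme, based on the sample data in the question.

```python
import pandas as pd

# Trimmed copy of the question's sample data
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

# Keep only columns named "Y" + digits, so nothing outside the frame is requested
year_cols = [c for c in df.columns if c.startswith("Y") and c[1:].isdigit()]

totals = df.groupby("Country")[year_cols].sum()
```

This way the same line keeps working when the frame gains or loses year columns.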
