Pandas DataFrame groupby: sum one column, take the first element from the other columns

Question

Bar*_*ich (score 5) | Tags: python, group-by, dataframe, pandas

I have a pandas DataFrame:

import pandas as pd
x = pd.DataFrame.from_dict({'row':[1, 1, 2, 2, 3, 3, 3], 'add': [1, 2, 3, 4, 5, 6, 7], 'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'take2': ['11', '22', '33', '44', '55', '66', '77'], 'range': [100, 200, 300, 400, 500, 600, 700]})


   add  range  row take1 take2
0    1    100    1     a    11
1    2    200    1     b    22
2    3    300    2     c    33
3    4    400    2     d    44
4    5    500    3     e    55
5    6    600    3     f    66
6    7    700    3     g    77

I want to group by the row column, sum the entries in the add column, take the first entry from take1 and take2, and select the minimum and maximum of range:

   add    row take1 take2  min_range   max_range
0    3      1     a    11    100        200
1    7      2     c    33    300        400
2    18     3     e    55    500        700

Answer 1

jez*_*ael (score 7)

Use DataFrameGroupBy.agg with a dict. Some cleanup is needed afterwards, because the result has MultiIndex columns:

# Map each column name to the aggregation(s) to apply to it

d = {'add':'sum', 'take1':'first', 'take2':'first', 'range':['min','max']}

# Group by the row column and apply the corresponding aggregation to each
# column as specified in the dictionary d
df = x.groupby('row', as_index=False).agg(d)

# Blank out the 'first'/'sum' labels in the second column level, then
# flatten the MultiIndex by joining both levels with '_'
df = df.rename(columns={'first':'', 'sum':''})
df.columns = ['{0[0]}_{0[1]}'.format(c).strip('_') for c in df.columns]
print(df)
   row take1  range_min  range_max take2  add
0    1     a        100        200    11    3
1    2     c        300        400    33    7
2    3     e        500        700    55   18
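
If you are on a pandas version that supports named aggregation (0.25 or later; this is an assumption, the answer above does not rely on it), the renaming step can be avoided altogether. The sketch below swaps in that technique: each keyword argument names one output column and maps it to a (source column, function) pair, so the result has flat columns, including the min_range/max_range names from the question.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})

# Named aggregation: output_name=(source_column, aggregation).
# No list of functions is passed, so no MultiIndex columns are created.
df = x.groupby('row', as_index=False).agg(
    add=('add', 'sum'),
    take1=('take1', 'first'),
    take2=('take2', 'first'),
    min_range=('range', 'min'),
    max_range=('range', 'max'),
)
print(df)
```

The trade-off is that every output column must be spelled out, whereas the dict form above derives names such as range_min automatically.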

Details: aggregate the columns with the functions specified in the dictionary:

df = x.groupby('row', as_index=False).agg(d)

  row range      take2 take1 add
        min  max first first sum
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
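
To see where the two header rows come from, you can inspect the column labels directly. This sketch rebuilds the question's frame and prints each column as a (level 0, level 1) tuple; the list passed for 'range' is what forces the second level.

```python
import pandas as pd

x = pd.DataFrame({
    'row': [1, 1, 2, 2, 3, 3, 3],
    'add': [1, 2, 3, 4, 5, 6, 7],
    'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'take2': ['11', '22', '33', '44', '55', '66', '77'],
    'range': [100, 200, 300, 400, 500, 600, 700],
})
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}
df = x.groupby('row', as_index=False).agg(d)

# Level 0 is the source column, level 1 the aggregation name;
# the group key 'row' gets an empty second-level label.
print(type(df.columns).__name__)  # MultiIndex
for col in df.columns:
    print(col)
```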

Replacing the column labels sum and first with '' (an empty string) gives:

  row range      take2 take1 add
        min  max
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
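
The rename step works because, when no level= argument is given, DataFrame.rename applies the mapping to the labels on every level of a MultiIndex. The sketch below isolates that behaviour on a small hand-built frame whose columns mimic the agg result (the single data row is illustrative only).

```python
import pandas as pd

# Hand-built columns shaped like the agg(d) result, for illustration.
cols = pd.MultiIndex.from_tuples([
    ('row', ''), ('add', 'sum'), ('take1', 'first'),
    ('take2', 'first'), ('range', 'min'), ('range', 'max'),
])
df = pd.DataFrame([[1, 3, 'a', '11', 100, 200]], columns=cols)

# 'sum' and 'first' only occur on the second level, so only those
# labels change; 'min' and 'max' are not in the mapping and survive.
df = df.rename(columns={'first': '', 'sum': ''})
print(list(df.columns))
```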

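A list comprehension that applies a string formatter to each column tuple produces the desired column names. Assigning that list to df.columns gives the desired output.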
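
The flattening step needs no pandas at all; it only operates on tuples. This sketch runs the same comprehension on hand-written tuples matching the renamed columns, and shows a variant based on '_'.join(filter(None, c)) that only joins non-empty parts.

```python
# Tuples matching the columns after the rename step (written by hand).
columns = [('row', ''), ('add', ''), ('take1', ''), ('take2', ''),
           ('range', 'min'), ('range', 'max')]

# '{0[0]}_{0[1]}' joins both levels with '_'; strip('_') removes the
# trailing underscore left behind when the second level is empty.
flat = ['{0[0]}_{0[1]}'.format(c).strip('_') for c in columns]
print(flat)  # ['row', 'add', 'take1', 'take2', 'range_min', 'range_max']

# Variant: skip empty labels instead of stripping underscores afterwards.
flat_join = ['_'.join(filter(None, c)) for c in columns]
print(flat_join)  # ['row', 'add', 'take1', 'take2', 'range_min', 'range_max']

# Edge case: strip('_') also eats underscores that belong to the name.
print('{0[0]}_{0[1]}'.format(('_id', '')).strip('_'))  # id
print('_'.join(filter(None, ('_id', ''))))              # _id
```

Both give the same result for the columns in this question; the join variant only differs when an original column name starts or ends with an underscore.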

Archived:	8 years, 3 months ago
Views:	3,170
Last activity:	8 years, 3 months ago