熊猫聚合带有动态列名

Question

熊猫聚合带有动态列名

Mar*_*kum 22 python aggregate pandas pandas-groupby

我有一个脚本，该脚本生成具有数量不定的值列的熊猫数据框。例如，此df可能是

import pandas as pd
df = pd.DataFrame({
'group': ['A', 'A', 'A', 'B', 'B'],
'group_color' : ['green', 'green', 'green', 'blue', 'blue'],
'val1': [5, 2, 3, 4, 5], 
'val2' : [4, 2, 8, 5, 7]
})

  group group_color  val1  val2
0     A       green     5     4
1     A       green     2     2
2     A       green     3     8
3     B        blue     4     5
4     B        blue     5     7

Run Code Online (Sandbox Code Playgroud)

My goal is to get the grouped mean for each of the value columns. In this specific case (with 2 value columns), I can use

df.groupby('group').agg({"group_color": "first", "val1": "mean", "val2": "mean"})

      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

Run Code Online (Sandbox Code Playgroud)

but that does not work when the data frame in question has more value columns (val3, val4 etc.). Is there a way to dynamically take the mean of "all the other columns" or "all columns containing val in their names"?

Answer 1

WeN*_*Ben 14

More easy like

df.groupby('group').agg(lambda x : x.head(1) if x.dtype=='object' else x.mean())
Out[63]: 
      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

Run Code Online (Sandbox Code Playgroud)

Answer 2

Qua*_*ang 7

If your group_color is always the same within one group, you can do:

df.pivot_table(index=['group','group_color'],aggfunc='mean')

Run Code Online (Sandbox Code Playgroud)

Output:

                       val1      val2
group group_color                    
A     green        3.333333  4.666667
B     blue         4.500000  6.000000

Run Code Online (Sandbox Code Playgroud)

In the other case, you can build the dictionary and pass it to agg:

agg_dict = {f: 'first' if f=='group_color' else 'mean' for f in df.columns[1:]}
df.groupby('group').agg(agg_dict)

Run Code Online (Sandbox Code Playgroud)

Which output:

      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

Run Code Online (Sandbox Code Playgroud)

Answer 3

yat*_*atu 6

Unfortunately you will have to apply both aggregation functions separately (that or repeat "valn": "mean" as many times as valx columns). Groupby.agg can take a dictionary but the keys must be individual columns.

The way I'd do this is using DataFrame.filter to select the subset of the dataframe with the columns following the format of valx, aggregate with the mean, and then assign new columns with the aggregated results on the other columns:

(df.filter(regex=r'^val').groupby(df.group).mean()
   .assign(color = df.group_color.groupby(df.group).first()))

         val1      val2    color
group                           
A      3.333333  4.666667  green
B      4.500000  6.000000   blue

Run Code Online (Sandbox Code Playgroud)

归档时间：	6 年，5 月前
查看次数：	551 次
最近记录：	6 年，5 月前