Python: Get frequency counts based on two columns (variables) in a pandas DataFrame

ema*_*max 62 python group-by dataframe pandas

Hello, I have the following dataframe.

    Group           Size

    Short          Small
    Short          Small
    Moderate       Medium
    Moderate       Small
    Tall           Large

I want to count how many times each identical row appears in the dataframe.

    Group           Size      Time

    Short          Small        2
    Moderate       Medium       1 
    Moderate       Small        1
    Tall           Large        1

And*_*den 98

You can use groupby's size:

In [11]: df.groupby(["Group", "Size"]).size()
Out[11]:
Group     Size
Moderate  Medium    1
          Small     1
Short     Small     2
Tall      Large     1
dtype: int64

In [12]: df.groupby(["Group", "Size"]).size().reset_index(name="Time")
Out[12]:
      Group    Size  Time
0  Moderate  Medium     1
1  Moderate   Small     1
2     Short   Small     2
3      Tall   Large     1

  • Thanks. A small addition for selecting the top k (= 20) values by frequency ("Time"): df.groupby(["Group", "Size"]).size().reset_index(name="Time").sort_values(by="Time", ascending=False).head(20) (6 upvotes)
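
The groupby answer and the top-k comment can be combined into one runnable sketch. The sample data is copied from the question; the variable names `counts`, `top_k`, and the choice `k = 2` are illustrative assumptions, not part of the original posts.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
        "Size": ["Small", "Small", "Medium", "Small", "Large"],
    }
)

# Count identical (Group, Size) pairs and name the count column "Time".
counts = df.groupby(["Group", "Size"]).size().reset_index(name="Time")
print(counts)

# Keep the k most frequent pairs; k = 2 is an arbitrary choice for this tiny example.
k = 2
top_k = counts.sort_values(by="Time", ascending=False).head(k)
print(top_k)
```

Rows with equal counts may come back in any order after sorting, so only the position of the most frequent pair is guaranteed here.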

WeN*_*Ben 33

You can also try pd.crosstab():

Group           Size

Short          Small
Short          Small
Moderate       Medium
Moderate       Small
Tall           Large

pd.crosstab(df.Group,df.Size)


Size      Large  Medium  Small
Group                         
Moderate      0       1      1
Short         0       0      2
Tall          1       0      0

Edit: to get your desired output:

pd.crosstab(df.Group,df.Size).replace(0,np.nan).\
     stack().reset_index().rename(columns={0:'Time'})
Out[591]: 
      Group    Size  Time
0  Moderate  Medium   1.0
1  Moderate   Small   1.0
2     Short   Small   2.0
3      Tall   Large   1.0

  • Nice. You can even add `margins=True` to get the marginal counts! (5 upvotes)
  • Pandas. The gift that keeps on giving. (5 upvotes)
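
As a sketch of the `margins=True` suggestion, the following rebuilds the question's sample data and adds row and column totals, which pandas labels `All` by default. It also shows one way, not taken from the answer, to get the long-form counts as integers: stack the plain crosstab and filter out the zero counts rather than replacing zeros with NaN. The names `ct` and `long_counts` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
        "Size": ["Small", "Small", "Medium", "Small", "Large"],
    }
)

# Cross-tabulate with marginal totals; the totals row and column are labelled "All".
ct = pd.crosstab(df["Group"], df["Size"], margins=True)
print(ct)

# Long form with integer counts: stack the plain crosstab into a Series,
# turn it back into a frame, then keep only combinations that occur.
long_counts = (
    pd.crosstab(df["Group"], df["Size"])
    .stack()
    .reset_index(name="Time")
)
long_counts = long_counts[long_counts["Time"] > 0].reset_index(drop=True)
print(long_counts)
```

Filtering on `Time > 0` avoids the NaN round trip, so the counts stay integers instead of becoming `1.0`, `2.0`.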

小智 5

Another possibility is to use .pivot_table() with aggfunc='size':

df_solution = df.pivot_table(index=['Group','Size'], aggfunc='size')
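
A minimal sketch of the `pivot_table` route on the question's sample data. With `aggfunc='size'` and no `values` argument, the result is expected to be a Series indexed by (`Group`, `Size`); the `reset_index(name="Time")` step and the name `result` are additions here, assumed only to match the layout the question asks for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Group": ["Short", "Short", "Moderate", "Moderate", "Tall"],
        "Size": ["Small", "Small", "Medium", "Small", "Large"],
    }
)

# Count rows per (Group, Size) pair; the result is indexed by both columns.
df_solution = df.pivot_table(index=["Group", "Size"], aggfunc="size")
print(df_solution)

# Flatten the index into ordinary columns and name the count column "Time".
result = df_solution.reset_index(name="Time")
print(result)
```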