cs9*_*s95 · 3 · tags: python, group-by, dataframe, pandas
This is another good data frame question, originally asked about R, that would benefit from a pandas solution. Here is the question.
I want to count the number of times status is open and the number of times status is closed, then calculate the close rate per country. Data:

  customer country   closeday status
1        1      BE 2017-08-23 closed
2        2      NL 2017-08-05   open
3        3      NL 2017-08-22 closed
4        4      NL 2017-08-26 closed
5        5      BE 2017-08-25 closed
6        6      NL 2017-08-13   open
7        7      BE 2017-08-30 closed
8        8      BE 2017-08-05   open
9        9      NL 2017-08-23 closed

The idea is to get an output that shows the number of open and closed statuses per country, along with the closed_ratio. This is the desired output:

country closed open closed_ratio
     BE      3    1         0.75
     NL      3    2         0.60

Looking forward to your suggestions.
A solution is included in the answer below. Other solutions are welcome.
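To run the answers below, the sample data first has to exist as a DataFrame. This is a minimal sketch: the name df is what the answers assume, while parsing closeday with pd.to_datetime and using a 1-based index (mirroring R's printed row names) are assumptions of this sketch, not part of the question.

```python
import pandas as pd

# Sample data from the question; the answers refer to it as `df`.
# The 1-based index mirrors R's row names (an assumption, not required).
df = pd.DataFrame(
    {
        "customer": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        # Parsed as real dates here; the answers never use this column
        "closeday": pd.to_datetime([
            "2017-08-23", "2017-08-05", "2017-08-22", "2017-08-26", "2017-08-25",
            "2017-08-13", "2017-08-30", "2017-08-05", "2017-08-23",
        ]),
        "status": ["closed", "open", "closed", "closed", "closed",
                   "open", "closed", "open", "closed"],
    },
    index=range(1, 10),
)
print(df)
```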
Here are a few approaches:
1)
In [420]: (df.groupby(['country', 'status']).size().unstack()
             .assign(closed_ratio=lambda x: x.closed / x.sum(axis=1)))
Out[420]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
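One caveat with unstack: if a country never has one of the statuses, that cell comes out as NaN and the count columns turn into floats. A minimal sketch with hypothetical data (the DE rows are made up for illustration) using unstack(fill_value=0), which fills the missing count with an integer 0 instead:

```python
import pandas as pd

# Hypothetical data: DE has no "open" rows, so (DE, open) never occurs
df = pd.DataFrame({
    "country": ["BE", "BE", "DE", "DE"],
    "status": ["closed", "open", "closed", "closed"],
})

# fill_value=0 fills the missing (DE, open) count with 0 instead of NaN
counts = df.groupby(["country", "status"]).size().unstack(fill_value=0)
result = counts.assign(closed_ratio=lambda x: x["closed"] / x.sum(axis=1))
print(result)
```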
2)
In [422]: (pd.crosstab(df.country, df.status)
             .assign(closed_ratio=lambda x: x.closed / x.sum(axis=1)))
Out[422]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
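If only the ratio is needed, crosstab can compute it directly: normalize="index" divides each row by its row total, so the closed column of the result is the closed ratio. A sketch on the question's data (only the country and status columns are rebuilt here):

```python
import pandas as pd

# country/status columns from the question's sample data
df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# normalize="index" turns each row into proportions that sum to 1
shares = pd.crosstab(df["country"], df["status"], normalize="index")
closed_ratio = shares["closed"]
print(closed_ratio)
```

This gives the ratios but drops the raw counts, so it fits only when the counts are not part of the desired output.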
3)
In [424]: (df.pivot_table(index='country', columns='status', aggfunc='size')
             .assign(closed_ratio=lambda x: x.closed / x.sum(axis=1)))
Out[424]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
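A closely related variant, assuming pandas 1.1 or later (where DataFrame.value_counts was added), counts each (country, status) pair and then unstacks the counts, much like approach 1:

```python
import pandas as pd

# country/status columns from the question's sample data
df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# Count each (country, status) pair, then spread statuses into columns
counts = df.value_counts(["country", "status"]).unstack(fill_value=0)
result = counts.assign(closed_ratio=lambda x: x["closed"] / x.sum(axis=1))
print(result)
```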
4) Borrowed from piRSquared
In [430]: (df.set_index('country').status.str.get_dummies().groupby(level=0).sum()
             .assign(closed_ratio=lambda x: x.closed / x.sum(axis=1)))
Out[430]:
         closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
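When the counts themselves are not needed, the ratio is simply the per-country mean of a boolean mask, since True counts as 1 and False as 0. A sketch on the question's data:

```python
import pandas as pd

# country/status columns from the question's sample data
df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# True where a row is closed; its mean per country is the closed share
is_closed = df["status"].eq("closed")
closed_ratio = is_closed.groupby(df["country"]).mean()
print(closed_ratio)
```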