met*_*rsk 26 python dataframe pandas pandas-groupby
I have a DataFrame that looks like this:
+----------+---------+-------+
| username | post_id | views |
+----------+---------+-------+
| john | 1 | 3 |
| john | 2 | 23 |
| john | 3 | 44 |
| john | 4 | 82 |
| jane | 7 | 5 |
| jane | 8 | 25 |
| jane | 9 | 46 |
| jane | 10 | 56 |
+----------+---------+-------+
I want to transform it so that it counts, for each user, how many view counts fall into certain bins:
+------+------+-------+-------+--------+
| | 1-10 | 11-25 | 26-50 | 51-100 |
+------+------+-------+-------+--------+
| john | 1 | 1 | 1 | 1 |
| jane | 1 | 1 | 1 | 1 |
+------+------+-------+-------+--------+
I tried:
bins = [1, 10, 25, 50, 100]
groups = df.groupby(pd.cut(df.views, bins))
groups.username.count()
But this only gives the aggregate counts, not the counts per user. How can I get the bin counts for each user?
The aggregate counts (using my real data) look like this:
impressions
(2500, 5000] 2332
(5000, 10000] 1118
(10000, 50000] 570
(50000, 10000000] 14
Name: username, dtype: int64
Ale*_*ley 30
You can group by both the username and the bins, compute the group sizes, and then use unstack():
>>> groups = df.groupby(['username', pd.cut(df.views, bins)])
>>> groups.size().unstack()
views (1, 10] (10, 25] (25, 50] (50, 100]
username
jane 1 1 1 1
john 1 1 1 1
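For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby/size/unstack approach. The `labels` list, `include_lowest=True`, `observed=False`, and `fill_value=0` are my own additions for illustration and are not part of the original answer: the labels only rename the interval columns, `include_lowest=True` makes a view count of exactly 1 land in the first bin, and `observed=False` together with `fill_value=0` is intended to keep empty bins in the result as zeros.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "username": ["john"] * 4 + ["jane"] * 4,
    "post_id": [1, 2, 3, 4, 7, 8, 9, 10],
    "views": [3, 23, 44, 82, 5, 25, 46, 56],
})

bins = [1, 10, 25, 50, 100]
# Hypothetical column labels (an assumption, not from the answer);
# there must be exactly one label per interval, i.e. len(bins) - 1.
labels = ["1-10", "11-25", "26-50", "51-100"]

# Assign each row's view count to a bin. Intervals are closed on the
# right, and include_lowest=True also admits the lowest edge (1).
binned = pd.cut(df["views"], bins=bins, labels=labels, include_lowest=True)

# Group by user and bin, count the rows in each group, then pivot the
# bin level of the index into columns.
counts = (
    df.groupby(["username", binned], observed=False)
      .size()
      .unstack(fill_value=0)
)

print(counts)
```

If you would rather keep the interval notation such as (1, 10] in the column headers, drop the `labels` argument and `pd.cut` will name the bins after the intervals themselves.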