Red*_*gon 3 (tags: epoch, python-3.x, pandas)
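So I want to find the average difficulty level for the first half vs. the second half of each session, and I can't find a suitable way to do it. I'm using the epoch time to split each session into two halves and then find the average difficulty level of each half.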
session_id question_difficulty attempt_updated_at
5c822af21c1fba22 2 1557470128000
5c822af21c1fba22 3 1557469685000
5c822af21c1fba22 4 1557470079000
5c822af21c1fba22 5 1557472999000
5c822af21c1fba22 3 1557474145000
5c822af21c1fba22 3 1557474441000
5c822af21c1fba22 4 1557474299000
5c822af21c1fba22 4 1557474738000
5c822af21c1fba22 3 1557475430000
5c822af21c1fba22 4 1557476960000
5c822af21c1fba22 5 1557477458000
5c822af21c1fba22 2 1557478118000
5c822af21c1fba22 5 1557482556000
5c822af21c1fba22 4 1557482809000
5c822af21c1fba22 5 1557482886000
5c822af21c1fba22 5 1557484232000
I'm working in Python pandas (Jupyter Notebook).
Code-wise, I don't know where to start. (Noobie alert)
I would like the output to look like:
session_id | first-half difficulty | second-half difficulty
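To reproduce the question, the sample rows above can be loaded into a DataFrame as shown below. This is a minimal sketch. The extra `attempt_time` column is my own addition, not part of the original data. It only converts the epoch milliseconds into readable timestamps so you can check the time ordering.

```python
import pandas as pd

# The 16 sample rows from the question: one session, epoch times in milliseconds
df = pd.DataFrame({
    "session_id": "5c822af21c1fba22",
    "question_difficulty": [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5],
    "attempt_updated_at": [
        1557470128000, 1557469685000, 1557470079000, 1557472999000,
        1557474145000, 1557474441000, 1557474299000, 1557474738000,
        1557475430000, 1557476960000, 1557477458000, 1557478118000,
        1557482556000, 1557482809000, 1557482886000, 1557484232000,
    ],
})

# Optional helper column (assumption, not in the original data):
# epoch milliseconds -> naive UTC timestamps
df["attempt_time"] = pd.to_datetime(df["attempt_updated_at"], unit="ms")

# The raw rows are not in chronological order, so sort before eyeballing them
print(df.sort_values("attempt_updated_at"))
```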
IIUC, you can use pandas.qcut to cut the epoch column into 2 equal-sized bins (first half/second half), then use groupby.mean:
df.groupby(['session_id', pd.qcut(df.attempt_updated_at, q=2)])['question_difficulty'].mean()
[out]
session_id attempt_updated_at
5c822af21c1fba22 (1557469684999.999, 1557475084000.0] 3.500
(1557475084000.0, 1557484232000.0] 4.125
Name: question_difficulty, dtype: float64
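The `qcut` call above returns a long Series indexed by interval. If you want the wide layout from the question (one row per session, one column per half), a possible sketch passes readable `labels` to `qcut` and then uses `unstack`. The label names `first_half`/`second_half` are my choice. Like the one-liner above, this cuts the whole `attempt_updated_at` column, so it only splits per session when the frame holds a single session.

```python
import pandas as pd

# Sample rows from the question (single session, epoch times in milliseconds)
df = pd.DataFrame({
    "session_id": "5c822af21c1fba22",
    "question_difficulty": [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5],
    "attempt_updated_at": [
        1557470128000, 1557469685000, 1557470079000, 1557472999000,
        1557474145000, 1557474441000, 1557474299000, 1557474738000,
        1557475430000, 1557476960000, 1557477458000, 1557478118000,
        1557482556000, 1557482809000, 1557482886000, 1557484232000,
    ],
})

# Equal-count halves: qcut with q=2 splits at the median timestamp
half = pd.qcut(df["attempt_updated_at"], q=2, labels=["first_half", "second_half"])

# Mean difficulty per (session, half), with the halves pivoted into columns
result = (
    df.groupby(["session_id", half], observed=True)["question_difficulty"]
    .mean()
    .unstack()
)
print(result)
```

With 16 attempts, each half holds 8 rows, which gives the 3.5 and 4.125 averages shown above.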
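Alternatively, depending on how you define "first half"/"second half", you may want to use pandas.cut with the bins=2 argument. In that case the time periods will be equal in width, rather than equal in size as with qcut above: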
df.groupby(['session_id', pd.cut(df.attempt_updated_at, bins=2)])['question_difficulty'].mean()
[out]
session_id attempt_updated_at
5c822af21c1fba22 (1557469670453.0, 1557476958500.0] 3.444444
(1557476958500.0, 1557484232000.0] 4.285714
Name: question_difficulty, dtype: float64
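To see why the `cut` groups are 9 and 7 rows rather than 8 and 8, the sketch below works through the equal-width split by hand. With `bins=2`, the split point is the midpoint of the time range. pandas also pushes the lowest edge down by 0.1% of the range so that the minimum value falls inside the first (right-closed) interval, which explains the odd-looking lower bound in the output above. The final lines cross-check the hand computation against the edges returned by `pd.cut(..., retbins=True)`.

```python
import pandas as pd

# Timestamps (epoch ms) from the question
times = pd.Series([
    1557470128000, 1557469685000, 1557470079000, 1557472999000,
    1557474145000, 1557474441000, 1557474299000, 1557474738000,
    1557475430000, 1557476960000, 1557477458000, 1557478118000,
    1557482556000, 1557482809000, 1557482886000, 1557484232000,
])

lo, hi = times.min(), times.max()
midpoint = (lo + hi) / 2             # equal-width split point for bins=2
lower_edge = lo - (hi - lo) * 0.001  # pd.cut lowers the left edge by 0.1% of the range

print(midpoint)  # 1557476958500.0
print(lower_edge)

# Attempts on each side of the midpoint: intervals are right-closed by default
n_first = int((times <= midpoint).sum())
n_second = int((times > midpoint).sum())
print(n_first, n_second)  # 9 7

# Cross-check against the bin edges pandas computes itself
_, edges = pd.cut(times, bins=2, retbins=True)
print(edges)
```

The attempt at 1557476960000 lies just 1.5 seconds past the midpoint, so it falls into the second half.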
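To compute separate time periods for each unique session_id, you will probably first have to group by session_id, run the method above on each group, and finally concat the results. Here is an example using a list comprehension: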
# Group by session first, so each session gets its own two equal-width time bins
groups_session_id = df.groupby('session_id')
pd.concat([g.groupby(['session_id', pd.cut(g['attempt_updated_at'], bins=2).astype(str)])
           ['question_difficulty'].mean() for _, g in groups_session_id])
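A different technique from the list comprehension above can also do the per-session split. It labels every row with its half using `groupby(...).transform`, with `pd.cut(..., labels=False)` applied separately inside each session, and then averages. The sketch below is under stated assumptions: the session `demo_session` and its four rows are made up purely to show that each session is split on its own time range, and the column names `half`, `first_half` and `second_half` are my own.

```python
import pandas as pd

# Sample session from the question (epoch times in milliseconds)
real = pd.DataFrame({
    "session_id": "5c822af21c1fba22",
    "question_difficulty": [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5],
    "attempt_updated_at": [
        1557470128000, 1557469685000, 1557470079000, 1557472999000,
        1557474145000, 1557474441000, 1557474299000, 1557474738000,
        1557475430000, 1557476960000, 1557477458000, 1557478118000,
        1557482556000, 1557482809000, 1557482886000, 1557484232000,
    ],
})
# Hypothetical second session, invented only for illustration
demo = pd.DataFrame({
    "session_id": "demo_session",
    "question_difficulty": [1, 2, 3, 4],
    "attempt_updated_at": [0, 10_000, 20_000, 30_000],
})
df = pd.concat([real, demo], ignore_index=True)

# 0 = first half, 1 = second half; bins=2 is computed within each session's own range
df["half"] = df.groupby("session_id")["attempt_updated_at"].transform(
    lambda s: pd.cut(s, bins=2, labels=False)
)

# Mean difficulty per (session, half), halves as named columns
result = (
    df.groupby(["session_id", "half"])["question_difficulty"]
    .mean()
    .unstack("half")
    .rename(columns={0: "first_half", 1: "second_half"})
)
print(result)
```

Because `half` is an ordinary integer column, you can reuse it for other per-half statistics without repeating the cut.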
To add these averages back to the original DataFrame, you can use DataFrame.merge:
# One row per session; the two half-averages become columns '1' and '2'
df_avg_question_difficulty = pd.concat([g.groupby(['session_id', pd.cut(g['attempt_updated_at'], bins=2, labels=[1, 2]).astype(str)])
                                        ['question_difficulty'].mean().unstack(1) for _, g in groups_session_id])
# Attach both averages to every row of the matching session
df = df.merge(df_avg_question_difficulty, left_on='session_id', right_index=True)
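Here is a self-contained sketch of the same merge step. It makes two assumptions of mine that differ from the snippet above: the halves get descriptive labels (`first_half`/`second_half` instead of 1/2), and the merge uses `how="left"` so that every original row is kept. The checks confirm that the row count is unchanged and that every row of the session carries both of its averages.

```python
import pandas as pd

# Sample rows from the question (single session, epoch times in milliseconds)
df = pd.DataFrame({
    "session_id": "5c822af21c1fba22",
    "question_difficulty": [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5],
    "attempt_updated_at": [
        1557470128000, 1557469685000, 1557470079000, 1557472999000,
        1557474145000, 1557474441000, 1557474299000, 1557474738000,
        1557475430000, 1557476960000, 1557477458000, 1557478118000,
        1557482556000, 1557482809000, 1557482886000, 1557484232000,
    ],
})

# One row per session: equal-width halves computed per session, pivoted into columns
avg = pd.concat([
    g.groupby(["session_id",
               pd.cut(g["attempt_updated_at"], bins=2,
                      labels=["first_half", "second_half"]).astype(str)])
    ["question_difficulty"].mean()
    .unstack(1)
    for _, g in df.groupby("session_id")
])

# Copy both half-averages onto every row of the matching session
merged = df.merge(avg, left_on="session_id", right_index=True, how="left")
print(merged.head())
```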