NPy*_*yak 7 python datetime dataframe pandas
I have a DataFrame like this:
| Student_id | actvity_timestamp |
|---|---|
| 1001 | 2019-09-05 08:26:12 |
| 1001 | 2019-09-06 09:26:12 |
| 1001 | 2019-09-21 10:11:01 |
| 1001 | 2019-10-24 11:44:01 |
| 1001 | 2019-10-25 11:31:01 |
| 1001 | 2019-10-26 12:13:01 |
| 1002 | 2019-09-11 12:21:01 |
| 1002 | 2019-09-12 13:11:01 |
| 1002 | 2019-11-23 16:22:01 |
I want output something like this:
| Student_id | total_active_days_in_Sept | total_active_days_in_Oct | total_active_days_in_Nov |
|---|---|---|---|
| 1001 | 3 | 3 | 0 |
| 1002 | 2 | 0 | 1 |
How can I achieve this (the month for the output columns has to be computed from actvity_timestamp)?
You can try something similar to this:
import pandas as pd

df = pd.DataFrame.from_dict({
    "Student_id": [1001,1001,1001,1001,1001,1001,1002,1002,1002],
    "actvity_timestamp": ["2019-09-05 08:26:12", "2019-09-06 09:26:12", "2019-09-21 10:11:01", "2019-10-24 11:44:01", "2019-10-25 11:31:01", "2019-10-26 12:13:01", "2019-09-11 12:21:01", "2019-09-12 13:11:01", "2019-11-23 16:22:01"]
})
# The column holds strings, so parse it once before using the .dt accessor.
timestamps = pd.to_datetime(df.actvity_timestamp)
months = timestamps.dt.strftime("%B")
# values + aggfunc count each calendar date once, so a Student_id that is
# active several times on the same day is counted only once. (Thanks to @tlentali)
result = pd.crosstab(
    df.Student_id,
    months,
    values=timestamps.dt.date,
    aggfunc=pd.Series.nunique
).fillna(0).astype(int)  # missing student/month pairs become 0; cast back to int
Series.dt.strftime works on a datetime Series; the %B format code renders each datetime as the full name of its month.
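As a quick illustration of the formatting step on its own, here is a minimal sketch using two timestamps from the question. The variable names (`sample`, `month_names`, `month_abbr`) are made up for this example, and the month names it produces depend on the locale Python is running under.

```python
import pandas as pd

# Two timestamps taken from the question's data, parsed into datetimes.
sample = pd.Series(pd.to_datetime(["2019-09-05 08:26:12", "2019-11-23 16:22:01"]))

# %B gives the full month name, %b the abbreviated one (both locale-dependent).
month_names = sample.dt.strftime("%B")
month_abbr = sample.dt.strftime("%b")

print(month_names.tolist())
print(month_abbr.tolist())
```

Note that strftime returns strings, so anything grouped on them will sort alphabetically rather than by calendar order.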
The result DataFrame from the crosstab call looks like this:
actvity_timestamp  November  October  September
Student_id
1001                      0        3          3
1002                      1        0          2
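The columns above come out in alphabetical order because they are month-name strings. If you want the exact layout from the question (chronological columns named total_active_days_in_...), one possible variation is to group on the month number and rename afterwards. This is only a sketch: the `labels` mapping and the `counts` name are assumptions made for illustration, and it assumes all the data falls within a single year (otherwise the same month from different years would be merged).

```python
import pandas as pd

df = pd.DataFrame({
    "Student_id": [1001, 1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002],
    "actvity_timestamp": [
        "2019-09-05 08:26:12", "2019-09-06 09:26:12", "2019-09-21 10:11:01",
        "2019-10-24 11:44:01", "2019-10-25 11:31:01", "2019-10-26 12:13:01",
        "2019-09-11 12:21:01", "2019-09-12 13:11:01", "2019-11-23 16:22:01",
    ],
})

timestamps = pd.to_datetime(df["actvity_timestamp"])

# Group on the month number (9, 10, 11) so the columns sort chronologically.
counts = pd.crosstab(
    df["Student_id"],
    timestamps.dt.month.rename("month"),
    values=timestamps.dt.date,       # count distinct calendar dates
    aggfunc=pd.Series.nunique,
).fillna(0).astype(int)

# Hypothetical label mapping, covering only the months present in this sample.
labels = {9: "Sept", 10: "Oct", 11: "Nov"}
counts.columns = [f"total_active_days_in_{labels[int(m)]}" for m in counts.columns]

# Turn Student_id back into a regular column, as in the requested output.
counts = counts.reset_index()
print(counts)
```

For data covering more months, the `labels` dictionary would need an entry for each month number that appears.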