How to count active days for each month (or for a given month)

NPy*_*yak 7 python datetime dataframe pandas

I have a DataFrame like this:

Student_id  actvity_timestamp
1001        2019-09-05 08:26:12
1001        2019-09-06 09:26:12
1001        2019-09-21 10:11:01
1001        2019-10-24 11:44:01
1001        2019-10-25 11:31:01
1001        2019-10-26 12:13:01
1002        2019-09-11 12:21:01
1002        2019-09-12 13:11:01
1002        2019-11-23 16:22:01

I want output similar to this:

Student_id  total_active_days_in_Sept  total_active_days_in_Oct  total_active_days_in_Nov
1001        3                          3                         0
1002        2                          0                         1

How can I achieve this? (The month for each output column has to be derived from actvity_timestamp.)

Xel*_*voz 6

You can try something similar to this:

import pandas as pd

df = pd.DataFrame.from_dict({
    "Student_id": [1001,1001,1001,1001,1001,1001,1002,1002,1002],
    "actvity_timestamp": ["2019-09-05 08:26:12", "2019-09-06 09:26:12", "2019-09-21 10:11:01", "2019-10-24 11:44:01", "2019-10-25 11:31:01", "2019-10-26 12:13:01", "2019-09-11 12:21:01", "2019-09-12 13:11:01", "2019-11-23 16:22:01"]
})

# Parse the timestamp strings once so that the .dt accessor is available
timestamps = pd.to_datetime(df.actvity_timestamp)

# Full month name of each activity, e.g. "September"
months = timestamps.dt.strftime("%B")

result = pd.crosstab(
    df.Student_id,
    months,
    values=timestamps.dt.date,
    aggfunc=pd.Series.nunique # These last two parameters make it so that if a Student_id has been active more than once in a single day, it is counted only once. (Thanks to @tlentali)
).fillna(0).astype(int)  # student/month pairs with no activity come out as NaN, so replace them with 0

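Series.dt.strftime only works on a datetime Series (hence the pd.to_datetime call first); the %B directive formats each datetime as the full name of its month. pd.crosstab then counts, for every Student_id and month, the number of distinct activity dates.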
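A quick illustration of what %B produces, with %b (abbreviated month name) for comparison. The two sample timestamps are taken from the question's data; the outputs shown assume the default C/English locale, since strftime month names are locale-dependent.

```python
import pandas as pd

# Two sample timestamps from the question, parsed into a datetime Series
ts = pd.to_datetime(pd.Series(["2019-09-05 08:26:12", "2019-10-24 11:44:01"]))

full_names = ts.dt.strftime("%B")   # full month name
short_names = ts.dt.strftime("%b")  # abbreviated month name

print(full_names.tolist())   # ['September', 'October'] under a C/English locale
print(short_names.tolist())  # ['Sep', 'Oct'] under a C/English locale
```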

result will then be:

actvity_timestamp  November  October  September
Student_id                                     
1001                      0        3          3
1002                      1        0          2
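Note that crosstab sorts the month-name columns alphabetically (November, October, September), not chronologically. If you want the column order and naming from the question, one possible variation (a sketch, not part of the original answer) is to group by calendar month with Series.dt.to_period("M"), count distinct days with nunique, and build the labels yourself. The total_active_days_in_ prefix is simply copied from the desired output; also note that %b abbreviates September as "Sep", not "Sept".

```python
import pandas as pd

df = pd.DataFrame({
    "Student_id": [1001, 1001, 1001, 1001, 1001, 1001, 1002, 1002, 1002],
    "actvity_timestamp": [
        "2019-09-05 08:26:12", "2019-09-06 09:26:12", "2019-09-21 10:11:01",
        "2019-10-24 11:44:01", "2019-10-25 11:31:01", "2019-10-26 12:13:01",
        "2019-09-11 12:21:01", "2019-09-12 13:11:01", "2019-11-23 16:22:01",
    ],
})

ts = pd.to_datetime(df["actvity_timestamp"])

# Count distinct calendar days per (student, month); monthly Periods sort chronologically
counts = (
    df.assign(month=ts.dt.to_period("M"), day=ts.dt.date)
      .groupby(["Student_id", "month"])["day"]
      .nunique()
      .unstack(fill_value=0)  # months become columns; missing student/month pairs -> 0
)

# Turn the Period columns into labels like "total_active_days_in_Sep"
counts.columns = ["total_active_days_in_" + p.strftime("%b") for p in counts.columns]
print(counts)
```

Because day holds only the date, several activities on the same day are still counted once, as with the nunique trick above. Grouping by Period rather than by month name also keeps, say, September 2019 and September 2020 in separate columns, whereas %B would merge them.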