How to calculate cumulative groupby counts in Pandas with point in time?

Question

bos*_*elo 5 python dataframe pandas pandas-groupby

I have a df that contains multiple weekly snapshots of JIRA tickets. I want to calculate the YTD counts of tickets.

The df looks like this:

pointInTime   ticketId
2008-01-01         111
2008-01-01         222
2008-01-01         333
2008-01-07         444
2008-01-07         555
2008-01-07         666
2008-01-14         777
2008-01-14         888
2008-01-14         999

If I run df.groupby(['pointInTime'])['ticketId'].count(), I get the count of IDs in each snapshot. But what I want is the cumulative sum of those counts across snapshots.

The resulting df should look like this:

pointInTime   ticketId   cumCount
2008-01-01         111   3
2008-01-01         222   3
2008-01-01         333   3
2008-01-07         444   6
2008-01-07         555   6
2008-01-07         666   6
2008-01-14         777   9
2008-01-14         888   9
2008-01-14         999   9

So for 2008-01-07, the number of tickets would be the count for 2008-01-07 plus the count for 2008-01-01.

Answer 1

cs9*_*s95 6

Use GroupBy.count and cumsum, then map the result back to "pointInTime":

df['cumCount'] = (
    df['pointInTime'].map(df.groupby('pointInTime')['ticketId'].count().cumsum()))
df

  pointInTime  ticketId  cumCount
0  2008-01-01       111         3
1  2008-01-01       222         3
2  2008-01-01       333         3
3  2008-01-07       444         6
4  2008-01-07       555         6
5  2008-01-07       666         6
6  2008-01-14       777         9
7  2008-01-14       888         9
8  2008-01-14       999         9
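
For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and splits the one-liner into steps. The intermediate names per_snapshot and running_total are only illustrative, and the sketch assumes pointInTime holds ISO-formatted date strings (or real datetimes), so that the sorted group keys come out in chronological order:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 3 + ["2008-01-14"] * 3,
    "ticketId": [111, 222, 333, 444, 555, 666, 777, 888, 999],
})

# Tickets per snapshot. groupby sorts the group keys by default,
# so the result is ordered by pointInTime.
per_snapshot = df.groupby("pointInTime")["ticketId"].count()

# Running total across snapshots, indexed by pointInTime
running_total = per_snapshot.cumsum()

# Broadcast each snapshot's running total back onto its rows
df["cumCount"] = df["pointInTime"].map(running_total)
print(df)
```

Note that count() skips rows where ticketId is missing; if such rows should be counted too, size() could be used in its place.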

