Here is a sample of my data:
Date Count
11.01.2019 1
01.02.2019 7
25.01.2019 4
23.01.2019 4
16.03.2019 1
04.02.2019 5
06.04.2019 1
04.04.2019 5
Desired output:
Month Total_Count
Jan 9
Feb 12
Mar 1
Apr 6
I used the code below for the summary above. It works, but the months come out jumbled instead of in calendar order (Jan, Feb, and so on):
(df.groupby(pd.to_datetime(df['Date'], format='%d.%m.%Y')
              .dt.month_name()
              .str[:3])['Count']
   .sum()
   .rename_axis('Month')
   .reset_index(name='Total_Count'))
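For reference, a minimal reproduction of the ordering problem, assuming the sample data above is loaded into a DataFrame named `df` (the construction of `df` here is an assumption, since the question does not show it). `groupby` sorts the group keys by default, and for strings that means alphabetical order:

```python
import pandas as pd

# Sample data from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "Date": ["11.01.2019", "01.02.2019", "25.01.2019", "23.01.2019",
             "16.03.2019", "04.02.2019", "06.04.2019", "04.04.2019"],
    "Count": [1, 7, 4, 4, 1, 5, 1, 5],
})

# Three-letter month abbreviation for every row.
month = pd.to_datetime(df["Date"], format="%d.%m.%Y").dt.month_name().str[:3]

# Default sort=True orders the string keys alphabetically.
unsorted_result = (df.groupby(month)["Count"]
                     .sum()
                     .rename_axis("Month")
                     .reset_index(name="Total_Count"))
print(unsorted_result)
```

The totals are correct, but the rows come back as Apr, Feb, Jan, Mar rather than in calendar order.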
The idea is to convert the column to datetimes, sort by it, and then group with sort=False to avoid the default sorting of groupby:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
df1 = (df.sort_values('Date')
         .groupby(df['Date'].dt.month_name().str[:3], sort=False)['Count']
         .sum()
         .rename_axis('Month')
         .reset_index(name='Total_Count'))
print(df1)
Month Total_Count
0 Jan 9
1 Feb 12
2 Mar 1
3 Apr 6
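The sort-then-group approach can be sketched end to end as follows. This is a self-contained version under the assumption that the sample data is built by hand; `sorted_df` is a name introduced here for illustration. With sort=False, groups appear in order of first occurrence, which is chronological once the rows are sorted by date:

```python
import pandas as pd

# Sample data from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "Date": ["11.01.2019", "01.02.2019", "25.01.2019", "23.01.2019",
             "16.03.2019", "04.02.2019", "06.04.2019", "04.04.2019"],
    "Count": [1, 7, 4, 4, 1, 5, 1, 5],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

# Sort chronologically first, then group without re-sorting the keys.
sorted_df = df.sort_values("Date")
chrono_result = (sorted_df
                 .groupby(sorted_df["Date"].dt.month_name().str[:3], sort=False)["Count"]
                 .sum()
                 .rename_axis("Month")
                 .reset_index(name="Total_Count"))
print(chrono_result)
```

One caveat: the key is the month abbreviation only, so if the data spanned several years, every January would fall into a single "Jan" group.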
Another idea (thanks, anky) is to use an ordered Categorical; in that case sort=False has to be removed:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.groupby(pd.Categorical(pd.to_datetime(df['Date'], format='%d.%m.%Y')
                                   .dt.month_name().str[:3],
                                 ordered=True, categories=months))['Count']
         .sum()
         .rename_axis('Month')
         .reset_index(name='Total_Count'))
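A self-contained sketch of the categorical approach, with two assumptions not in the original: the sample data is built by hand, and `observed=True` is passed explicitly. Whether months without data show up in the result depends on the `observed` setting, whose default differs between pandas versions, so stating it makes the result match the desired four-row output regardless:

```python
import pandas as pd

# Sample data from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "Date": ["11.01.2019", "01.02.2019", "25.01.2019", "23.01.2019",
             "16.03.2019", "04.02.2019", "06.04.2019", "04.04.2019"],
    "Count": [1, 7, 4, 4, 1, 5, 1, 5],
})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Month abbreviations as an ordered categorical, so sorting follows
# the calendar order given in `months` instead of alphabetical order.
abbr = pd.to_datetime(df["Date"], format="%d.%m.%Y").dt.month_name().str[:3]
month_cat = abbr.astype(pd.CategoricalDtype(categories=months, ordered=True))

# observed=True keeps only the months that actually occur in the data.
cat_result = (df.groupby(month_cat, observed=True)["Count"]
                .sum()
                .rename_axis("Month")
                .reset_index(name="Total_Count"))
print(cat_result)
```

Passing `observed=False` instead should list all twelve categories, with a total of 0 for months that have no rows.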
Or use Series.reindex:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.groupby(pd.to_datetime(df['Date'], format='%d.%m.%Y')
                    .dt.month_name().str[:3])['Count']
         .sum()
         .rename_axis('Month')
         .reindex(months, fill_value=0)
         .reset_index(name='Total_Count'))
print(df1)
Month Total_Count
0 Jan 9
1 Feb 12
2 Mar 1
3 Apr 6
4 May 0
5 Jun 0
6 Jul 0
7 Aug 0
8 Sep 0
9 Oct 0
10 Nov 0
11 Dec 0
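The reindex output above lists all twelve months. If only the months present in the data are wanted, as in the desired output, one possible variation (not part of the original answer) is to reindex with just the subset of `months` that occurs in the grouped result. The names `totals` and `present` are introduced here for illustration, and the sample data is again built by hand:

```python
import pandas as pd

# Sample data from the question; building it by hand is an assumption.
df = pd.DataFrame({
    "Date": ["11.01.2019", "01.02.2019", "25.01.2019", "23.01.2019",
             "16.03.2019", "04.02.2019", "06.04.2019", "04.04.2019"],
    "Count": [1, 7, 4, 4, 1, 5, 1, 5],
})

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Per-month totals, still in alphabetical order at this point.
abbr = pd.to_datetime(df["Date"], format="%d.%m.%Y").dt.month_name().str[:3]
totals = df.groupby(abbr)["Count"].sum()

# Keep calendar order, but only for months that have data.
present = [m for m in months if m in totals.index]
reindex_result = (totals.reindex(present)
                        .rename_axis("Month")
                        .reset_index(name="Total_Count"))
print(reindex_result)
```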