Art*_*ijk 5 python group-by pandas
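I have a DataFrame of events. One or more events can occur on a given date (so the date cannot be the index). The dates span several years. I want to group by year and month and count the Category values. Thnx
In [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1')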
In [13]: df
Out[13]:
EventRefNr DateOccurence Type Category
0 86596 2010-01-02 00:00:00 3 Small
1 86779 2010-01-09 00:00:00 13 Medium
2 86780 2010-02-10 00:00:00 6 Small
3 86781 2010-02-09 00:00:00 17 Small
4 86898 2010-02-10 00:00:00 6 Small
5 86898 2010-02-11 00:00:00 6 Small
6 86902 2010-02-17 00:00:00 9 Small
7 86908 2010-02-19 00:00:00 3 Medium
8 86908 2010-03-05 00:00:00 3 Medium
9 86909 2010-03-06 00:00:00 8 Small
10 86930 2010-03-12 00:00:00 29 Small
11 86934 2010-03-16 00:00:00 9 Small
12 86940 2010-04-08 00:00:00 9 High
13 86941 2010-04-09 00:00:00 17 Small
14 86946 2010-04-14 00:00:00 10 Small
15 86950 2011-01-19 00:00:00 12 Small
16 86956 2011-01-24 00:00:00 13 Small
17 86959 2011-01-27 00:00:00 17 Small
I tried:
df.groupby(df['DateOccurence'])
For breakdowns by month and year, I often add extra columns to the data frame that split the date into its parts:
df['year'] = [t.year for t in df.DateOccurence]
df['month'] = [t.month for t in df.DateOccurence]
df['day'] = [t.day for t in df.DateOccurence]
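As a sketch of the same idea without the Python-level list comprehensions: assuming DateOccurence already has a datetime64 dtype and your pandas version provides the `.dt` accessor, the date parts can be pulled out in a vectorized way. The four-row frame below is only a small sample of the question's data, built inline so the snippet runs on its own.

```python
import pandas as pd

# Small sample of the question's data (dates and categories only).
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-01-02", "2010-01-09", "2010-02-10", "2011-01-19"]
    ),
    "Category": ["Small", "Medium", "Small", "Small"],
})

# The .dt accessor exposes the date parts column-wise, no explicit loop needed.
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month
df["day"] = df["DateOccurence"].dt.day
```

If the column was read in as strings rather than datetimes, it would need a `pd.to_datetime` conversion first.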
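This adds space complexity (extra columns on the df) but costs less time than a datetime index (less work at groupby time); it's really up to you. A datetime index is the more pandas way of doing things.
Once the date is split into year, month and day, you can group by any combination you need.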
df.groupby(['year', 'month']).Category.apply(pd.value_counts)
To get the counts per month across all years:
df.groupby('month').Category.apply(pd.value_counts)
Or, using the datetime index (di) from Andy Hayden's answer:
df.groupby(di.month).Category.apply(pd.value_counts)
Pick whichever approach suits your needs better.
You can apply value_counts to the SeriesGroupBy (for the column):
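Putting the pieces together, here is a minimal self-contained sketch of the year/month breakdown the question asks for. Note that it swaps in `size()` plus `unstack` for the `apply(pd.value_counts)` call used above, and groups on Series derived from the date instead of added columns; the seven rows are a subset of the question's data, and the names `year`, `month` and `table` are just illustrative.

```python
import pandas as pd

# Subset of the question's events (dates and categories only).
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime([
        "2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09",
        "2010-02-19", "2011-01-19", "2011-01-24",
    ]),
    "Category": ["Small", "Medium", "Small", "Small",
                 "Medium", "Small", "Small"],
})

# Derive the grouping keys without adding columns to df.
year = df["DateOccurence"].dt.year.rename("year")
month = df["DateOccurence"].dt.month.rename("month")

# Count the rows per (year, month, Category), then pivot the categories
# into columns; combinations that never occur are filled with 0.
table = df.groupby([year, month, "Category"]).size().unstack(fill_value=0)
print(table)
```

Grouping on `month` alone instead of `[year, month, "Category"]`'s first two keys gives the per-month counts pooled over all years.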
In [11]: g = df.groupby('DateOccurence')
In [12]: g.Category.apply(pd.value_counts)
Out[12]:
DateOccurence
2010-01-02 Small 1
2010-01-09 Medium 1
2010-02-09 Small 1
2010-02-10 Small 2
2010-02-11 Small 1
2010-02-17 Small 1
2010-02-19 Medium 1
2010-03-05 Medium 1
2010-03-06 Small 1
2010-03-12 Small 1
2010-03-16 Small 1
2010-04-08 High 1
2010-04-09 Small 1
2010-04-14 Small 1
2011-01-19 Small 1
2011-01-24 Small 1
2011-01-27 Small 1
dtype: int64
I had actually hoped this would return the following DataFrame directly, but you need to unstack it:
In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0)
Out[13]:
High Medium Small
DateOccurence
2010-01-02 0 0 1
2010-01-09 0 1 0
2010-02-09 0 0 1
2010-02-10 0 0 2
2010-02-11 0 0 1
2010-02-17 0 0 1
2010-02-19 0 1 0
2010-03-05 0 1 0
2010-03-06 0 0 1
2010-03-12 0 0 1
2010-03-16 0 0 1
2010-04-08 1 0 0
2010-04-09 0 0 1
2010-04-14 0 0 1
2011-01-19 0 0 1
2011-01-24 0 0 1
2011-01-27 0 0 1
If several different categories occur on the same date, they end up on the same row...
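To make the same-row behaviour concrete, here is a small sketch with hypothetical data: it is not the question's table, since one of the 2010-02-10 events is relabelled "Medium" purely for illustration. It also calls `value_counts()` directly on the grouped column, which should be equivalent to the `apply(pd.value_counts)` form above.

```python
import pandas as pd

# Hypothetical data: 2010-02-10 has two "Small" events and one "Medium".
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-02-09", "2010-02-10", "2010-02-10", "2010-02-10"]
    ),
    "Category": ["Small", "Small", "Medium", "Small"],
})

# One row per date, one column per category, missing combinations as 0.
table = (
    df.groupby("DateOccurence")["Category"]
      .value_counts()
      .unstack(fill_value=0)
)
print(table)
```

The 2010-02-10 row then carries both counts, 1 under Medium and 2 under Small.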