aho*_*osh 5 python aggregate time-series pandas
I have a pandas DataFrame, sorted by time, that looks like this:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({ 'ActivityDateTime' : [datetime(2016,5,13,6,14),datetime(2016,5,13,6,16),
datetime(2016,5,13,6,20),datetime(2016,5,13,6,27),datetime(2016,5,13,6,31),
datetime(2016,5,13,6,32),
datetime(2016,5,13,17,34),datetime(2016,5,13,17,36),
datetime(2016,5,13,17,38),datetime(2016,5,13,17,45),datetime(2016,5,13,17,47),
datetime(2016,5,16,13,3),datetime(2016,5,16,13,6),
datetime(2016,5,16,13,10),datetime(2016,5,16,13,14),datetime(2016,5,16,13,16)],
'Value1' : [0.0,2.0,3.0,4.0,0.0,0.0,0.0,7.0,8.0,4.0,0.0,0.0,3.0,9.0,1.0,0.0],
'Value2' : [0.0,2.0,3.0,4.0,0.0,0.0,0.0,7.0,8.0,4.0,0.0,0.0,3.0,9.0,1.0,0.0]
})
The result looks like this:
ActivityDateTime Value1 Value2
0 2016-05-13 06:14:00 0.0 0.0
1 2016-05-13 06:16:00 2.0 2.0
2 2016-05-13 06:20:00 3.0 3.0
3 2016-05-13 06:27:00 4.0 4.0
4 2016-05-13 06:31:00 0.0 0.0
5 2016-05-13 06:32:00 0.0 0.0
6 2016-05-13 17:34:00 0.0 0.0
7 2016-05-13 17:36:00 7.0 7.0
8 2016-05-13 17:38:00 8.0 8.0
9 2016-05-13 17:45:00 4.0 4.0
10 2016-05-13 17:47:00 0.0 0.0
11 2016-05-16 13:03:00 0.0 0.0
12 2016-05-16 13:06:00 3.0 3.0
13 2016-05-16 13:10:00 9.0 9.0
14 2016-05-16 13:14:00 1.0 1.0
15 2016-05-16 13:16:00 0.0 0.0
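I want to aggregate the data (taking the mean) without a for loop. However, the way I want to group the observations is not straightforward! Looking at Value1, I want to group consecutive non-zero values together. For example, indices 1, 2, 3 would be in one group, 7, 8, 9 in another, and 12, 13, 14 in a third. Rows where Value1 == 0 should be left out; the zeros only act as separators between groups. In the end I would like to get something like this: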
Activity_end Activity_start Value1 Value2 num_observations
0 2016-05-13 06:27:00 2016-05-13 06:16:00 3.00 3.00 3
1 2016-05-13 17:45:00 2016-05-13 17:36:00 6.33 6.33 3
2 2016-05-16 13:14:00 2016-05-16 13:06:00 4.33 4.33 3
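So far, I think I should somehow assign the numbers 1, 2, and 3 to a new column and then aggregate based on that column. I don't know how to build that column without a for loop! Note that Value1 and Value2 are not necessarily the same.
One way to do this is to create a couple of temporary columns: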
# First create a new column that is True whenever the value changes from
# zero to non-zero (which happens at the start of each group)
df['nonzero'] = (df['Value1'] > 0) & (df['Value1'].shift(1) == 0)
# Take a cumulative sum. This means each group will have its own number.
df['group'] = df['nonzero'].cumsum()
# Drop the zero rows, then group by the group column
gb = df[df['Value1'] > 0].groupby('group')
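To see why the cumulative sum yields one label per run, here is a minimal sketch on a toy Series. The data and the names `s`, `starts`, `group`, and `labels` are made up for illustration and are not part of the question's frame.

```python
import pandas as pd

# Toy data (hypothetical): zeros separate runs of non-zero values.
s = pd.Series([0.0, 2.0, 3.0, 0.0, 0.0, 7.0, 8.0, 0.0])

# True where a non-zero value directly follows a zero, i.e. at the start of a run.
starts = (s > 0) & (s.shift(1) == 0)

# The running total of the start flags numbers the runs 1, 2, ...
group = starts.cumsum()

# Zero rows inherit the label of the preceding run, so filter them out.
labels = group[s > 0]
print(labels.tolist())  # [1, 1, 2, 2]
```

One thing to keep in mind: if the very first row is already non-zero, `shift(1)` gives NaN there, so that run gets label 0 rather than 1. The grouping still works, since the label only needs to be distinct per run.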
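You can then use the aggregation functions (http://pandas.pydata.org/pandas-docs/stable/groupby.html) to compute the aggregates for each group.
For the specific output you are after, also have a look at this answer: Python Pandas: Multiple Aggregations of the same column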
df2 = gb.agg({
    'ActivityDateTime': ['first', 'last'],
    'Value1': 'mean',
    'Value2': 'mean'})
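If flat column names and an observation count are wanted, as in the layout the question asks for, one possible sketch uses named aggregation. This assumes pandas 0.25 or later and is not part of the original answer. The frame is rebuilt here so the snippet runs on its own, and `times`, `values`, `starts`, and `summary` are illustrative names.

```python
import pandas as pd

# Rebuild the question's frame (Value1 and Value2 happen to be identical there).
times = pd.to_datetime([
    "2016-05-13 06:14", "2016-05-13 06:16", "2016-05-13 06:20", "2016-05-13 06:27",
    "2016-05-13 06:31", "2016-05-13 06:32", "2016-05-13 17:34", "2016-05-13 17:36",
    "2016-05-13 17:38", "2016-05-13 17:45", "2016-05-13 17:47", "2016-05-16 13:03",
    "2016-05-16 13:06", "2016-05-16 13:10", "2016-05-16 13:14", "2016-05-16 13:16",
])
values = [0.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 7.0, 8.0, 4.0, 0.0, 0.0, 3.0, 9.0, 1.0, 0.0]
df = pd.DataFrame({"ActivityDateTime": times, "Value1": values, "Value2": values})

# Label each run of non-zero Value1 rows with the shift/cumsum trick.
starts = (df["Value1"] > 0) & (df["Value1"].shift(1) == 0)
df["group"] = starts.cumsum()

# Named aggregation: each keyword becomes one flat output column.
summary = (
    df[df["Value1"] > 0]
    .groupby("group")
    .agg(
        Activity_start=("ActivityDateTime", "first"),
        Activity_end=("ActivityDateTime", "last"),
        Value1=("Value1", "mean"),
        Value2=("Value2", "mean"),
        num_observations=("Value1", "count"),
    )
    .reset_index(drop=True)
)
print(summary)
```

Because the frame is sorted by time, `'first'` and `'last'` give the start and end of each run. Note that `'count'` only counts non-null values in Value1.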