Calculating the duration of events in a time series (Python)

Chi*_*Chi 0 python time-series dataframe pandas

I have a dataframe that looks like this:

index                value
2003-01-01 00:00:00  14.5
2003-01-01 01:00:00  15.8
2003-01-01 02:00:00     0
2003-01-01 03:00:00     0
2003-01-01 04:00:00  13.6
2003-01-01 05:00:00   4.3
2003-01-01 06:00:00  13.7
2003-01-01 07:00:00  14.4
2003-01-01 08:00:00     0
2003-01-01 09:00:00     0
2003-01-01 10:00:00     0
2003-01-01 11:00:00  17.2
2003-01-01 12:00:00     0
2003-01-01 13:00:00   5.3
2003-01-01 14:00:00     0
2003-01-01 15:00:00   2.0
2003-01-01 16:00:00   4.0
2003-01-01 17:00:00     0
2003-01-01 18:00:00     0
2003-01-01 19:00:00   3.9
2003-01-01 20:00:00   7.2
2003-01-01 21:00:00   1.0
2003-01-01 22:00:00   1.0
2003-01-01 23:00:00  10.0

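The index is a datetime, and the column records hourly rainfall in mm. I want to calculate the "average wet spell duration", meaning the average number of consecutive hours in a day with a nonzero value. For the example above, that is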

(2 + 4 + 1 + 1 + 2 + 5) / 6 (events) = 2.5 (hr)

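The "average wet spell amount" is the sum of the values within each run of consecutive wet hours, averaged over the runs in a day: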

{ (14.5 + 15.8) + (13.6 + 4.3 + 13.7 + 14.4) + (17.2) + (5.3) + (2 + 4) + (3.9 + 7.2 + 1 + 1 + 10) } / 6 (events) = 21.32 (mm)

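The dataframe above is only an example; my real data covers a much longer time series (for example, more than a year). How can I write a function that calculates the two values described above? Thanks in advance!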

PS: The values may be NaN, and I would like to ignore those.

jpp*_*jpp 5

I believe this is what you are looking for. I have added comments to the code explaining each step.

import pandas as pd

# assumes the timestamps are in a datetime column named 'index'
# (e.g. after df = df.reset_index()) and the rainfall is in 'value'

# wet/dry mask; NaN is treated as dry, because NaN.astype(bool) would be True
wet = df['value'].fillna(0).astype(bool)

# create helper columns defining contiguous blocks and day
df['block'] = (wet.shift() != wet).cumsum()
df['day'] = df['index'].dt.normalize()

# group by day to get unique block count and value count
session_map = df[wet].groupby('day')['block'].nunique()
hour_map = df[wet].groupby('day')['value'].count()

# map to original dataframe
df['sessions'] = df['day'].map(session_map)
df['hours'] = df['day'].map(hour_map)

# calculate result: daily total, then averages per wet block
res = df.groupby(['day', 'hours', 'sessions'], as_index=False)['value'].sum()
res['duration'] = res['hours'] / res['sessions']
res['amount'] = res['value'] / res['sessions']
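To see how the `block` column in the code above labels contiguous runs, here is a minimal sketch on a toy series. The values are made up for illustration, and the wet/dry mask is built with `fillna(0).ne(0)` as one possible way to treat NaN as dry.

```python
import pandas as pd

# Toy hourly values (made up): two wet spells separated by dry hours.
values = pd.Series([1.2, 0.5, 0.0, 0.0, 3.0, 0.0])

# True where it rained; NaN would count as dry.
wet = values.fillna(0).ne(0)

# Comparing each row with the previous one marks every wet/dry flip;
# the cumulative sum of those flips gives one label per contiguous run.
block = (wet != wet.shift()).cumsum()

print(pd.DataFrame({"value": values, "wet": wet, "block": block}))
```

Rows that share a `block` label belong to the same run, so counting the distinct labels among the wet rows gives the number of wet spells.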

Result

         day  sessions  duration  value     amount
0 2003-01-01         6       2.5  127.9  21.316667
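Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch under some assumptions: `wet_spell_stats` is a hypothetical name, the input is a Series with an hourly `DatetimeIndex` (as the question describes), NaN is treated as dry, and a spell that runs past midnight is counted separately in each day.

```python
import pandas as pd


def wet_spell_stats(series):
    """Per-day number of wet spells, mean spell duration (hours) and mean spell amount.

    Hypothetical helper; assumes an hourly DatetimeIndex and treats NaN as dry.
    """
    wet = series.fillna(0).ne(0)

    frame = series.to_frame("value")
    frame["day"] = series.index.normalize()
    # New label every time the wet/dry state flips.
    frame["spell"] = (wet != wet.shift()).cumsum()

    # One row per wet spell and day: length in hours and rainfall total.
    per_spell = (
        frame[wet]
        .groupby(["day", "spell"])["value"]
        .agg(hours="count", total="sum")
    )

    # Average the spells within each day.
    return per_spell.groupby(level="day").agg(
        events=("hours", "count"),
        duration=("hours", "mean"),
        amount=("total", "mean"),
    )


# Hourly sample from the question.
values = [14.5, 15.8, 0, 0, 13.6, 4.3, 13.7, 14.4, 0, 0, 0, 17.2,
          0, 5.3, 0, 2.0, 4.0, 0, 0, 3.9, 7.2, 1.0, 1.0, 10.0]
rain = pd.Series(values, index=pd.date_range("2003-01-01", periods=24, freq="h"))

result = wet_spell_stats(rain)
print(result)
```

For the sample day this gives 6 events with a mean duration of 2.5 hours and a mean amount of about 21.32 mm, matching the hand calculation in the question. Days without any rain do not appear in the output.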