Del*_*rge 2 python numpy time-series scipy pandas
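I would like to extract rainfall events from a rainfall time series while allowing up to X dry hours (a parameter) within the same event. By a rainfall event I mean roughly continuous rainfall (RF > 0) containing at most X consecutive dry hours (RF = 0).
I would prefer not to do this with iterators and increments; I am looking for pandas or numpy/scipy tools that I can rely on.
Here is a sample of my dataframe. RF is the raw rainfall, RFfill is RF with the nodata gaps filled by RF.interpolate(), and evtId is a field created to store the unique event ID.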
TS RF RFfill evtId
0 1997-11-27 14:00:00 0.3 0.3 NaN
1 1997-11-27 15:00:00 1.1 1.1 NaN
2 1997-11-27 16:00:00 0.2 0.2 NaN
3 1997-11-27 17:00:00 0.0 0.0 NaN
4 1997-11-27 18:00:00 0.0 0.0 NaN
5 1997-11-27 19:00:00 1.1 1.1 NaN
6 1997-11-27 20:00:00 0.6 0.6 NaN
7 1997-11-27 21:00:00 0.0 0.0 NaN
8 1997-11-27 22:00:00 0.0 0.0 NaN
9 1997-11-27 23:00:00 0.0 0.0 NaN
10 1997-11-28 00:00:00 0.0 0.0 NaN
11 1997-11-28 01:00:00 0.0 0.0 NaN
12 1997-11-28 02:00:00 0.0 0.0 NaN
13 1997-11-28 03:00:00 0.0 0.0 NaN
14 1997-11-28 04:00:00 0.0 0.0 NaN
15 1997-11-28 05:00:00 0.0 0.0 NaN
16 1997-11-28 06:00:00 0.0 0.0 NaN
17 1997-11-28 07:00:00 0.0 0.0 NaN
18 1997-11-28 08:00:00 0.0 0.0 NaN
19 1997-11-28 09:00:00 0.8 0.8 NaN
20 1997-11-28 10:00:00 1.1 1.1 NaN
21 1997-11-28 11:00:00 2.3 2.3 NaN
22 1997-11-28 12:00:00 1.4 1.4 NaN
23 1997-11-28 13:00:00 0.4 0.4 NaN
24 1997-11-28 14:00:00 0.2 0.2 NaN
25 1997-11-28 15:00:00 0.0 0.0 NaN
26 1997-11-28 16:00:00 0.0 0.0 NaN
27 1997-11-28 17:00:00 0.0 0.0 NaN
28 1997-11-28 18:00:00 0.0 0.0 NaN
29 1997-11-28 19:00:00 0.0 0.0 NaN
30 1997-11-28 20:00:00 0.0 0.0 NaN
This is the expected output, with an allowed dry period of 5 hours:
TS RF RFfill evtId
0 1997-11-27 14:00:00 0.3 0.3 0
1 1997-11-27 15:00:00 1.1 1.1 0
2 1997-11-27 16:00:00 0.2 0.2 0
3 1997-11-27 17:00:00 0.0 0.0 0
4 1997-11-27 18:00:00 0.0 0.0 0
5 1997-11-27 19:00:00 1.1 1.1 0
6 1997-11-27 20:00:00 0.6 0.6 0
7 1997-11-27 21:00:00 0.0 0.0 NaN
8 1997-11-27 22:00:00 0.0 0.0 NaN
9 1997-11-27 23:00:00 0.0 0.0 NaN
10 1997-11-28 00:00:00 0.0 0.0 NaN
11 1997-11-28 01:00:00 0.0 0.0 NaN
12 1997-11-28 02:00:00 0.0 0.0 NaN
13 1997-11-28 03:00:00 0.0 0.0 NaN
14 1997-11-28 04:00:00 0.0 0.0 NaN
15 1997-11-28 05:00:00 0.0 0.0 NaN
16 1997-11-28 06:00:00 0.0 0.0 NaN
17 1997-11-28 07:00:00 0.0 0.0 NaN
18 1997-11-28 08:00:00 0.0 0.0 NaN
19 1997-11-28 09:00:00 0.8 0.8 1
20 1997-11-28 10:00:00 1.1 1.1 1
21 1997-11-28 11:00:00 2.3 2.3 1
22 1997-11-28 12:00:00 1.4 1.4 1
23 1997-11-28 13:00:00 0.4 0.4 1
24 1997-11-28 14:00:00 0.2 0.2 1
25 1997-11-28 15:00:00 0.0 0.0 NaN
26 1997-11-28 16:00:00 0.0 0.0 NaN
27 1997-11-28 17:00:00 0.0 0.0 NaN
28 1997-11-28 18:00:00 0.0 0.0 NaN
29 1997-11-28 19:00:00 0.0 0.0 NaN
30 1997-11-28 20:00:00 0.0 0.0 NaN
Any ideas that could help me achieve this?
import numpy as np
import pandas as pd
import scipy.ndimage as ndimage
df = pd.DataFrame({'RF': [0.3, 1.1, 0.2, 0. , 0. , 0. , 0. , 0. ,
                          1.1, 0.6, 0. , 0. , 0. , 0. , 0. , 0. ,
                          0.8, 1.1, 2.3, 1.4, 0.4, 0.2, 0. , 0. ,
                          0. , 0. , 0. , 0. ]})
consecutive = 5
mask = df['RF'] > 0
df['mask'] = mask
df['dilation'] = ndimage.binary_dilation(mask, structure=[1]*(consecutive+1))
df['erosion'] = ndimage.binary_erosion(df['dilation'],
                                       structure=[1]*(consecutive+1), border_value=1)
df['labeled'], nobjs = ndimage.label(df['erosion'])
df['evtId'] = np.where(df['labeled'] > 0, df['labeled']-1, np.nan)
print(df[['RF', 'evtId']])
yields
# RF evtId
# 0 0.3 0
# 1 1.1 0
# 2 0.2 0
# 3 0.0 0
# 4 0.0 0
# 5 0.0 0
# 6 0.0 0
# 7 0.0 0
# 8 1.1 0
# 9 0.6 0
# 10 0.0 NaN
# 11 0.0 NaN
# 12 0.0 NaN
# 13 0.0 NaN
# 14 0.0 NaN
# 15 0.0 NaN
# 16 0.8 1
# 17 1.1 1
# 18 2.3 1
# 19 1.4 1
# 20 0.4 1
# 21 0.2 1
# 22 0.0 NaN
# 23 0.0 NaN
# 24 0.0 NaN
# 25 0.0 NaN
# 26 0.0 NaN
# 27 0.0 NaN
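For reuse, the same steps could be wrapped in a small function. This is only a sketch: the name `label_rain_events` and the parameter `max_dry` are illustrative and do not come from the answer, and it assumes hourly data with no missing values in the input.

```python
import numpy as np
import pandas as pd
import scipy.ndimage as ndimage


def label_rain_events(rf, max_dry):
    """Return event ids (0, 1, ...) for a rainfall series, NaN outside events.

    `label_rain_events` and `max_dry` are hypothetical names used for
    illustration; `max_dry` plays the role of `consecutive` in the answer.
    """
    rf = pd.Series(rf)
    wet = rf.to_numpy() > 0
    structure = np.ones(max_dry + 1, dtype=bool)
    # Dilation bridges dry gaps of at most `max_dry` samples.
    dilated = ndimage.binary_dilation(wet, structure=structure)
    # Erosion undoes the overshoot; border_value=1 protects events that
    # touch the start or end of the series.
    closed = ndimage.binary_erosion(dilated, structure=structure, border_value=1)
    labeled, _ = ndimage.label(closed)
    ids = np.where(labeled > 0, labeled - 1, np.nan)
    return pd.Series(ids, index=rf.index, name='evtId')


# The RF column from the question: 2 dry hours inside the first event,
# then 12 dry hours separating it from the second one.
rf_question = ([0.3, 1.1, 0.2, 0.0, 0.0, 1.1, 0.6] + [0.0] * 12
               + [0.8, 1.1, 2.3, 1.4, 0.4, 0.2] + [0.0] * 6)
evt = label_rain_events(rf_question, max_dry=5)
print(evt)
```

Applied to the question's sample, rows 0 to 6 should get event 0 and rows 19 to 24 event 1, which matches the expected output posted in the question.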
Explanation: first prepare a boolean mask which is True where df['RF'] > 0:
mask = (df['RF'] > 0)
df['mask'] = mask
# RF mask
# 0 0.3 True
# 1 1.1 True
# 2 0.2 True
# 3 0.0 False
# 4 0.0 False
# 5 0.0 False
# 6 0.0 False
# 7 0.0 False
# 8 1.1 True
# 9 0.6 True
# ...
Next, dilate the mask to join together islands of Trues (rainy hours) that are separated by 5 or fewer Falses (dry hours):
df['dilation'] = ndimage.binary_dilation(mask, structure=[1]*(consecutive+1))
# RF mask dilation
# 0 0.3 True True
# 1 1.1 True True
# 2 0.2 True True
# 3 0.0 False True <--,
# 4 0.0 False True |
# 5 0.0 False True | dilation filled in the 5 dry hours
# 6 0.0 False True |
# 7 0.0 False True <--'
# 8 1.1 True True
# 9 0.6 True True
# 10 0.0 False True <-- But the `True`s extend a bit too far
# 11 0.0 False True <--
# 12 0.0 False False
# 13 0.0 False True
# 14 0.0 False True
# 15 0.0 False True
# 16 0.8 True True
# 17 1.1 True True
# 18 2.3 True True
# 19 1.4 True True
# 20 0.4 True True
# 21 0.2 True True
# 22 0.0 False True
# 23 0.0 False True
# 24 0.0 False False
# 25 0.0 False False
# 26 0.0 False False
# 27 0.0 False False
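To see why a structure of length consecutive+1 bridges a gap of 5 but not a gap of 6, the dilation can be tried on two tiny masks. This is a minimal sketch; the helper `gap_is_bridged` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import scipy.ndimage as ndimage

consecutive = 5
structure = [1] * (consecutive + 1)


def gap_is_bridged(gap):
    # One wet hour, then `gap` dry hours, then one wet hour.
    mask = np.array([True] + [False] * gap + [True])
    dilated = ndimage.binary_dilation(mask, structure=structure)
    # The gap is bridged when no False is left between the two wet hours.
    return bool(dilated.all())


print(gap_is_bridged(5))  # True: 5 dry hours are filled in
print(gap_is_bridged(6))  # False: one dry hour stays False
```

With 6 elements, the two wet hours together reach across exactly 5 positions, so a sixth dry hour stays False, as row 12 does in the table above.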
Next, use binary erosion to remove the Trues that have been extended too far.
df['erosion'] = ndimage.binary_erosion(df['dilation'], structure=[1]*(consecutive+1),
                                       border_value=1)
# RF mask dilation erosion
# 0 0.3 True True True
# 1 1.1 True True True
# 2 0.2 True True True
# 3 0.0 False True True
# 4 0.0 False True True
# 5 0.0 False True True
# 6 0.0 False True True
# 7 0.0 False True True
# 8 1.1 True True True
# 9 0.6 True True True
# 10 0.0 False True False <--,
# 11 0.0 False True False |
# 12 0.0 False False False | The Falses have been expanded
# 13 0.0 False True False | (The Trues eroded)
# 14 0.0 False True False |
# 15 0.0 False True False <--'
# 16 0.8 True True True
# 17 1.1 True True True
# 18 2.3 True True True
# 19 1.4 True True True
# 20 0.4 True True True
# 21 0.2 True True True
# 22 0.0 False True False
# 23 0.0 False True False
# 24 0.0 False False False
# 25 0.0 False False False
# 26 0.0 False False False
# 27 0.0 False False False
Now that the Trues represent "rainfall events", we can use ndimage.label to assign a unique number to each rainfall event:
df['labeled'], nobjs = ndimage.label(df['erosion'])
# RF mask dilation erosion labeled
# 0 0.3 True True True 1
# 1 1.1 True True True 1
# 2 0.2 True True True 1
# 3 0.0 False True True 1
# 4 0.0 False True True 1
# 5 0.0 False True True 1
# 6 0.0 False True True 1
# 7 0.0 False True True 1
# 8 1.1 True True True 1
# 9 0.6 True True True 1
# 10 0.0 False True False 0
# 11 0.0 False True False 0
# 12 0.0 False False False 0
# 13 0.0 False True False 0
# 14 0.0 False True False 0
# 15 0.0 False True False 0
# 16 0.8 True True True 2
# 17 1.1 True True True 2
# 18 2.3 True True True 2
# 19 1.4 True True True 2
# 20 0.4 True True True 2
# 21 0.2 True True True 2
# 22 0.0 False True False 0
# 23 0.0 False True False 0
# 24 0.0 False False False 0
# 25 0.0 False False False 0
# 26 0.0 False False False 0
# 27 0.0 False False False 0
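As a standalone illustration of what ndimage.label does on a 1-D boolean array, here is a small example with made-up values (the array `events` is not taken from the data above):

```python
import numpy as np
import scipy.ndimage as ndimage

# Three runs of True separated by False.
events = np.array([True, True, False, False, True, False, True, True])

# Each connected run of True gets its own positive integer; False stays 0.
labeled, nobjs = ndimage.label(events)
print(labeled)  # [1 1 0 0 2 0 3 3]
print(nobjs)    # 3
```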
Finally, use np.where to subtract 1 from the label where df['labeled'] > 0, and assign np.nan otherwise:
df['evtId'] = np.where(df['labeled'] > 0, df['labeled']-1, np.nan)
# RF mask dilation erosion labeled evtId
# 0 0.3 True True True 1 0
# 1 1.1 True True True 1 0
# 2 0.2 True True True 1 0
# 3 0.0 False True True 1 0
# 4 0.0 False True True 1 0
# 5 0.0 False True True 1 0
# 6 0.0 False True True 1 0
# 7 0.0 False True True 1 0
# 8 1.1 True True True 1 0
# 9 0.6 True True True 1 0
# 10 0.0 False True False 0 NaN
# 11 0.0 False True False 0 NaN
# 12 0.0 False False False 0 NaN
# 13 0.0 False True False 0 NaN
# 14 0.0 False True False 0 NaN
# 15 0.0 False True False 0 NaN
# 16 0.8 True True True 2 1
# 17 1.1 True True True 2 1
# 18 2.3 True True True 2 1
# 19 1.4 True True True 2 1
# 20 0.4 True True True 2 1
# 21 0.2 True True True 2 1
# 22 0.0 False True False 0 NaN
# 23 0.0 False True False 0 NaN
# 24 0.0 False False False 0 NaN
# 25 0.0 False False False 0 NaN
# 26 0.0 False False False 0 NaN
# 27 0.0 False False False 0 NaN
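Once evtId is filled in, one possible follow-up (not part of the original answer) is to summarize each event with a groupby. The sketch below uses a small made-up frame and assumes one row per hour, so the row count doubles as the event duration in hours; the column names `total` and `hours` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame with the same layout as the answer's result: NaN marks dry spells.
df_events = pd.DataFrame({
    'RF':    [0.3, 1.1, 0.0, 0.0, 0.8, 0.4],
    'evtId': [0, 0, np.nan, np.nan, 1, 1],
})

# groupby drops rows whose key is NaN by default, so only real events remain.
summary = df_events.groupby('evtId')['RF'].agg(total='sum', hours='size')
print(summary)
```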
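Note that a dilation followed by an erosion is called a closing. The reason I used ndimage.binary_dilation and ndimage.binary_erosion rather than simply calling ndimage.binary_closing is that I needed to set border_value=1 to prevent the Trues at the edges from being eroded. Compare df['erosion'] with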
ndimage.binary_closing(mask, structure=[1]*(consecutive+1))
and you will see the difference.
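The comparison can be made concrete with the data from the answer. This is a minimal sketch that rebuilds the mask as a plain NumPy array (the names `rf`, `eroded`, `closed` and `differs` are illustrative) and lists the positions where the two results disagree:

```python
import numpy as np
import scipy.ndimage as ndimage

rf = np.array([0.3, 1.1, 0.2, 0. , 0. , 0. , 0. , 0. ,
               1.1, 0.6, 0. , 0. , 0. , 0. , 0. , 0. ,
               0.8, 1.1, 2.3, 1.4, 0.4, 0.2, 0. , 0. ,
               0. , 0. , 0. , 0. ])
consecutive = 5
structure = [1] * (consecutive + 1)
mask = rf > 0

# Dilation followed by erosion with border_value=1, as in the answer.
dilated = ndimage.binary_dilation(mask, structure=structure)
eroded = ndimage.binary_erosion(dilated, structure=structure, border_value=1)

# binary_closing uses border_value=0 by default for both steps.
closed = ndimage.binary_closing(mask, structure=structure)

differs = np.flatnonzero(eroded != closed)
print(differs)  # [0 1 2]
```

With the default border, everything outside the array counts as False, so the erosion step eats into the event that starts at row 0 and the first three rainy hours are lost.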