Swi*_*ier 4 python python-2.7 pandas
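I have a pandas DataFrame in Python 2.7. I want to iterate over its rows and get the time between two events of a given type, along with the count of events of other types that occur in between (subject to some conditions).
My data is a pandas.DataFrame that looks like this: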
    Time  Var1  EvntType  Var2
0     15     1         2    17
1     19     1         1    45
2     21     6         2    43
3     23     3         2    65
4     25     0         2    76   # this one should be skipped
5     26     2         2    35
6     28     3         2    25
7     31     5         1    16
8     33     1         2    25
9     36     5         1    36
10    39     1         2    21
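I want to ignore rows where Var1 equals 0. Then, for each pair of consecutive type 1 events, I want the time between them and the number of type 2 events in between (excluding rows where Var1 == 0). So for the example above: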
Start_time: 19, Time_inbetween: 12, Event_count: 4
Start_time: 31, Time_inbetween: 5, Event_count: 1
I am currently doing it like this:
i = 0
eventCounter = 0
lastStartTime = 0
length = data[data['EvntType']==1].shape[0]
results = np.zeros((length,3),dtype=int)
for row in data[data['Var1'] > 0].iterrows():
    myRow = row[1]
    if myRow['EvntType'] == 1:
        results[i,0] = lastStartTime
        results[i,1] = myRow['Time'] - lastStartTime
        results[i,2] = eventCounter
        lastStartTime = myRow['Time']
        eventCounter = 0
        i += 1
    else:
        eventCounter += 1
This gives me the result I want:
>>> results[1:]
array([[19, 12,  4],
       [31,  5,  1]])
But this seems really roundabout, and it takes a long time on large DataFrames. How can I improve it?
You can remove the rows where Var1 equals 0 with:
df = df.loc[df['Var1'] != 0]
Then create a boolean mask that is True where EvntType is 1:
mask = df['EvntType']==1
# 0 False
# 1 True
# ...
# 9 True
# 10 False
# Name: EvntType, dtype: bool
Find the Time values for the rows where mask is True:
times = df.loc[mask, 'Time']
# 1 19
# 7 31
# 9 36
# Name: Time, dtype: int64
And also find the ordinal (positional) indices where mask is True:
idx = np.flatnonzero(mask)
# array([1, 6, 8])
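Note that idx holds positions within the filtered frame, not the original row labels, which is presumably why np.flatnonzero is used here rather than the index. A small sketch of the difference, reusing the example data (the names labels and positions are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1]})
df = df.loc[df['Var1'] != 0]          # drops the row labelled 4
mask = df['EvntType'] == 1

# Original row labels of the type-1 events (these still "remember" the dropped row)
labels = mask.index[mask.values].tolist()
# Positions of the type-1 events among the rows that remain
positions = np.flatnonzero(mask.values).tolist()

print(labels)     # [1, 7, 9]
print(positions)  # [1, 6, 8]
```

Differencing the labels would give 5 rows between the first two type-1 events, because the dropped row is still counted; differencing the positions gives 4.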
The start_time values are all the values in times[:-1].
In [56]: times[:-1]
Out[56]:
1 19
7 31
Name: Time, dtype: int64
The time_inbetween values are the differences between consecutive times, np.diff(times):
In [55]: np.diff(times)
Out[55]: array([12, 5])
The event_count is the difference between consecutive entries of idx, minus 1:
In [57]: np.diff(idx)-1
Out[57]: array([4, 1])
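As a sanity check, the same counts can be obtained in a different way. The sketch below is not part of the approach above. It assumes the same example data, labels each row by how many type-1 events have been seen so far (a cumulative sum of mask), and counts the non-type-1 rows in each segment. The names segment, others, and counts_groupby are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1]})
df = df.loc[df['Var1'] != 0]
mask = df['EvntType'] == 1

# Count via positions, as described above
idx = np.flatnonzero(mask.values)
counts_diff = np.diff(idx) - 1

# Cross-check: segment k contains the k-th type-1 event and the rows after it,
# up to (but not including) the next type-1 event. Segment 0 is everything
# before the first type-1 event.
segment = mask.cumsum()
others = (~mask).astype(int).groupby(segment).sum()

# Keep only segments bounded by two type-1 events: 1 .. n_events - 1
n_events = int(mask.sum())
counts_groupby = others.reindex(range(1, n_events), fill_value=0).values

print(counts_diff.tolist())     # [4, 1]
print(counts_groupby.tolist())  # [4, 1]
```

The np.diff version avoids the groupby entirely, so it is likely the cheaper of the two on large frames; the groupby version is here only to confirm the arithmetic.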
import numpy as np
import pandas as pd
df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})
# Remove rows where Var1 equals 0
df = df.loc[df['Var1'] != 0]
mask = df['EvntType']==1
times = df.loc[mask, 'Time']
idx = np.flatnonzero(mask)
result = pd.DataFrame(
    {'start_time': times[:-1],
     'time_inbetween': np.diff(times),
     'event_count': np.diff(idx)-1})
print(result)
yields
   event_count  start_time  time_inbetween
1            4          19              12
7            1          31               5
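If this is needed in more than one place, the steps could be wrapped in a small helper. This is only a sketch under the assumptions of the example (columns named Time, Var1, and EvntType); the function name intervals_between and its marker parameter are hypothetical. It works on plain NumPy arrays, so the result gets a fresh 0-based index rather than the original labels 1 and 7, and the columns argument pins the column order.

```python
import numpy as np
import pandas as pd


def intervals_between(df, marker=1):
    """Summarise the gaps between consecutive rows whose EvntType == marker.

    Rows with Var1 == 0 are ignored. Hypothetical helper, not from the answer.
    """
    kept = df.loc[df['Var1'] != 0]
    mask = (kept['EvntType'] == marker).values   # plain boolean array
    times = kept['Time'].values[mask]            # times of the marker events
    idx = np.flatnonzero(mask)                   # their positions in `kept`
    return pd.DataFrame(
        {'start_time': times[:-1],
         'time_inbetween': np.diff(times),
         'event_count': np.diff(idx) - 1},
        columns=['start_time', 'time_inbetween', 'event_count'])


df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})
result = intervals_between(df)
print(result)
```

With fewer than two marker events, the slices and differences are all empty, so the helper returns an empty DataFrame instead of raising.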