pandas: finding the maximum in a rolling time window

Question

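I have a DataFrame df with the columns "timestamp" and "Y". I would like to add another column, "MaxY", containing the largest value of Y within the next 24 hours. That is: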

df.MaxY.iloc[i] = df[(df.timestamp > df.timestamp.iloc[i]) &
                     (df.timestamp < df.timestamp.iloc[i] + timedelta(hours=24))].Y.max()

Obviously, computing it this way is very slow. Is there a better way?

In a similar case, computing "SumY", I could get there with a trick based on cumsum(). However, a similar trick does not seem to work here.

As requested, here is an example table (MaxY is the output; the input is only the first two columns).

-------------------------------
| timestamp        | Y | MaxY |
-------------------------------
| 2016-03-29 12:00 | 1 |   3  |  rows 2 and 3 fall within 24 hours, so MaxY = max(2,3)
| 2016-03-29 13:00 | 2 |   4  |  rows 3 and 4 fall in the time interval, so MaxY = max(3, 4)
| 2016-03-30 11:00 | 3 |   4  |  rows 4, 5, 6 all fall in the interval so MaxY = max(4, 3, 2)
| 2016-03-30 12:30 | 4 |   3  |  max (3, 2)
| 2016-03-30 13:30 | 3 |   2  |  row 6 is the only row in the interval
| 2016-03-30 14:00 | 2 | nan? |  there are no rows in the time interval. Any value will do.
-------------------------------

Answer 1

jfa*_*iro 1

How about:

df['MaxY'] = df[::-1].Y.shift(-1).rolling('24H').max()

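df[::-1] reverses df (you want it to go "backwards"), and shift(-1) takes care of the "future". This assumes df is indexed by timestamp, since rolling('24H') needs a datetime-like index.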
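For comparison, here is a minimal sketch of a different technique: a monotonic deque walked from the last row to the first, rather than the rolling call above. It spells out the strict (t, t + 24h) window from the question directly; shift moves values by one row rather than by a time offset, so the one-liner may not match the sample table in every row. The function name future_window_max and its parameters are hypothetical, the timestamps are assumed to be sorted in ascending order, and an empty window is reported as None.

```python
from collections import deque
from datetime import datetime, timedelta


def future_window_max(timestamps, values, window=timedelta(hours=24)):
    """For each i, return max(values[j]) over all j with
    timestamps[i] < timestamps[j] < timestamps[i] + window.

    Hypothetical helper; assumes timestamps are sorted ascending.
    Rows with no other row in their window get None.
    """
    n = len(timestamps)
    result = [None] * n
    candidates = deque()  # row indices; their values strictly decrease left to right
    pending = n - 1       # next row (walking backwards) not yet offered to the deque
    for i in range(n - 1, -1, -1):
        # Admit every row that lies strictly after timestamps[i].
        while pending > i and timestamps[pending] > timestamps[i]:
            # Rows further in the future with a value no larger can never win again.
            while candidates and values[candidates[-1]] <= values[pending]:
                candidates.pop()
            candidates.append(pending)
            pending -= 1
        # Drop rows that are a full window or more ahead of row i.
        while candidates and timestamps[candidates[0]] >= timestamps[i] + window:
            candidates.popleft()
        if candidates:
            result[i] = values[candidates[0]]
    return result


# The sample table from the question.
stamps = [
    datetime(2016, 3, 29, 12, 0),
    datetime(2016, 3, 29, 13, 0),
    datetime(2016, 3, 30, 11, 0),
    datetime(2016, 3, 30, 12, 30),
    datetime(2016, 3, 30, 13, 30),
    datetime(2016, 3, 30, 14, 0),
]
ys = [1, 2, 3, 4, 3, 2]
max_y = future_window_max(stamps, ys)
print(max_y)  # [3, 4, 4, 3, 2, None]
```

Each row enters and leaves the deque at most once, so the pass is linear in the number of rows. With a DataFrame already sorted by timestamp, something like df['MaxY'] = future_window_max(df['timestamp'].tolist(), df['Y'].tolist()) should write the result back.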

Asked: 9 years, 10 months ago
Viewed: 9,595 times
Last active: 5 years, 11 months ago