Pandas modified rolling average

r r*_*ram 2 python numpy dataframe pandas

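Below is my outlier detection code in pandas. I am currently rolling over a window of 15 rows. What I want instead is a window of 5 that is based on the day of the week of the centered date, i.e. if the center is a Monday, take the 2 Mondays before it and the 2 Mondays after it. Rolling has no built-in support for this. How can I do it?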

import pandas as pd
import numpy as np

np.random.seed(0)

dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')

prices1 = np.random.randint(10, 100, size=len(dates))
prices2 = np.random.randint(20, 120, size=len(dates)).astype(float)

data = {'Date': dates, 'Price1': prices1, 'Price2': prices2}
df = pd.DataFrame(data)

r = df.Price1.rolling(window=15, center=True)
price_up, price_low = r.mean() + 2 * r.std(), r.mean() - 2 * r.std()

mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low

df.loc[mask_upper, 'Price1'] = r.mean()
df.loc[mask_lower, 'Price1'] = r.mean()

moz*_*way 5

One option is to use a groupby.rolling with the dayofweek as grouper, to ensure that only the same weekdays are used in each rolling window:

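r = (df.set_index('Date')
       .groupby(df['Date'].dt.dayofweek.values) # avoid index alignment
       .rolling(f'{5*7}D', center=True) # 35 days = 5 same-weekday rows
       ['Price1']
    )

def realign(s):
    # groupby.rolling returns rows ordered by weekday group, indexed by
    # (weekday, Date): drop the weekday level, reorder by Date, then
    # restore df's original index
    return s.droplevel(0).reindex(df['Date'].values).set_axis(df.index)

avg = realign(r.mean())
std = realign(r.std())
price_up, price_low = avg + 2 * std, avg - 2 * std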

mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low

df.loc[mask_upper, 'Price1'] = avg
df.loc[mask_lower, 'Price1'] = avg

Example output:

          Date  Price1  Price2
0   2022-01-01    54.0    86.0
1   2022-01-02    57.0   117.0
2   2022-01-03    74.0    32.0
3   2022-01-04    77.0    35.0
4   2022-01-05    77.0    53.0
..         ...     ...     ...
725 2023-12-27    44.0    37.0
726 2023-12-28    60.0    65.0
727 2023-12-29    30.0   116.0
728 2023-12-30    53.0    82.0
729 2023-12-31    10.0    42.0

[730 rows x 3 columns]
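As a sanity check on the weekday-grouped window, the sketch below compares it with a plain shift-based computation. It rebuilds the question's `Price1` data, then averages each row with the values 7 and 14 days before and after it. The names `avg_grouped`, `avg_shifted`, `same` and `n_flagged` are illustrative and not part of the original answer, and the shift-based variant assumes gap-free daily data sorted by date.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')
df = pd.DataFrame({'Date': dates,
                   'Price1': np.random.randint(10, 100, size=len(dates))})

# Weekday-grouped rolling: a centered 35-day window inside each weekday
# group covers the same weekday at -14, -7, 0, +7 and +14 days.
r = (df.set_index('Date')
       .groupby(df['Date'].dt.dayofweek.values)
       .rolling('35D', center=True)['Price1'])

# The result is ordered by weekday group with a (weekday, Date) index,
# so drop the weekday level and reorder by Date to match df's rows.
avg_grouped = r.mean().droplevel(0).reindex(dates).to_numpy()
std_grouped = r.std().droplevel(0).reindex(dates).to_numpy()

# Cross-check (assumes no missing days): stack the column shifted by
# -14, -7, 0, +7, +14 rows and average across; edge NaNs are skipped.
shifted = pd.concat([df['Price1'].shift(7 * k) for k in range(-2, 3)], axis=1)
avg_shifted = shifted.mean(axis=1).to_numpy()

same = np.allclose(avg_grouped, avg_shifted)

# Count rows outside the mean +/- 2 * std band of their own window.
deviation = np.abs(df['Price1'].to_numpy() - avg_grouped)
n_flagged = int((deviation > 2 * std_grouped).sum())
print(same, n_flagged)
```

One thing worth keeping in mind: because each window holds at most five values and includes the row being tested, a value can deviate from the window mean by at most (n - 1) / sqrt(n) sample standard deviations, which is about 1.79 for n = 5. A 2-standard-deviation band is therefore never exceeded with this window size, so a smaller multiplier, or statistics that leave out the center row, may be needed for the masks to select anything.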