r r*_*ram 2 python numpy dataframe pandas
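Below is my outlier-detection code in pandas. It currently uses a centered rolling window of 15 rows. What I want instead is a window of 5 values based on the day of the week of the center date: if the center is a Monday, the window should contain the 2 previous Mondays, that Monday, and the 2 following Mondays. rolling has no built-in support for this. How can I do it?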
import pandas as pd
import numpy as np
np.random.seed(0)
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='D')
# Price1 is cast to float so that rolling means can be written back into it
prices1 = np.random.randint(10, 100, size=len(dates)).astype(float)
prices2 = np.random.randint(20, 120, size=len(dates)).astype(float)
data = {'Date': dates, 'Price1': prices1, 'Price2': prices2}
df = pd.DataFrame(data)
# Centered rolling window of 15 rows
r = df.Price1.rolling(window=15, center=True)
# Compute the statistics once, before Price1 is modified below
avg, std = r.mean(), r.std()
price_up, price_low = avg + 2 * std, avg - 2 * std
mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low
# Replace outliers with the rolling mean
df.loc[mask_upper, 'Price1'] = avg
df.loc[mask_lower, 'Price1'] = avg
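If the data always has exactly one row per calendar day with no gaps (as in the example frame), the same-weekday window described in the question is simply the rows 7 and 14 positions before and after each row. The following is a minimal sketch of that idea on hypothetical toy data, using shift instead of rolling; it assumes gapless daily data and leaves incomplete windows (the first and last 14 days) as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per calendar day, no gaps (assumption)
np.random.seed(0)
dates = pd.date_range("2022-01-01", periods=60, freq="D")
s = pd.Series(np.random.randint(10, 100, size=len(dates)).astype(float), index=dates)

# Column k holds the value k days away from each row (k < 0: earlier days)
offsets = [-14, -7, 0, 7, 14]
neighbours = pd.concat({k: s.shift(-k) for k in offsets}, axis=1)

# Only report a statistic where all five same-weekday values exist
complete = neighbours.notna().all(axis=1)
avg = neighbours.mean(axis=1).where(complete)
std = neighbours.std(axis=1).where(complete)
```

If the real data can have missing days, shifting by row positions no longer matches calendar weeks, and a date-based window is the safer choice.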
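One option is to use a groupby.rolling with the dayofweek as grouper, to ensure that only values from the same weekday are used in each rolling window. Since a given weekday occurs every 7 days, a centered window of 5*7 = 35 days around a date contains that date plus the two same-weekday dates on each side: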
r = (df.set_index('Date')
.groupby(df['Date'].dt.dayofweek.values) # avoid index alignment
.rolling(f'{5*7}D', center=True)  # centered 35-day window: up to 5 values of the same weekday
['Price1']
)
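# groupby().rolling() returns rows indexed by (weekday, Date) and ordered by weekday,
# so drop the weekday level and put the rows back in the original date order
avg = r.mean().droplevel(0).reindex(df['Date']).set_axis(df.index)
std = r.std().droplevel(0).reindex(df['Date']).set_axis(df.index)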
price_up, price_low = avg + 2 * std, avg - 2 * std
mask_upper = df['Price1'] > price_up
mask_lower = df['Price1'] < price_low
df.loc[mask_upper, 'Price1'] = avg
df.loc[mask_lower, 'Price1'] = avg
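Two details of the groupby.rolling approach can be checked on hypothetical toy data. First, a centered '35D' window over one weekday's values holds at most five of them, and fewer near the ends, because offset-based windows default to min_periods=1. Second, groupby(...).rolling(...) returns its rows ordered by weekday group rather than by date, so the result has to be realigned by date before it is compared with df['Price1']. This sketch assumes a pandas version that accepts center=True with offset windows (1.3 or later, as far as I know).

```python
import pandas as pd

# Hypothetical toy data: 7 full weeks starting on Monday 2022-01-03
dates = pd.date_range("2022-01-03", periods=49, freq="D")
df = pd.DataFrame({"Date": dates, "Price1": range(len(dates))})

# Count how many values fall into each centered 35-day, same-weekday window
counts = (df.set_index("Date")
            .groupby(df["Date"].dt.dayofweek.values)
            .rolling("35D", center=True)["Price1"]
            .count())

# Window sizes for the Monday group (weekday 0), in date order
monday_counts = counts.loc[0].tolist()
# The result is grouped by weekday: its first 7 rows all belong to Mondays
first_keys = counts.index.get_level_values(0)[:7].tolist()
```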
Example output:
Date Price1 Price2
0 2022-01-01 54.0 86.0
1 2022-01-02 57.0 117.0
2 2022-01-03 74.0 32.0
3 2022-01-04 77.0 35.0
4 2022-01-05 77.0 53.0
.. ... ... ...
725 2023-12-27 44.0 37.0
726 2023-12-28 60.0 65.0
727 2023-12-29 30.0 116.0
728 2023-12-30 53.0 82.0
729 2023-12-31 10.0 42.0
[730 rows x 3 columns]
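One caveat worth checking: with only five values per window, and the center value itself included, a 2σ rule cannot fire when σ is the sample standard deviation that pandas uses by default (ddof=1). For n values, no point can lie more than (n - 1)/√n sample standard deviations from the window mean. That is about 1.79 for n = 5, and less for the 3- or 4-value windows at the edges. For the question's 15-row window the bound is about 3.61, so 2σ can be exceeded there. The sketch below checks the 5-value bound numerically; to flag anything with 5 values, one would need, for example, a lower threshold or statistics that exclude the center value.

```python
import numpy as np
import pandas as pd

n = 5
bound = (n - 1) / np.sqrt(n)  # largest possible |x - mean| / std with ddof=1

# Extreme case: four equal values and one different value
window = pd.Series([0.0, 0.0, 0.0, 0.0, 1.0])
z_extreme = (window.iloc[-1] - window.mean()) / window.std()  # Series.std uses ddof=1

# Many random 5-value windows: none exceeds the bound
rng = np.random.default_rng(0)
samples = rng.normal(size=(10_000, n))
z = np.abs(samples - samples.mean(axis=1, keepdims=True)) / samples.std(axis=1, ddof=1, keepdims=True)
```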