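I have a Pandas DataFrame (e.g. df) in which some values jump suddenly, either as a step or as a spike. What is the best way to identify them?
I wrote some very simple code that computes the differences between each value and a couple of the next and previous values. By comparing these differences, the program then decides whether the jump is a step or a spike.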
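import numpy as np
import pandas as pd

# create a dataframe of 25 random values with hourly timestamps
df = pd.DataFrame(np.random.randn(25), index=pd.date_range(start='2010-1-1', end='2010-1-2', freq='h'), columns=['value'])
# manipulate the dataframe to inject a sudden jump
df.iloc[10:11] = -0.933463
df.iloc[11:12] = 15
df.iloc[12:13] = 15
df.iloc[13:14] = 15
# calculate the absolute differences between each value and a couple of next and previous values
df_diff = pd.DataFrame()
df_diff['p1'] = df['value'].diff(periods=1).abs()   # difference to the previous value
df_diff['p2'] = df['value'].diff(periods=2).abs()   # difference to the value two rows back
df_diff['n1'] = df['value'].diff(periods=-1).abs()  # difference to the next value
df_diff['n2'] = df['value'].diff(periods=-2).abs()  # difference to the value two rows ahead
max_diff = 5  # largest difference still treated as normal (renamed from max to avoid shadowing the builtin)
results = (df_diff['n1'] > max_diff) & (df_diff['n1'] == df_diff['n2']) & (df_diff['p1'] == 0)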
What I expect is:
2010-01-01 00:00:00 …
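For reference, here is a minimal sketch of one way the step/spike comparison could be written. It is not a definitive method. It assumes a spike is a single sample that differs from both neighbours by more than a threshold in the same direction, and a step is a sample that jumps away from the previous one but stays close to the next one. The function name `classify_jumps`, the `threshold` parameter, the label strings and the deterministic sample data are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def classify_jumps(values: pd.Series, threshold: float = 5.0) -> pd.Series:
    """Label each sample as 'step', 'spike' or 'none' (hypothetical helper)."""
    prev_diff = values.diff()      # x[t] - x[t-1]
    next_diff = values.diff(-1)    # x[t] - x[t+1]
    big_prev = prev_diff.abs() > threshold
    big_next = next_diff.abs() > threshold

    # Spike: far from both neighbours, and on the same side of both.
    spike = big_prev & big_next & (np.sign(prev_diff) == np.sign(next_diff))

    # Step: far from the previous sample but close to the next one.
    # The sample right after a spike also looks like this, so exclude it.
    after_spike = spike.shift(1, fill_value=False)
    step = big_prev & ~big_next & ~after_spike

    labels = np.select([spike, step], ["spike", "step"], default="none")
    return pd.Series(labels, index=values.index)


# Deterministic sample data (an assumption, used instead of random noise
# so that the result is reproducible).
index = pd.date_range(start="2010-01-01", end="2010-01-02", periods=25)
values = pd.Series(0.5 * np.sin(np.arange(25)), index=index)
values.iloc[11:14] = 15.0   # plateau: step up at 11, step down at 14
values.iloc[20] = -12.0     # single-sample spike

labels = classify_jumps(values, threshold=5.0)
print(labels[labels != "none"])
```

On this sample data the sketch flags positions 11 and 14 as steps and position 20 as a spike. Spikes wider than one sample are reported as a pair of steps under these assumptions, so the definition would need adjusting if such plateaus should count as spikes.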