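I'm running an outlier check on a pandas Series object in two passes, with a different standard-deviation threshold for each pass. However, I'm using two loops and it runs very slowly. I'd like to know whether there is some pandas "trick" to speed up this step.

Here is the code I'm using (warning: very ugly code!):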
```python
import numpy as np


def find_outlier(point, window, n):
    # True when `point` lies n or more (NaN-aware) standard deviations
    # away from the mean of `window`
    return np.abs(point - np.nanmean(window)) >= n * np.nanstd(window)


def despike(self, std1=2, std2=20, block=100, keep=0):
    res = self.values.copy()
    # First run with std1:
    for k, point in enumerate(res):
        # Window around point k; one-sided near the ends of the array
        if k <= block:
            window = res[k:k + block]
        elif k >= len(res) - block:
            window = res[k - block:k]
        else:
            window = res[k - block:k + block]
        # Drop NaNs, including points already flagged earlier in this pass
        window = window[~np.isnan(window)]
        if np.abs(point - window.mean()) >= std1 * window.std():
            res[k] = np.nan
    # Second run with …
```
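One common way to remove the Python-level loop is pandas' rolling-window API. The sketch below (`despike_rolling` is a hypothetical name, not part of pandas) computes a centered rolling mean and standard deviation in each pass and masks every point that lies `n` or more standard deviations from its local mean. It assumes the truncated second pass repeats the first with `std2`, and it drops the unused `keep` argument.

```python
import numpy as np
import pandas as pd


def despike_rolling(series: pd.Series, std1: float = 2, std2: float = 20,
                    block: int = 100) -> pd.Series:
    """Two-pass rolling-window outlier removal (hypothetical helper, a sketch).

    Assumption: the second pass is identical to the first but uses std2.
    """
    res = series.astype(float)
    for n in (std1, std2):
        # Centered window of 2 * block samples; pandas skips NaNs and
        # truncates the window at both ends of the series
        roll = res.rolling(window=2 * block, center=True, min_periods=1)
        mean = roll.mean()
        std = roll.std(ddof=0)  # ddof=0 matches ndarray.std() in the loop
        # Same >= comparison as the loop; NaN comparisons evaluate to False
        outliers = (res - mean).abs() >= n * std
        # Replace flagged points with NaN for the next pass
        res = res.mask(outliers)
    return res
```

Usage might look like `clean = despike_rolling(s, std1=2, std2=20, block=100)`. This is not an exact drop-in replacement: within a pass, the statistics come from the values at the start of that pass instead of being updated after each flagged point, and the edge windows are truncated centered windows rather than the one-sided windows used in the loop. As in the original comparison, `>=` means a perfectly constant window (standard deviation 0) flags every point in it.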