p0p*_*c1e 4 median outliers pandas rolling-computation
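I am trying to filter out some outliers from a scatter plot of GPS elevation displacements against dates.
I am trying to use df.rolling to compute the median and standard deviation for each window, and then remove a point if it lies more than 3 standard deviations from the median.
However, I can't find a way to loop through the column and compare each value against the rolling median.
Here is the code I have so far: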
import pandas as pd
import numpy as np

def median_filter(df, window):
    cnt = 0
    median = df['b'].rolling(window).median()
    std = df['b'].rolling(window).std()
    for row in df.b:
        # compare each value to its median (not implemented yet)
        pass

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=['a', 'b'])
median_filter(df, 10)
How can I loop through, compare each point, and remove the outliers?
Just filter the DataFrame:
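window = 10  # rolling window size, as used in the question
df['median'] = df['b'].rolling(window).median()
df['std'] = df['b'].rolling(window).std()

# keep only rows within 3 rolling standard deviations of the rolling median
# (the first window-1 rows have NaN statistics and are dropped as well)
df = df[(df.b <= df['median'] + 3 * df['std']) & (df.b >= df['median'] - 3 * df['std'])]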
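If you want this as a reusable function, the same idea can be sketched as below. This is only one possible variant, not the only way to do it. The `column` and `n_sigmas` parameters, the `center=True` / `min_periods=1` settings, and the synthetic test data are my own assumptions and are not part of the original question. The centered window with `min_periods=1` is there so the first rows are not thrown away just because the window has not filled up yet.

```python
import numpy as np
import pandas as pd


def median_filter(df, column, window, n_sigmas=3.0):
    """Return the rows whose value is within n_sigmas rolling standard
    deviations of the rolling median (hypothetical helper, see lead-in)."""
    # Centered window; min_periods=1 lets the edges use a shorter window
    # instead of producing NaN statistics.
    roll = df[column].rolling(window, center=True, min_periods=1)
    median = roll.median()
    std = roll.std()
    # Boolean mask: True where the point lies inside the band.
    # Any row whose std is NaN compares as False and is dropped.
    keep = (df[column] - median).abs() <= n_sigmas * std
    return df[keep]


# Synthetic data (assumption): noisy signal around 50 with one injected spike.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": np.arange(100), "b": rng.normal(50.0, 1.0, size=100)})
df.loc[40, "b"] = 500.0

filtered = median_filter(df, "b", window=20)
```

One thing to keep in mind: the outlier itself inflates the rolling standard deviation of every window that contains it. For a single spike in a window of n points the standard deviation is roughly the spike height divided by sqrt(n), so with a 3-sigma threshold the window needs more than 9 points for the spike to be caught at all. That is why the sketch uses a window of 20 rather than something small.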