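I'm trying to automatically remove outliers from a Pandas DataFrame, using the IQR as the criterion and keeping the variables to process in a list.
This code works (where dummy_df is the DataFrame and 'pdays' is the first variable I want to remove outliers from):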
import numpy as np

q1 = np.percentile(dummy_df['pdays'], 25, interpolation = 'midpoint')
q3 = np.percentile(dummy_df['pdays'], 75, interpolation = 'midpoint')
iqr = q3 - q1
upper = np.where(dummy_df['pdays'] >= (q3+1.5*iqr))
lower = np.where(dummy_df['pdays'] <= (q1-1.5*iqr))
dummy_df.drop(upper[0], inplace = True)
dummy_df.drop(lower[0], inplace = True)
print("New Shape: ", dummy_df.shape)
However, this does not:
remove_outliers = ['pdays','poutcome', 'campaign', 'previous']
for outlier in remove_outliers:
    q1 = np.percentile(dummy_df[outlier], 25, interpolation = 'midpoint')
    q3 = np.percentile(dummy_df[outlier], 75, interpolation = 'midpoint')
    iqr = q3 - q1 …
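The rest of the loop is cut off above, so the following is only a guess at what goes wrong. If the loop body repeats the np.where and drop steps from the working snippet, one likely cause is that np.where returns row positions while drop expects index labels. The two match on a fresh default index, but not once the first column's rows have been dropped. Below is a minimal sketch that avoids the problem by filtering with a boolean mask instead. The function name drop_iqr_outliers, the parameter k, and the demo data are made up for illustration, and it uses pandas' Series.quantile rather than np.percentile.

```python
import pandas as pd


def drop_iqr_outliers(df, columns, k=1.5):
    """Return a copy of df without rows on or outside the IQR fences of any listed column."""
    result = df.copy()
    for col in columns:
        # Quartiles are recomputed on the rows that survived the earlier columns,
        # mirroring the sequential drops in the question.
        q1 = result[col].quantile(0.25, interpolation="midpoint")
        q3 = result[col].quantile(0.75, interpolation="midpoint")
        iqr = q3 - q1
        # The mask is aligned on index labels, so it stays valid after rows are removed.
        keep = (result[col] > q1 - k * iqr) & (result[col] < q3 + k * iqr)
        result = result[keep]
    return result


# Hypothetical demo data: one obvious outlier in each column.
demo = pd.DataFrame({
    "pdays": [1, 2, 3, 4, 5, 100],
    "campaign": [10, 11, 12, 13, 500, 14],
})
cleaned = drop_iqr_outliers(demo, ["pdays", "campaign"])
print("New Shape: ", cleaned.shape)
```

As in the original snippet, values that sit exactly on a fence are removed. Note that a column whose IQR is 0 would lose every row equal to its quartiles, so it may be worth checking the IQR of each column before filtering on it.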