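I would like to filter a numpy array (or pandas DataFrame) so that only continuous runs of the same value with a length of at least window_size are kept, and every other value is set to 0.
For example: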
[1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1]
should become, when using a window size of 4:
[0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
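I have tried using rolling_apply and scipy.ndimage.filters.generic_filter, but because of how rolling kernel functions work, I don't think this is the right approach here (and I am stuck with it at the moment).
Here is my attempt anyway: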
import numpy as np
import pandas as pd
import scipy
#from scipy import ndimage

df = pd.DataFrame({'x': np.array([1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1])})
df_alt = df.copy()

# Attempt 1: centered rolling window over the column (modifies df in place).
# pd.rolling_apply is the legacy top-level pandas API.
def filter_df(df, colname, window_size):
    rolling_func = lambda z: z.sum() >= window_size
    df[colname] = pd.rolling_apply(df[colname],
                                   window_size,
                                   rolling_func,
                                   min_periods=window_size/2,
                                   center = True)

# Attempt 2: the same kernel applied through scipy's generic_filter.
def filter_alt(df, colname, window_size):
    rolling_func = lambda z: z.sum() >= window_size
    return scipy.ndimage.filters.generic_filter(df[colname].values,
                                                rolling_func,
                                                size = window_size,
                                                origin = 0)
window_size = …
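
For reference, the target behaviour described at the top can be sketched with a run-length approach instead of a rolling kernel. This is only a minimal sketch and not part of the attempt above: the helper name keep_long_runs is hypothetical, and it assumes a 1-D input where "set to 0" means writing a literal 0 into every run shorter than window_size.

```python
import numpy as np

def keep_long_runs(values, window_size):
    """Zero out every run of equal values shorter than window_size.

    Hypothetical helper for illustration; assumes a 1-D input.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        return arr.copy()
    # Indices where a new run starts: position 0 plus every change point.
    starts = np.concatenate(([0], np.flatnonzero(arr[1:] != arr[:-1]) + 1))
    # Length of each run, taken from the distance between successive starts.
    lengths = np.diff(np.concatenate((starts, [arr.size])))
    # Expand the per-run "long enough" flag back to one flag per element.
    keep = np.repeat(lengths >= window_size, lengths)
    return np.where(keep, arr, 0)

x = [1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1]
print(keep_long_runs(x, 4).tolist())
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```

With a DataFrame, the same sketch could be applied per column, e.g. df['x'] = keep_long_runs(df['x'].values, 4). Because whole runs are measured first, every element of a sufficiently long run is kept, which a centered window sum cannot guarantee near the edges of a run.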