Posted by pho*_*pho

Filter a pandas or numpy array for contiguous runs with a minimum window length

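I want to filter a numpy array (or pandas DataFrame) so that only contiguous runs of the same value with a length of at least window_size are kept, and every other value is set to 0.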

For example:

[1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1]

should, with a window size of 4, become

[0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
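The requirement above can be read as a run-length problem rather than a rolling-window one. The following is only a minimal sketch under that reading: the function name keep_long_runs is hypothetical, and it assumes a 1-D input where a "run" means consecutive equal values.

```python
import numpy as np

def keep_long_runs(values, window_size):
    """Set every run of equal values shorter than window_size to 0."""
    values = np.asarray(values)
    if values.size == 0:
        return values.copy()
    # Positions where a new run starts (index 0 always starts a run).
    starts = np.flatnonzero(np.r_[True, values[1:] != values[:-1]])
    # Length of each run: distance to the next start, or to the end.
    lengths = np.diff(np.r_[starts, values.size])
    # Expand the per-run lengths back to one entry per element.
    run_length_per_element = np.repeat(lengths, lengths)
    # Keep elements that belong to a long enough run, zero the rest.
    return np.where(run_length_per_element >= window_size, values, 0)

x = [1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1]
result = keep_long_runs(x, 4)
print(result.tolist())
```

Because every step is a vectorized NumPy call, no Python-level loop over the elements is needed, which may matter given the performance tag on this question.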

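I have already tried rolling_apply and scipy.ndimage.filters.generic_filter, but because of the way a rolling kernel function works I don't think this is the right approach here (and I am stuck at the moment).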

Anyway, here is my attempt:

import numpy as np
import pandas as pd
import scipy.ndimage

df = pd.DataFrame({'x': np.array([1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1])})
df_alt = df.copy()

def filter_df(df, colname, window_size):
    # Flags windows whose sum reaches window_size; modifies df in place.
    rolling_func = lambda z: z.sum() >= window_size
    # pd.rolling_apply has been removed from pandas; Series.rolling(...).apply replaces it.
    df[colname] = df[colname].rolling(window_size,
                                      min_periods=window_size // 2,
                                      center=True).apply(rolling_func, raw=True)

def filter_alt(df, colname, window_size):
    rolling_func = lambda z: z.sum() >= window_size
    # scipy.ndimage.filters is a deprecated namespace; generic_filter lives in scipy.ndimage.
    return scipy.ndimage.generic_filter(df[colname].values,
                                        rolling_func,
                                        size=window_size,
                                        origin=0)
window_size = …
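For the DataFrame case, one possible alternative to a rolling kernel is to label each run and measure its size with a groupby. This is a sketch, not a tested answer from the original thread: the function name keep_long_runs_df is hypothetical, and it assumes the column holds no NaN values (NaN never compares equal, so each NaN would count as its own run).

```python
import numpy as np
import pandas as pd

def keep_long_runs_df(df, colname, window_size):
    """Return a copy of df where short runs in df[colname] are set to 0."""
    s = df[colname]
    # A new run starts wherever a value differs from its predecessor;
    # the cumulative sum of those flags gives each run its own id.
    run_id = (s != s.shift()).cumsum()
    # Size of the run that each element belongs to.
    run_len = s.groupby(run_id).transform("size")
    out = df.copy()
    # Keep values in long enough runs, replace the others with 0.
    out[colname] = s.where(run_len >= window_size, 0)
    return out

df_runs = pd.DataFrame({'x': np.array([1,1,1,0,0,1,1,1,1,0,0,1,0,0,0,1,1,1,0,1,1,1,1])})
filtered = keep_long_runs_df(df_runs, 'x', 4)
print(filtered['x'].tolist())
```

Unlike the in-place filter_df attempt, this sketch returns a new DataFrame and leaves its input untouched.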

python performance numpy scipy pandas

9 votes, 1 answer, 864 views
