Forward fill a column on a condition

red*_*981 5 python numpy fill conditional-statements pandas

My dataframe looks like this:

df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
                   ,'Col2':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})

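Wherever Col1 contains the value 1, I want to forward fill Col2 with 1, n times in total starting at that row. For example, if n = 4 then I need the result to look like this: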

df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
                   ,'Col2':[0,1,1,1,1,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1,1]})

I think I could do this with a for loop and a counter that resets every time the condition occurs, but is there a faster way to produce the same result?
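For reference, the counter-based loop described above could look roughly like the sketch below. The function name `fill_forward_loop` is made up for illustration; it is only a baseline to compare faster approaches against.

```python
def fill_forward_loop(col, n):
    """Return a list with 1 for n rows starting at every 1 in col, else 0."""
    out = []
    remaining = 0  # rows that still need to be filled with 1
    for value in col:
        if value == 1:
            remaining = n  # reset the counter on every occurrence
        if remaining > 0:
            out.append(1)
            remaining -= 1
        else:
            out.append(0)
    return out


col1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
col2 = fill_forward_loop(col1, 4)
```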

Thanks!

Div*_*kar 4

Approach #1: A NumPy-based one with 1D convolution:

import numpy as np

N = 4 # window size
K = np.ones(N,dtype=bool)
df['Col2'] = (np.convolve(df.Col1,K)[:-N+1]>0).view('i1')

A more compact one-liner:

df['Col2'] = (np.convolve(df.Col1,[1]*N)[:-N+1]>0).view('i1')
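To see why the convolution in Approach #1 works, here is a small step-by-step sketch on the sample data from the question. Convolving with a kernel of N ones gives, at each position, the count of 1s among the current and the previous N-1 rows, so any positive count means the row lies within N rows of a hit. The intermediate names `full` and `trimmed` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]})
N = 4  # window size

# "Full" convolution: N-1 elements longer than the input.
# full[k] counts the 1s in rows k-N+1 .. k of Col1.
full = np.convolve(df['Col1'].to_numpy(), np.ones(N, dtype=int))

# Keep only the first len(df) entries so the result lines up with the rows.
trimmed = full[:len(df)]

# Any positive count means a 1 occurred within the last N rows.
df['Col2'] = (trimmed > 0).astype('i1')
```

Slicing with `[:len(df)]` is equivalent to `[:-N+1]` for N > 1, and it also behaves for N = 1, where `[:-N+1]` becomes `[:0]` and returns an empty array.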

Approach #2: Here's one with SciPy's binary_dilation:

# scipy.ndimage.morphology is a deprecated namespace; import from scipy.ndimage
from scipy.ndimage import binary_dilation

N = 4 # window size
K = np.ones(N,dtype=bool)
df['Col2'] = binary_dilation(df.Col1,K,origin=-(N//2)).view('i1')

Approach #3: Squeeze the best performance out of NumPy with a strided-views-based tool:

from skimage.util.shape import view_as_windows

N = 4 # window size
mask = df.Col1.values==1
w = view_as_windows(mask,N)
idx = len(df)-(N-mask[-N:].argmax())
if mask[-N:].any():
    mask[idx:idx+N-1] = 1
w[mask[:-N+1]] = 1
df['Col2'] = mask.view('i1')
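If scikit-image is not available, the same windowing idea can be sketched with NumPy alone. This is a variant, not the code above: it swaps in `np.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) with `writeable=True`, and it pads the array with N-1 trailing zeros instead of patching the tail separately. The function name `fill_forward_strided` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def fill_forward_strided(col, n):
    """Mark n rows starting at every 1 in col, using writeable window views."""
    hits = np.asarray(col) == 1
    # Pad with n-1 trailing False values so every row starts a full window.
    out = np.concatenate([hits, np.zeros(n - 1, dtype=bool)])
    # windows[i] is a view onto out[i:i+n]; writing to it writes into out.
    windows = sliding_window_view(out, n, writeable=True)
    # hits is a separate array from out, so the index is unaffected by the writes.
    windows[hits] = True
    # Drop the padding and convert to int8, as in the approaches above.
    return out[:len(hits)].astype('i1')


col1 = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
col2 = fill_forward_strided(col1, 4)
```

The padding costs one extra copy of the column, so it may not match the timings of the original; it is meant to show the mechanism rather than to be the fastest form.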

Benchmarking


Setup with the given sample scaled up by 10,000x:

In [67]: df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
    ...:                    ,'Col2':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})
    ...: 
    ...: df = pd.concat([df]*10000)
    ...: df.index = range(len(df.index))

Timings

# @jezrael's soln
In [68]: %%timeit
    ...: n = 3
    ...: df['Col2_1'] = df['Col1'].where(df['Col1'].eq(1)).ffill(limit=n).fillna(df['Col1']).astype(int)
5.15 ms ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# App-1 from this post
In [72]: %%timeit
    ...: N = 4 # window size
    ...: K = np.ones(N,dtype=bool)
    ...: df['Col2_2'] = (np.convolve(df.Col1,K)[:-N+1]>0).view('i1')
1.41 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# App-2 from this post
In [70]: %%timeit
    ...: N = 4 # window size
    ...: K = np.ones(N,dtype=bool)
    ...: df['Col2_3'] = binary_dilation(df.Col1,K,origin=-(N//2)).view('i1')
2.92 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# App-3 from this post
In [35]: %%timeit
    ...: N = 4 # window size
    ...: mask = df.Col1.values==1
    ...: w = view_as_windows(mask,N)
    ...: idx = len(df)-(N-mask[-N:].argmax())
    ...: if mask[-N:].any():
    ...:     mask[idx:idx+N-1] = 1
    ...: w[mask[:-N+1]] = 1
    ...: df['Col2_4'] = mask.view('i1')
1.22 ms ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# @yatu's soln
In [71]: %%timeit
    ...: n = 4
    ...: ix = (np.flatnonzero(df.Col1 == 1) + np.arange(n)[:,None]).ravel('F')
    ...: df.loc[ix, 'Col2_5'] = 1
7.55 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
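Note that the ffill-based solution is timed with n = 3 while the others use 4: `ffill(limit=n)` fills n additional rows after each 1, so a limit of N-1 corresponds to a window of N. A quick sanity check along these lines, on the unscaled sample and with illustrative variable names, can confirm that the two give the same column before comparing their speed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]})
N = 4  # total number of rows to mark per hit, including the hit itself

# Convolution-based result (as in Approach #1, trimmed to len(df)).
conv = (np.convolve(df['Col1'].to_numpy(), np.ones(N, dtype=int))[:len(df)] > 0).astype('i1')

# ffill-based result: keep only the 1s, fill N-1 rows after each, restore the 0s.
ffilled = (df['Col1'].where(df['Col1'].eq(1))
                     .ffill(limit=N - 1)
                     .fillna(df['Col1'])
                     .astype(int))

same = bool((conv == ffilled.to_numpy()).all())
```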