red*_*981 5 python numpy fill conditional-statements pandas
My dataframe looks like this:
df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
,'Col2':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})
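Wherever Col1 contains the value 1, I want to forward fill Col2 with 1 n times, counting the matching row itself. For example, if n = 4, I need the result to look like this: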
df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
,'Col2':[0,1,1,1,1,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1,1]})
I think I could do this with a for loop and a counter that resets every time the condition occurs, but is there a faster way to produce the same result?
Thanks!
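For reference, the counter-based loop described in the question could be sketched as below. This is only an illustration of that baseline, not code from the original post; the name `fill_forward_loop` and its arguments are hypothetical.

```python
def fill_forward_loop(col, n):
    """Return a list that is 1 at every 1 in `col` and at the n-1 positions after it."""
    out = []
    remaining = 0  # how many more rows still need to be filled with 1
    for value in col:
        if value == 1:
            remaining = n  # reset the counter each time the condition occurs
        if remaining > 0:
            out.append(1)
            remaining -= 1
        else:
            out.append(0)
    return out


col1 = [0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
col2 = fill_forward_loop(col1, 4)
```

With a dataframe, the same function could be applied as `df['Col2'] = fill_forward_loop(df['Col1'], 4)`, at the cost of a Python-level loop over every row.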
Approach #1: a NumPy-based one with 1D convolution:
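import numpy as np

N = 4 # window size
K = np.ones(N,dtype=bool)
# Each output position sums the current and the previous N-1 values of Col1; >0 marks rows to fill.
# Note: the [:-N+1] slice assumes N >= 2 (for N = 1 it would be empty).
df['Col2'] = (np.convolve(df.Col1,K)[:-N+1]>0).view('i1')

A more compact one-liner: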

df['Col2'] = (np.convolve(df.Col1,[1]*N)[:-N+1]>0).view('i1')

Approach #2: here is one with SciPy's binary_dilation:
import numpy as np
from scipy.ndimage import binary_dilation  # the scipy.ndimage.morphology path is deprecated in recent SciPy

N = 4 # window size
K = np.ones(N,dtype=bool)
df['Col2'] = binary_dilation(df.Col1,K,origin=-(N//2)).view('i1')

Approach #3: squeezing out the best performance with a NumPy strided-view based tool:

from skimage.util.shape import view_as_windows

N = 4 # window size
mask = df.Col1.values==1
w = view_as_windows(mask,N)  # strided view: row i covers mask[i:i+N]
# Handle a 1 in the last N entries, where no full window starts
idx = len(df)-(N-mask[-N:].argmax())
if mask[-N:].any():
    mask[idx:idx+N-1] = 1
w[mask[:-N+1]] = 1  # writes through the view into mask
df['Col2'] = mask.view('i1')

Setup with the given sample scaled up by 10,000x:
In [67]: df = pd.DataFrame({'Col1':[0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
    ...:                   ,'Col2':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]})
    ...:
    ...: df = pd.concat([df]*10000)
    ...: df.index = range(len(df.index))

Timings

# @jezrael's soln
In [68]: %%timeit
    ...: n = 3
    ...: df['Col2_1'] = df['Col1'].where(df['Col1'].eq(1)).ffill(limit=n).fillna(df['Col1']).astype(int)
5.15 ms ± 25.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# App-1 from this post
In [72]: %%timeit
    ...: N = 4 # window size
    ...: K = np.ones(N,dtype=bool)
    ...: df['Col2_2'] = (np.convolve(df.Col1,K)[:-N+1]>0).view('i1')
1.41 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# App-2 from this post
In [70]: %%timeit
    ...: N = 4 # window size
    ...: K = np.ones(N,dtype=bool)
    ...: df['Col2_3'] = binary_dilation(df.Col1,K,origin=-(N//2)).view('i1')
2.92 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# App-3 from this post
In [35]: %%timeit
    ...: N = 4 # window size
    ...: mask = df.Col1.values==1
    ...: w = view_as_windows(mask,N)
    ...: idx = len(df)-(N-mask[-N:].argmax())
    ...: if mask[-N:].any():
    ...:     mask[idx:idx+N-1] = 1
    ...: w[mask[:-N+1]] = 1
    ...: df['Col2_4'] = mask.view('i1')
1.22 ms ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# @yatu's soln
In [71]: %%timeit
    ...: n = 4
    ...: ix = (np.flatnonzero(df.Col1 == 1) + np.arange(n)[:,None]).ravel('F')
    ...: df.loc[ix, 'Col2_5'] = 1
7.55 ms ± 32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
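
As a quick sanity check (not part of the timings above), here is a minimal sketch that wraps the convolution idea in a function and compares it against the expected output from the question. The name `fill_forward` is hypothetical, and the sketch assumes the column only holds 0s and 1s. It slices with `[:len(a)]` instead of `[:-N+1]` so that a window size of 1 also works.

```python
import numpy as np


def fill_forward(col, n):
    """Return an int8 array that is 1 at every 1 in `col` and at the n-1 positions after it."""
    a = np.asarray(col)
    # Full convolution with n ones: position i holds the sum of a[i-n+1 .. i].
    counts = np.convolve(a, np.ones(n, dtype=int))
    # Keep the first len(a) entries; unlike [:-n+1], this slice is also valid for n == 1.
    return (counts[:len(a)] > 0).astype(np.int8)


col1 = [0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0]
expected = [0,1,1,1,1,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1,1]
result = fill_forward(col1, 4)
assert result.tolist() == expected
```

On a dataframe this would be used as `df['Col2'] = fill_forward(df['Col1'], 4)`.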