How to make pandas perform a rolling average on a non-uniform x grid

Dan*_*and 6 python numpy scipy pandas

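I would like to perform a rolling average, but with a window that only has a finite "vision" in x. I would like something similar to what I have below, but with a window range based on the x value rather than the positional index.

Doing this within pandas is preferred, but a numpy/scipy equivalent is also fine.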

import numpy as np 
import pandas as pd 

x_val = [1,2,4,8,16,32,64,128,256,512]
y_val = [x+np.random.random()*200 for x in x_val]

df = pd.DataFrame(data={'x':x_val,'y':y_val})
df.set_index('x', inplace=True)

df.plot()
df.rolling(1, win_type='gaussian').mean(std=2).plot()

So I would expect the first 5 values to be averaged together, because they are within 10 x-units of each other, while the last values stay unchanged.

meT*_*sky 3

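According to the pandas documentation for rolling, the window parameter is:

Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.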
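To make the quoted behaviour concrete, here is a minimal sketch (the series values are made up for illustration) showing that an integer window counts rows and ignores how far apart the index values are:

```python
import pandas as pd

# Illustrative data: a non-uniform x index with one far-away point (x=512).
s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=[1, 2, 4, 8, 512])

# window=2 always combines 2 consecutive rows, regardless of the gap in x.
rolled = s.rolling(window=2).mean()
print(rolled)
```

The last row averages the points at x=8 and x=512 even though they are 504 x-units apart, which is exactly what the question wants to avoid.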


So you may need to fake a rolling operation with varying window sizes, like this:

test_df = pd.DataFrame({'x':np.linspace(1,10,10),'y':np.linspace(1,10,10)})
test_df['win_locs'] = np.linspace(1,10,10).astype('object')
for ind in range(10):
    test_df.at[ind,'win_locs'] = np.random.randint(0,10,np.random.randint(5)).tolist()

# rolling operation with various window sizes
def worker(idx_list):
    x_slice = test_df.loc[idx_list,'x']
    return np.sum(x_slice)

test_df['rolling'] = test_df['win_locs'].apply(worker)

As you can see, test_df is now

      x     y      win_locs  rolling
0   1.0   1.0        [5, 2]      9.0
1   2.0   2.0  [4, 8, 7, 1]     24.0
2   3.0   3.0            []      0.0
3   4.0   4.0           [9]     10.0
4   5.0   5.0     [6, 2, 9]     20.0
5   6.0   6.0            []      0.0
6   7.0   7.0     [5, 7, 9]     24.0
7   8.0   8.0            []      0.0
8   9.0   9.0            []      0.0
9  10.0  10.0  [9, 4, 7, 1]     25.0

where the rolling operation is implemented through the apply method.
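The example above fills win_locs with random indices. As a sketch of how the same apply pattern could be tied to the question, the window lists can instead be built from distances in x. The names `half_width` and `rolling_mean` are illustrative, and a centered window reaching 10 x-units on each side is an assumption, since the question does not say whether the window is centered or trailing. The y values are placeholders.

```python
import numpy as np
import pandas as pd

x_val = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
df = pd.DataFrame({'x': x_val, 'y': np.arange(10, dtype=float)})  # placeholder y

half_width = 10  # assumption: window reaches 10 x-units on each side

# For each row, list the positions of all rows whose x lies within half_width.
x = df['x'].to_numpy()
df['win_locs'] = pd.Series(
    [np.flatnonzero(np.abs(x - xi) <= half_width).tolist() for xi in x],
    index=df.index,
)

# Same worker idea as above, but averaging y over the x-based window.
def worker(idx_list):
    return df.loc[idx_list, 'y'].mean()

df['rolling_mean'] = df['win_locs'].apply(worker)
print(df)
```

With this grid the closely spaced points at small x share a window, while every point from x=32 onward has only itself in range and so keeps its own value.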


However, this approach is much slower than the native rolling method. For example, with

test_df = pd.DataFrame({'x':np.linspace(1,10,10),'y':np.linspace(1,10,10)})
test_df['win_locs'] = np.linspace(1,10,10).astype('object')
for ind in range(10):
    test_df.at[ind,'win_locs'] = np.arange(ind-1,ind+1).tolist() if ind >= 1 else []

Using the approach above:

%%timeit
# rolling operation with various window sizes
def worker(idx_list):
    x_slice = test_df.loc[idx_list,'x']
    return np.sum(x_slice)

test_df['rolling_apply'] = test_df['win_locs'].apply(worker)

The result is

41.4 ms ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

whereas the native rolling is roughly 50 times faster:

%%timeit
test_df['rolling_native'] = test_df['x'].rolling(window=2).sum()

863 µs ± 118 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
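If the per-row apply is too slow, one possible alternative is a vectorized numpy sketch that works directly on x distances. This is not part of the timings above. The function name `rolling_mean_by_x` and the centered `half_width` window are assumptions for illustration, x must be sorted ascending, and the mean is unweighted (no Gaussian weighting as in the question's win_type).

```python
import numpy as np

def rolling_mean_by_x(x, y, half_width):
    """Unweighted mean of y over all points with |x_j - x_i| <= half_width.

    Hypothetical helper; assumes x is sorted in ascending order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First and one-past-last position of each point's window in x.
    lo = np.searchsorted(x, x - half_width, side='left')
    hi = np.searchsorted(x, x + half_width, side='right')
    # Prefix sums turn every window sum into a single subtraction.
    csum = np.concatenate(([0.0], np.cumsum(y)))
    return (csum[hi] - csum[lo]) / (hi - lo)

x_val = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
y_val = np.arange(10, dtype=float)  # placeholder y values
result = rolling_mean_by_x(x_val, y_val, half_width=10)
print(result)
```

Each window always contains at least its own point, so the division never hits zero. Isolated points at large x simply return their own y value.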