n-dimensional sliding window with Pandas or Numpy

Tho*_*wne 1 python arrays numpy r pandas

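How can I do the equivalent of R's (xts) rollapply(...., by.column = FALSE) using Numpy or Pandas? When given a DataFrame, pandas rolling_apply seems to work only column by column, with no option to pass the full (window size) x (DataFrame width) matrix to the target function.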

import pandas as pd
import numpy as np

xx = pd.DataFrame(np.zeros([10, 10]))
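# pd.rolling_apply was removed in later pandas releases; xx.rolling(5).apply(...) is the current column-wise spelling
pd.rolling_apply(xx, 5, lambda x: np.shape(x)[0])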

    0   1   2   3   4   5   6   7   8   9
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4   5   5   5   5   5   5   5   5   5   5
5   5   5   5   5   5   5   5   5   5   5
6   5   5   5   5   5   5   5   5   5   5
7   5   5   5   5   5   5   5   5   5   5
8   5   5   5   5   5   5   5   5   5   5
9   5   5   5   5   5   5   5   5   5   5

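So what is happening is that rolling_apply moves down each column in turn and applies a sliding window of length 5 within that column. What I want instead is for each sliding window to be a 5x10 array, in which case I would get a single column vector (not a 2-D array) as the result.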

imm*_*rrr 6

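I couldn't find a way to compute a "wide" rolling apply in the pandas docs, so I used numpy to get a "windowed" view of the array and applied a ufunc to it. Here is an example: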

In [40]: arr = np.arange(50).reshape(10, 5); arr
Out[40]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]])

In [41]: win_size = 5

In [42]: isize = arr.itemsize; isize
Out[42]: 8

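arr.itemsize is 8 because the dtype here is np.int64 (the default integer type on most 64-bit platforms). You need the item size for the following "windowed" view idiom: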

In [43]: windowed = np.lib.stride_tricks.as_strided(arr,
                                                    shape=(arr.shape[0] - win_size + 1, win_size, arr.shape[1]),
                                                    strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize)); windowed
Out[43]: 
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34]],

       [[15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]]])

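Strides are the number of bytes between two adjacent elements along a given axis, so strides=(arr.shape[1] * isize, arr.shape[1] * isize, isize) means skipping 5 elements (40 bytes, one full row) when going from windowed[0] to windowed[1], and likewise skipping 5 elements when going from windowed[0, 0] to windowed[0, 1]. Now you can call any NumPy reduction or ufunc on the resulting array, for example: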

In [44]: windowed.sum(axis=(1,2))
Out[44]: array([300, 425, 550, 675, 800, 925])
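
The stride tuple passed to as_strided above is computed by hand from arr.itemsize, which assumes a C-contiguous array. As a minimal sketch (the helper name row_windows is hypothetical, not part of NumPy), the same view can be built from arr.strides directly, which should also hold for non-contiguous inputs such as column slices. Passing writeable=False is a precaution, since the windows overlap in memory:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def row_windows(arr, win_size):
    """Hypothetical helper: read-only view of shape (n_windows, win_size, n_cols)."""
    arr = np.asarray(arr)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array")
    n_rows, n_cols = arr.shape
    if not 1 <= win_size <= n_rows:
        raise ValueError("win_size must be between 1 and the number of rows")
    row_stride, col_stride = arr.strides  # bytes per step along each axis
    return as_strided(
        arr,
        shape=(n_rows - win_size + 1, win_size, n_cols),
        # Moving to the next window and moving to the next row inside a
        # window both advance by exactly one row of the original array.
        strides=(row_stride, row_stride, col_stride),
        writeable=False,  # overlapping windows share memory, so forbid writes
    )


arr = np.arange(50).reshape(10, 5)
windowed = row_windows(arr, 5)
print(windowed.shape)              # (6, 5, 5)
print(windowed.sum(axis=(1, 2)))   # [300 425 550 675 800 925]
```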
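
To tie this back to the original question, here is a rough sketch of a rollapply-style wrapper that hands each full (window size) x (DataFrame width) block to a function and returns a single column. It assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available as a safer alternative to as_strided, and it assumes func returns one scalar per window. The name rolling_apply_wide is made up for illustration and is not a pandas API:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_apply_wide(df, win_size, func):
    """Hypothetical helper: apply func to each (win_size x n_cols) block of df."""
    values = df.to_numpy()
    # sliding_window_view appends the window axis last, giving
    # shape (n_windows, n_cols, win_size).
    windows = sliding_window_view(values, window_shape=win_size, axis=0)
    # Reorder to (n_windows, win_size, n_cols) so that windows[i]
    # matches df.iloc[i:i + win_size].
    windows = windows.transpose(0, 2, 1)
    out = np.full(len(df), np.nan)
    # Align each result with the last row of its window, leaving NaN
    # for the first win_size - 1 rows, as rolling_apply does.
    out[win_size - 1:] = [func(w) for w in windows]
    return pd.Series(out, index=df.index)


df = pd.DataFrame(np.arange(50).reshape(10, 5))
result = rolling_apply_wide(df, 5, np.sum)
print(result)
```

With this input the first four entries are NaN and the remaining six match the windowed sums shown above. The Python-level loop over windows keeps the sketch general; for plain reductions such as sum, calling the reduction on the 3-D view with axis=(1, 2) avoids the loop.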