day*_*yum 9 python numpy vectorization covariance multidimensional-array
我有一个 3D numpy 形状数组(t, n1, n2):
x = np.random.rand(10, 2, 4)
Run Code Online (Sandbox Code Playgroud)
我需要计算另一个 3D 数组y,其形状(t, n1, n1)为:
y[0] = np.cov(x[0,:,:])
Run Code Online (Sandbox Code Playgroud)
...对沿第一个轴的所有切片依此类推。
所以,一个循环的实现将是:
y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
y[i] = np.cov(x[i, :, :])
Run Code Online (Sandbox Code Playgroud)
有什么方法可以将其矢量化,以便一次性计算所有协方差矩阵?我试着做:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
Run Code Online (Sandbox Code Playgroud)
但它没有用。
入侵numpy.cov source code并尝试使用默认参数。事实证明,np.cov(x[i,:,:])这很简单:
N = x.shape[2]
m = x[i,:,:]
m -= np.sum(m, axis=1, keepdims=True) / N
cov = np.dot(m, m.T) /(N - 1)
Run Code Online (Sandbox Code Playgroud)
因此,任务是对这个循环进行矢量化,该循环将一次迭代i并处理所有数据x。同样,我们可以broadcasting在第三步使用。对于最后一步,我们sum-reduction沿第一轴的所有切片执行。这可以以矢量化的方式有效地实现np.einsum。因此,最终的实现是这样的——
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
Run Code Online (Sandbox Code Playgroud)
运行时测试
In [155]: def original_app(x):
...: n = x.shape[0]
...: y = np.zeros((n,2,2))
...: for i in np.arange(x.shape[0]):
...: y[i]=np.cov(x[i,:,:])
...: return y
...:
...: def proposed_app(x):
...: N = x.shape[2]
...: m1 = x - x.sum(2,keepdims=1)/N
...: out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1)
...: return out
...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Run Code Online (Sandbox Code Playgroud)
那里有巨大的加速!