How to speed up numpy.all and numpy.nonzero()?

f. *_* c. 6 python numpy

I need to check whether a point lies inside a bounding cuboid. The number of cuboids is very large (~4M). The code I came up with is:

import numpy as np

# set the numbers of points and cuboids
n_points = 64
n_cuboid = 4000000

# generate the test data
points = np.random.rand(1, 3, n_points)*512
cuboid_min = np.random.rand(n_cuboid, 3, 1)*512
cuboid_max = cuboid_min + np.random.rand(n_cuboid, 3, 1)*8

# main body: check if the points are inside the cuboids
inside_cuboid = np.all((points > cuboid_min) & (points < cuboid_max), axis=1)
indices = np.nonzero(inside_cuboid)

On my computer, np.all takes 8 seconds and np.nonzero takes 3 seconds. Any ideas on how to speed up the code?

Div*_*kar 3

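The all-reduction runs along axis 1, which has a length of only 3, so the comparison first builds a full (n_cuboid, 3, n_points) boolean array. We can reduce that memory congestion by slicing along this shortest axis and combining the three slices directly to get inside_cuboid -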

out = (points[0,0,:] > cuboid_min[:,0]) & (points[0,0,:] < cuboid_max[:,0]) & \
      (points[0,1,:] > cuboid_min[:,1]) & (points[0,1,:] < cuboid_max[:,1]) & \
      (points[0,2,:] > cuboid_min[:,2]) & (points[0,2,:] < cuboid_max[:,2])

Timings -

In [43]: %timeit np.all((points > cuboid_min) & (points < cuboid_max), axis=1)
2.49 s ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [51]: %%timeit
    ...: out = (points[0,0,:] > cuboid_min[:,0]) & (points[0,0,:] < cuboid_max[:,0]) & \
    ...:       (points[0,1,:] > cuboid_min[:,1]) & (points[0,1,:] < cuboid_max[:,1]) & \
    ...:       (points[0,2,:] > cuboid_min[:,2]) & (points[0,2,:] < cuboid_max[:,2])
1.95 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
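To check that the sliced expression produces the same mask as the np.all version, something like the following sketch can be run on a smaller dataset. The reduced n_cuboid, the seeded generator `rng`, and the name `ref` are assumptions made here for illustration and are not part of the original post; the output is then passed to np.nonzero as in the question.

```python
import numpy as np

# Assumption: a much smaller cuboid count than the 4M in the question,
# so the check runs quickly and fits comfortably in memory.
n_points = 64
n_cuboid = 10_000

# Assumption: a seeded generator, used only to make the check repeatable.
rng = np.random.default_rng(0)
points = rng.random((1, 3, n_points)) * 512
cuboid_min = rng.random((n_cuboid, 3, 1)) * 512
cuboid_max = cuboid_min + rng.random((n_cuboid, 3, 1)) * 8

# Reference: the all-reduction from the question, shape (n_cuboid, n_points).
ref = np.all((points > cuboid_min) & (points < cuboid_max), axis=1)

# Sliced version: each cuboid slice has shape (n_cuboid, 1) and each points
# slice has shape (n_points,), so they broadcast to (n_cuboid, n_points).
out = ((points[0, 0, :] > cuboid_min[:, 0]) & (points[0, 0, :] < cuboid_max[:, 0]) &
       (points[0, 1, :] > cuboid_min[:, 1]) & (points[0, 1, :] < cuboid_max[:, 1]) &
       (points[0, 2, :] > cuboid_min[:, 2]) & (points[0, 2, :] < cuboid_max[:, 2]))

# Both masks should agree element by element.
assert out.shape == ref.shape
assert np.array_equal(out, ref)

# The mask can be fed to np.nonzero exactly as in the question:
# indices[0] holds cuboid indices, indices[1] holds point indices.
indices = np.nonzero(out)
```

Timings on the small dataset will not be representative of the 4M case, so this only confirms correctness, not the speedup.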