How to split a numpy array and perform certain operations on the split arrays [Python]

Vik*_*bey 5 python arrays split numpy

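Only parts of this question have been asked before ([1] [2]), and those answers explain how to split a numpy array. I am new to Python. I have an array with 262144 items and want to split it into small arrays of length 512, sort each one individually, and sum up its first five values, but I am not sure how to get beyond this line: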

np.array_split(vector, 512)

How do I access and analyze each array? Is it a good idea to keep using numpy arrays, or should I switch to a dictionary instead?

Div*_*kar 3

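Splitting like that is not an efficient solution. Instead, we can reshape, which effectively creates the subarrays as rows of a 2D array. These rows are views into the input array, so no additional memory is needed. Then we get the argsort indices, select the first five indices per row, and finally sum the values at those indices to get the desired output.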
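To illustrate the view claim, here is a minimal sketch. The array `a` built with `np.arange` is only a hypothetical stand-in for the 262144-item input, and `np.shares_memory` is used here just to check that the reshaped array reuses the same buffer.

```python
import numpy as np

N = 512                        # chunk length, as in the question
a = np.arange(512 * 512)       # hypothetical stand-in for the 262144-item array

b = a.reshape(-1, N)           # one row per chunk of N consecutive items

shares = np.shares_memory(a, b)  # True when b reuses a's buffer (no copy made)
print(b.shape, shares)
```

Note that `reshape(-1, N)` raises a `ValueError` if the array length is not a multiple of `N`, so this approach assumes equally sized chunks.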

Thus, we would have an implementation like so -

import numpy as np

N = 512 # Number of elements in each split array
M = 5   # Number of smallest elements to sum in each split array

# a is the 1D input array; its length must be a multiple of N
b = a.reshape(-1,N)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
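For comparison, the loop the question was reaching for can be sketched as below and checked against the reshape-based one-liner. The random input, the seed, and the names `chunks`, `loop_out` and `vec_out` are illustrative assumptions, not part of the original post. Keep in mind that the second argument of `np.array_split` is the number of sections, not the chunk length; the two coincide in the question only because 262144 = 512 × 512.

```python
import numpy as np

N, M = 512, 5
rng = np.random.default_rng(0)             # seeded only so the sketch is repeatable
a = rng.integers(11, 99, size=512 * 512)   # made-up sample data

# Straightforward version: split into chunks of N items,
# sort each chunk, and sum its M smallest values
chunks = np.array_split(a, a.size // N)
loop_out = np.array([np.sort(chunk)[:M].sum() for chunk in chunks])

# Reshape-based version from the answer
b = a.reshape(-1, N)
vec_out = b[np.arange(b.shape[0])[:, None], b.argsort(1)[:, :M]].sum(1)

print(np.array_equal(loop_out, vec_out))
```

Both versions should produce the same sums; the reshape-based one avoids the Python-level loop and the list of intermediate arrays.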

Step-by-step sample run -

In [217]: a   # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])

In [218]: N = 7 # 512 for original case, 7 for sample

In [219]: M = 5

# Reshape into a 2D array with N columns (one row per split array)
In [220]: b = a.reshape(-1,N)

In [224]: b
Out[224]: 
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])

# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]: 
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])

# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]: 
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])

# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]: 
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])

# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])

Performance boost with np.argpartition -

out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
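Since only the sum of the selected values is needed and not their positions, one possible variation (not part of the original answer) is to call `np.partition` directly and skip the fancy-indexing step. The sketch below reuses the small sample array from the step-by-step run; `part_out` is an illustrative name.

```python
import numpy as np

N, M = 7, 5
a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
b = a.reshape(-1, N)

# After partitioning around position M, the first M entries of each row
# are its M smallest values (in no particular order), which is enough for a sum
part_out = np.partition(b, M, axis=1)[:, :M].sum(1)
print(part_out)  # [148 142]
```

This relies on `M` being smaller than `N`, as it is in the question (5 versus 512).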

Runtime test -

In [236]: a = np.random.randint(11,99,(512*512))

In [237]: N = 512

In [238]: M = 5

In [239]: b = a.reshape(-1,N)

In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop

In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
                np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop