Faster alternative to np.where for a sorted array

Question


chr*_*mos 5 python arrays numpy where-clause

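Given a large array a that is sorted along each row, is there a faster alternative to numpy's np.where for finding the indices where min_v <= a <= max_v? I would imagine that exploiting the sorted nature of the array should make it possible to speed things up.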

Here is an example setup that uses np.where to find the given indices in a large array.

import numpy as np

# Initialise an example of an array in which to search
r, c = int(1e2), int(1e6)
a = np.arange(r*c).reshape(r, c)

# Set up search limits
min_v = (r*c/2)-10
max_v = (r*c/2)+10

# Find indices of occurrences
idx = np.where(((a >= min_v) & (a <= max_v)))


Answer 1

Arm*_*ali 2

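When I use np.searchsorted on the 100 million numbers from the original example, with the not very recent NumPy version 1.12.1 (I can't speak for newer versions), it is not much faster than np.where: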

>>> import timeit
>>> timeit.timeit('np.where(((a >= min_v) & (a <= max_v)))', number=10, globals=globals())
6.685825735330582
>>> timeit.timeit('np.searchsorted(a.ravel(), [min_v, max_v])', number=10, globals=globals())
5.304438766092062


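However, although the NumPy documentation for searchsorted says that the function uses the same algorithm as the built-in Python functions bisect.bisect_left and bisect.bisect_right, the latter are much faster: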

>>> import bisect
>>> timeit.timeit('bisect.bisect_left(a.base, min_v), bisect.bisect_right(a.base, max_v)', number=10, globals=globals())
0.002058468759059906
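One possible explanation, which the answer itself does not confirm: the limits come from r*c/2, which is true division and yields Python floats, while a holds integers. np.searchsorted brings the array and the search values to a common dtype first, which can mean converting the whole array on every call, whereas bisect only compares against the handful of elements it visits. A minimal sketch on a small stand-in array, assuming the limits are whole numbers so they can be cast to the array's dtype without loss:

```python
import numpy as np

# Small stand-in for the array in the question (sorted when flattened)
r, c = 4, 1000
a = np.arange(r * c).reshape(r, c)
flat = a.ravel()  # a view here, because a is C-contiguous

# Float limits, as produced by (r*c/2)-10 and (r*c/2)+10 in the question
min_f, max_f = (r * c / 2) - 10, (r * c / 2) + 10

# The same limits in the array's own dtype (assumes whole-number limits)
min_i, max_i = a.dtype.type(min_f), a.dtype.type(max_f)

# Float limits: NumPy may convert flat to float64 before searching
lo_f = np.searchsorted(flat, min_f, side="left")
hi_f = np.searchsorted(flat, max_f, side="right")

# Matching dtype: no conversion of flat is needed
lo_i = np.searchsorted(flat, min_i, side="left")
hi_i = np.searchsorted(flat, max_i, side="right")
```

Both variants return the same positions; whether the second one closes the timing gap on the full 100 million element array would need to be measured on the NumPy version in use.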


So I would use this:

idx = np.unravel_index(range(bisect.bisect_left(a.base, min_v),
                             bisect.bisect_right(a.base, max_v)), a.shape)
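The snippet above relies on a.base, the one-dimensional array that a was reshaped from, being sorted as a whole. That holds for the example, but the question only promises that each row is sorted. A minimal sketch for that more general case, assuming ascending rows and limits given in the array's own dtype; the helper name where_between_sorted_rows is hypothetical and not part of NumPy:

```python
import numpy as np

def where_between_sorted_rows(a, min_v, max_v):
    """Hypothetical helper: indices where min_v <= a <= max_v,
    for a 2-D array whose rows are each sorted in ascending order."""
    rows, cols = [], []
    for i, row in enumerate(a):
        # Binary search for the first and one-past-last matching column
        lo = np.searchsorted(row, min_v, side="left")
        hi = np.searchsorted(row, max_v, side="right")
        if hi > lo:
            rows.append(np.full(hi - lo, i, dtype=np.intp))
            cols.append(np.arange(lo, hi, dtype=np.intp))
    if not rows:
        empty = np.empty(0, dtype=np.intp)
        return empty, empty
    return np.concatenate(rows), np.concatenate(cols)

# Rows are sorted individually, but the flattened array is not
example = np.array([[1, 3, 5, 7],
                    [2, 4, 6, 8],
                    [0, 0, 9, 9]])
row_idx, col_idx = where_between_sorted_rows(example, 3, 6)
expected_rows, expected_cols = np.where((example >= 3) & (example <= 6))
```

The result has the same form as the tuple returned by np.where, in row-major order. The Python loop costs one pair of binary searches per row, so this is only likely to pay off when rows are long relative to their number, as in the question's 100 by 1,000,000 array.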

