Efficiently finding indices in a Python list (compared with MATLAB)

Question

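I am having a hard time finding an efficient way to look up indices in a Python list. Every solution I have tested so far is slower than MATLAB's "find" function. I have only just started using Python, so I am not very experienced.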

In MATLAB I would use the following:

a = linspace(0, 1000, 1000); % monotonically increasing vector
b = 1000 * rand(1, 100); % 100 points I want to find in a
for i = 1 : numel(b)
    indices(i) = find(b(i) <= a, 1); % find the first index where b(i) <= a
end

I can speed this up if I use MATLAB's arrayfun(). In Python I tried several possibilities. I used

import numpy

indices = []
for i in range(len(b)):
    tmp = numpy.where(b[i] <= a)  # every position where b[i] <= a
    indices.append(tmp[0][0])     # keep only the first one

This takes a lot of time, especially if a is very large. If b is sorted, then I can use

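indices = []
curr_idx = 0
for i in range(len(a)):  # walk through a once
    # several values of b can share the same first index in a
    while curr_idx < len(b) and b[curr_idx] <= a[i]:
        indices.append(i)
        curr_idx += 1
    if curr_idx >= len(b):  # every value of b has been placed
        break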

This is much faster than the numpy.where() solution because I only need to search through the list once, but it is still slower than the MATLAB solution.

Can anyone suggest a better or more efficient solution? Thanks in advance.

Answer 1

seb*_*ian 5

Try numpy.searchsorted:

>>> import numpy as np
>>> a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> b = np.array([1, 2, 4, 3, 1, 0, 2, 9])
>>> # sorting b "into" a
>>> np.searchsorted(a, b, side='right') - 1
array([1, 2, 4, 3, 1, 0, 2, 7])

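You may have to give special treatment to values in b that fall outside the range of a, such as the 9 in the example above. Even so, this should be faster than any loop-based approach.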
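For the lookup asked about in the question (the first index where b[i] <= a), one possible sketch is to call np.searchsorted with side='left', assuming a is sorted in ascending order. The helper name first_index_at_or_above and the -1 "not found" marker are illustrative choices for this sketch, not part of numpy. Note that the indices are 0-based, whereas MATLAB's find returns 1-based indices.

```python
import numpy as np

def first_index_at_or_above(a, b, missing=-1):
    """For each value in b, return the first index i with a[i] >= value.

    Assumes a is sorted in ascending order. Values larger than a[-1] have
    no such index and get `missing` (a hypothetical sentinel for this sketch).
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # side='left' gives the first position whose element is >= the value;
    # it equals len(a) when the value is larger than every element of a.
    idx = np.searchsorted(a, b, side='left')
    return np.where(idx < len(a), idx, missing)

a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
b = np.array([1, 2, 4, 3, 1, 0, 2, 9])
result = first_index_at_or_above(a, b)
print(result)  # the out-of-range 9 is reported as -1
```

Returning a sentinel keeps the output the same length as b; raising an error or masking the out-of-range entries would be equally reasonable, depending on the caller.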

As an aside: likewise, histc in MATLAB will be much faster than the loop.

Edit:

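If you want the index of the element of a that is closest to each value in b, you should be able to use the same code with a modified a: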

>>> a_mod = 0.5 * (a[:-1] + a[1:])  # take the centers between the elements in a
>>> np.searchsorted(a_mod, np.array([0.9, 2.1, 4.2, 2.9, 1.1]), side='right')
array([1, 2, 4, 3, 1])

Note that you can drop the -1 because a_mod has one element fewer than a.
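The midpoint idea can be wrapped in a small function, sketched here under the assumption that a is sorted in ascending order and has at least two elements. The name nearest_index is hypothetical. Because there are len(a) - 1 midpoints, the result always lies between 0 and len(a) - 1, so values outside the range of a simply map to the first or last element.

```python
import numpy as np

def nearest_index(a, values):
    """Index of the element of sorted array a closest to each value."""
    a = np.asarray(a, dtype=float)
    # Boundaries halfway between neighbouring elements of a.
    midpoints = 0.5 * (a[:-1] + a[1:])
    # Counting how many boundaries lie at or below each value gives the
    # index of the nearest element; an exact tie goes to the upper neighbour.
    return np.searchsorted(midpoints, values, side='right')

a = np.array([0, 1, 2, 3, 4, 5, 6, 7])
values = np.array([0.9, 2.1, 4.2, 2.9, 1.1, -5.0, 9.0])
nearest = nearest_index(a, values)
print(nearest)  # -5.0 maps to index 0 and 9.0 to index 7
```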

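In the current development branch of numpy, `np.searchsorted` is about 2x faster than in 1.8, so if you can compile your own numpy, or wait a few weeks until numpy 1.9 is released, Python may take the lead again. (2 upvotes)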
