Numpy: for every element in one array, find the index in another array

Question

Chr*_*ris 39 python arrays indexing search numpy

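I have two 1D arrays, x and y, one smaller than the other. I'm trying to find the index of every element of y in x.

I've found two naive ways to do this; the first is slow, and the second is memory-intensive.

The slow way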

indices = []
for iy in y:
    # index of the first occurrence of iy in x (IndexError if iy is not in x)
    indices.append(np.where(x == iy)[0][0])

The memory hog

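# xe[i, j] == x[j] and ye[i, j] == y[i]; both have shape (len(y), len(x))
xe = np.outer([1,]*len(y), x)
ye = np.outer(y, [1,]*len(x))
# the column index of each match is a position in x, listed in y order
junk, indices = np.where(np.equal(xe, ye))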

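Is there a faster way or a less memory-intensive approach? Ideally the search would take advantage of the fact that we are searching for not one thing in a list but many things, which makes it somewhat more amenable to parallelization. Bonus points if you don't assume that every element of y is actually in x.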

Answer 1

Rom*_*anS 25

I want to suggest a one-line solution:

indices = np.where(np.in1d(x, y))[0]

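The result is an array of indices into x, one for each element of x that was also found in y.

If needed, it can be used without np.where, in which case np.in1d(x, y) alone gives a boolean mask over x.

While this does return the indices of the elements of y that exist in x, the order of the returned indices does not follow the order of the values in y. Consider: x = np.array([1,2,3,4,5]); y = np.array([5,4,3,2,1]). The method above returns array([0,1,2,3,4]), so x[0] = 1 is matched to y[0] = 5, which is not what is wanted... (18 upvotes)
This simply tells you whether an element of x exists in y and then gives the corresponding index in x. It does not give the corresponding index in y for each item in x. (3 upvotes)
The in1d() solution does not work. Take y = np.array([10, 5, 5, 1, 'auto', 6, 'auto', 1, 5, 10, 10, 'auto']) and x = np.array(['auto', 5, 6, 10, 1]). You would expect [3, 1, 1, 4, 0, 2, 0, 4, 3, 3, 0]. np.where(np.in1d(x, y))[0] does not produce that result. (2 upvotes)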
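
The ordering problem raised in these comments is easy to reproduce. Here is a minimal sketch using the example from the first comment. It uses np.isin, the newer spelling of np.in1d, which assumes a reasonably recent NumPy. The name `wanted` is only illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

# Boolean mask over x: True where the x value occurs anywhere in y.
mask = np.isin(x, y)

# Positions in x of the True entries, always in ascending order.
found = np.where(mask)[0]
print(found)  # [0 1 2 3 4]

# What the question asks for: for each y[i], the index j with x[j] == y[i].
wanted = np.array([4, 3, 2, 1, 0])
print(np.array_equal(x[wanted], y))  # True
print(np.array_equal(x[found], y))   # False
```

So the membership test answers "which elements of x appear in y", not "where in x each element of y lives".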

Answer 2

HYR*_*YRY 24

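As Joe Kington said, searchsorted() can search for elements very quickly. To deal with elements that are not in x, you can check the search result against the original y and create a masked array: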

import numpy as np
x = np.array([3,5,7,1,9,8,6,6])
y = np.array([2,1,5,10,100,6])

index = np.argsort(x)
sorted_x = x[index]
sorted_index = np.searchsorted(sorted_x, y)

yindex = np.take(index, sorted_index, mode="clip")
mask = x[yindex] != y

result = np.ma.array(yindex, mask=mask)
print(result)

The result is:

[-- 3 1 -- -- 6]
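
If a plain integer array is easier to work with than a masked array, the same steps could be wrapped in a small helper that marks missing values with a sentinel. This is only a sketch: the name `find_indices` and the `fill` parameter are hypothetical, and it assumes x is non-empty.

```python
import numpy as np

def find_indices(x, y, fill=-1):
    """For each element of y, return the index of a matching element in x, or `fill`."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x, kind="stable")          # ties keep their original order
    pos = np.searchsorted(x[order], y)            # insertion points in the sorted copy of x
    candidate = np.take(order, pos, mode="clip")  # clip guards against pos == len(x)
    # Keep the candidate only where it really points at the requested value.
    return np.where(x[candidate] == y, candidate, fill)

x = np.array([3, 5, 7, 1, 9, 8, 6, 6])
y = np.array([2, 1, 5, 10, 100, 6])
print(find_indices(x, y))  # [-1  3  1 -1 -1  6]
```

With a stable sort, a value that occurs several times in x resolves to its first occurrence, which is why 6 maps to index 6 rather than 7.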

Answer 3

Joe*_*ton 18

How about this?

It does assume that every element of y is in x (and will return results even for elements that aren't!), but it is much faster.

import numpy as np

# Generate some example data...
x = np.arange(1000)
np.random.shuffle(x)
y = np.arange(100)

# Actually perform the operation...
xsorted = np.argsort(x)
ypos = np.searchsorted(x[xsorted], y)
indices = xsorted[ypos]
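
Because this version returns an index even for values that are not in x, a cheap check afterwards can catch wrong matches. The following sketch uses a fixed seed for reproducibility. The `rng` setup and the value -5 are illustrative assumptions, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.permutation(1000)      # the values 0..999 in random order
y = np.array([10, 500, -5])    # -5 is deliberately absent from x

xsorted = np.argsort(x)
ypos = np.searchsorted(x[xsorted], y)
indices = xsorted[ypos]        # would raise IndexError if any y exceeded x.max()

# True where the returned index really points at the requested value.
valid = x[indices] == y
print(valid)  # [ True  True False]
```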

Answer 4

Eel*_*orn 6

The numpy_indexed package (disclaimer: I am its author) contains a function that does exactly this:

import numpy_indexed as npi
indices = npi.indices(x, y, missing='mask')

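It currently raises a KeyError if not all elements of y are present in x; but perhaps I should add a kwarg so that one can choose to mark such items with -1 or something similar.

It should be as efficient as the currently accepted answer, since the implementation is along similar lines. numpy_indexed is more flexible, however, and also allows searching for the indices of rows of multidimensional arrays, for example.

EDIT: I have changed the handling of missing values; the 'missing' kwarg can now be set to 'raise', 'ignore' or 'mask'. In the last case you get a masked array of the same length as y, on which you can call .compressed() to get the valid indices. Note that there is also npi.contains(x, y) if that is all you need to know.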
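
For readers without numpy_indexed installed, the effect of .compressed() can be seen on a hand-built masked array using plain numpy.ma. The values below are made up for illustration and are not output from npi.indices.

```python
import numpy as np

# Hypothetical lookup result: three indices, the middle one marked as missing.
masked_indices = np.ma.array([3, 0, 6], mask=[False, True, False])

# compressed() drops the masked entries and returns an ordinary ndarray.
valid_indices = masked_indices.compressed()
print(valid_indices)  # [3 6]

# The inverted mask tells which elements of y the valid indices belong to.
found_in_x = ~np.ma.getmaskarray(masked_indices)
print(found_in_x)  # [ True False  True]
```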

Answer 5

小智 5

I would just do this:

indices = np.where(y[:, None] == x[None, :])[1]

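Unlike your memory-hog way, this makes use of broadcasting to generate the 2D boolean array directly, without creating 2D arrays for both x and y.

For the record, this also hogs memory. (2 upvotes)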
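
The size of the intermediate array can be checked directly. This small sketch uses array sizes borrowed from the example data in Answer 3. It suggests the boolean array grows as len(x) * len(y), at one byte per element.

```python
import numpy as np

x = np.arange(1000)
y = np.arange(100)

# Comparing a (100, 1) column against a (1, 1000) row broadcasts to (100, 1000).
matches = y[:, None] == x[None, :]
print(matches.shape)   # (100, 1000)
print(matches.nbytes)  # 100000

# Row i holds the matches for y[i]; the column index is the position in x.
rows, cols = np.where(matches)
print(np.array_equal(cols, y))  # True here, because x[j] == j
```

This is far less than two full integer arrays of that shape, but it still becomes impractical when both inputs are large.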

Answer 6

her*_*alc 5

I think this is a clearer version:

np.where(y.reshape(y.size, 1) == x)[1]

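It is clearer than indices = np.where(y[:, None] == x[None, :])[1], since x does not need to be broadcast to 2D explicitly.

I found this type of solution to be the best because, unlike the searchsorted()- or in1d()-based solutions posted here or elsewhere, it works with duplicates and does not care whether anything is sorted. This was important to me because I wanted x to follow a specific custom order.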
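
Here is a short sketch of that behaviour, using an integer stand-in for the example from the comments under Answer 1 (7 replaces 'auto', which is an assumption made to keep the dtypes uniform). It also shows one caveat: values of y that are missing from x are silently dropped, so the row indices are needed to tell which elements were found.

```python
import numpy as np

x = np.array([7, 5, 6, 10, 1])                          # custom, unsorted order
y = np.array([10, 5, 5, 1, 7, 6, 7, 1, 5, 10, 10, 7])   # contains repeats

indices = np.where(y.reshape(y.size, 1) == x)[1]
print(indices.tolist())  # [3, 1, 1, 4, 0, 2, 0, 4, 1, 3, 3, 0]

# A value of y that is missing from x produces no match at all, so the
# result gets shorter; the row indices show which y elements were found.
y2 = np.array([10, 99, 1])
rows, cols = np.where(y2.reshape(y2.size, 1) == x)
print(rows.tolist(), cols.tolist())  # [0, 2] [3, 4]
```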

Archived:	14 years, 2 months ago
Views:	25272
Last activity:	6 years, 7 months ago