Check whether each element of a numpy array is in another array

Question

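This seems like an easy problem, but I can't come up with a clean solution. I have two numpy arrays (A and B), and I want the indices of A whose elements are in B, as well as the indices of A whose elements are not in B.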

So, if

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])

I am currently using

C = np.searchsorted(A,B)

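This takes advantage of the fact that A is sorted and gives me [1, 3, 5], the indices of the elements of A that are in B. That works, but how do I get D = [0,2,4,6], the indices of the elements of A that are not in B?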

Answer 1

HYR*_*YRY 36

searchsorted may give you wrong answers if not every element of B is in A. You can use numpy.in1d:

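import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6,8])
mask = np.in1d(A, B)        # boolean mask over A; np.isin is the newer equivalent
print(np.where(mask)[0])    # indices of A whose element is in B
print(np.where(~mask)[0])   # indices of A whose element is not in B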

The output is:

[1 3 5]
[0 2 4 6]
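
As a small illustration of the searchsorted pitfall mentioned above (a sketch, not part of the original answer; the array B_extra is a made-up example): searchsorted returns insertion positions rather than matches, so values that are missing from A still produce an index.

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
# Hypothetical B containing values that do not occur in A (0 and 8)
B_extra = np.array([0, 2, 8])

# searchsorted returns insertion points, so missing values still get an index
C = np.searchsorted(A, B_extra)
print(C)  # [0 1 7]: 0 points at A[0] == 1, and 7 is out of bounds for A

# A membership mask does not have this problem
mask = np.isin(A, B_extra)        # np.isin is the newer spelling of np.in1d
in_B = np.flatnonzero(mask)       # indices of A whose element is in B_extra
not_in_B = np.flatnonzero(~mask)  # indices of A whose element is not in B_extra
print(in_B)      # [1]
print(not_in_B)  # [0 2 3 4 5 6]
```

The mask-based version also does not require A to be sorted.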

However, in1d() relies on sorting, which is slow for large datasets. If your dataset is large, you can use pandas instead:

import pandas as pd
np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]
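
To make the one-liner easier to follow, here is a sketch of what get_indexer returns on the small arrays from this answer: the position of each element of A inside the index, or -1 when it is absent. The names idx and pos are only for illustration.

```python
import numpy as np
import pandas as pd

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6, 8])

idx = pd.Index(pd.unique(B))   # unique values of B; get_indexer needs a unique index
pos = idx.get_indexer(A)       # position in idx for each element of A, -1 if missing
print(pos)                     # [-1  0 -1  1 -1  2 -1]

in_B = np.where(pos >= 0)[0]      # indices of A whose element is in B
not_in_B = np.where(pos < 0)[0]   # indices of A whose element is not in B
print(in_B)       # [1 3 5]
print(not_in_B)   # [0 2 4 6]
```

So comparing the result with 0 turns the lookup into the same kind of boolean mask that in1d produces.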

Here is a timing comparison:

A = np.random.randint(0, 1000, 10000)
B = np.random.randint(0, 1000, 10000)

%timeit np.where(np.in1d(A, B))[0]
%timeit np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

Output:

100 loops, best of 3: 2.09 ms per loop
1000 loops, best of 3: 594 µs per loop
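
%timeit only works in IPython/Jupyter. A rough sketch of the same comparison as a plain script, using the standard-library timeit module, might look like this. The helper names, the fixed seed and the repeat count are my own choices, and the timings depend on the machine and library versions, so no numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)      # fixed seed, chosen only for repeatability
A = rng.integers(0, 1000, 10000)
B = rng.integers(0, 1000, 10000)

def with_numpy():
    # NumPy membership test (np.isin is the newer name for np.in1d)
    return np.where(np.isin(A, B))[0]

def with_pandas():
    # Lookup through a pandas Index built from the unique values of B
    return np.where(pd.Index(pd.unique(B)).get_indexer(A) >= 0)[0]

# Both approaches should agree on the indices before we compare their speed
same = np.array_equal(with_numpy(), with_pandas())
print(same)

for fn in (with_numpy, with_pandas):
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{fn.__name__}: {seconds * 1e6:.0f} µs per call")
```

Newer NumPy releases may pick a different internal strategy for integer arrays, so the gap can differ from the numbers above.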

Good to know about this efficient method, since my dataset is very large. Thanks a lot for this solution! (2 upvotes)

Answer 2

ale*_*xhb 7

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 4, 6])
c = np.searchsorted(a, b)
d = np.searchsorted(a, np.setdiff1d(a, b))

d
#array([0, 2, 4, 6])

Answer 3

ask*_*han 6

import numpy as np

A = np.array([1,2,3,4,5,6,7])
B = np.array([2,4,6])
C = np.searchsorted(A, B)

D = np.delete(np.arange(len(A)), C)  # np.alen was removed from recent NumPy; len(A) is equivalent here

D
#array([0, 2, 4, 6])

Answer 4

小智 5

Elements of A that are also in B:

set(A) & set(B)

Elements of A that are not in B:

set(A) - set(B)
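
Note that these set operations give values, not the indices asked for in the question, and the order of a set is not guaranteed. A minimal sketch, assuming the example arrays from the question, of mapping the values back to indices of A:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7])
B = np.array([2, 4, 6])

common = set(A.tolist()) & set(B.tolist())   # values of A that are also in B
only_a = set(A.tolist()) - set(B.tolist())   # values of A that are not in B
print(sorted(common))   # [2, 4, 6]
print(sorted(only_a))   # [1, 3, 5, 7]

# Map the values back to positions in A
idx_common = np.flatnonzero(np.isin(A, list(common)))
idx_only_a = np.flatnonzero(np.isin(A, list(only_a)))
print(idx_common)   # [1 3 5]
print(idx_only_a)   # [0 2 4 6]
```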

Archived:	12 years, 5 months ago
Views:	13814
Last activity:	8 years, 4 months ago