Finding indices of common values in two arrays

ber*_*lem 5 python arrays performance numpy indices

I am using Python 2.7. I have two arrays, A and B. To find the elements of A that are present in B (as a boolean mask over A), I can do:

A_inds = np.in1d(A,B)
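Note that np.in1d returns a boolean mask with the same length as its first argument, not integer positions. np.flatnonzero turns that mask into indices. A small illustration with made-up arrays, using np.isin (the NumPy 1.13+ equivalent of np.in1d for 1-D inputs; np.in1d itself is deprecated as of NumPy 2.0):

```python
import numpy as np

# Made-up example data
A = np.array([5, 1, 7, 3, 1])
B = np.array([3, 9, 1])

A_mask = np.isin(A, B)           # boolean mask over A, same as np.in1d(A, B) for 1-D input
A_inds = np.flatnonzero(A_mask)  # integer positions of the True entries
print(A_mask)  # [False  True False  True  True]
print(A_inds)  # [1 3 4]
```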

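I would also like the corresponding result for B: the elements of B that are present in A, i.e. the positions in B of the same overlapping elements found by the code above.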

Currently I just run the same line again with the arguments swapped:

B_inds = np.in1d(B,A)

But this extra computation seems like it should be unnecessary. Is there a more computationally efficient way to get both A_inds and B_inds?

I am open to using either list or array methods.
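An alternative not used in the answer below: on NumPy 1.15 or newer, np.intersect1d with return_indices=True returns the common values together with their positions in both arrays in a single call. It reports only the first occurrence of each common value, so it is not a drop-in replacement for the masks when the arrays contain duplicates. Illustration with made-up data:

```python
import numpy as np

# Made-up example data (A contains 1 twice, B contains 3 twice)
A = np.array([5, 1, 7, 3, 1])
B = np.array([3, 9, 1, 3])

# common: sorted unique values present in both arrays
# A_first / B_first: index of the FIRST occurrence of each common value
common, A_first, B_first = np.intersect1d(A, B, return_indices=True)
print(common)   # [1 3]
print(A_first)  # [1 3]
print(B_first)  # [2 0]
```

Here the second 1 in A (index 4) is not reported, whereas np.in1d(A, B) would flag it.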

Div*_*kar 3

np.unique and np.searchsorted can be used together to solve this -

import numpy as np

def unq_searchsorted(A,B):

    # Get unique elements of A and B and the indices based on the uniqueness
    unqA,idx1 = np.unique(A,return_inverse=True)
    unqB,idx2 = np.unique(B,return_inverse=True)

    # Create mask equivalent to np.in1d(A,B) and np.in1d(B,A) for unique elements
    mask1 = (np.searchsorted(unqB,unqA,'right') - np.searchsorted(unqB,unqA,'left'))==1
    mask2 = (np.searchsorted(unqA,unqB,'right') - np.searchsorted(unqA,unqB,'left'))==1

    # Map back to all non-unique indices to get equivalent of np.in1d(A,B), 
    # np.in1d(B,A) results for non-unique elements
    return mask1[idx1],mask2[idx2]
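To see why the mask works: for a sorted array, searchsorted with side 'right' minus side 'left' counts how many copies of each query value the array holds. Because unqA and unqB have been deduplicated by np.unique, that count is always 0 or 1, so comparing it with 1 is a membership test. A small illustration with made-up values:

```python
import numpy as np

unqB = np.array([1, 3, 9])      # sorted and unique, as np.unique would return
unqA = np.array([1, 3, 5, 7])

right = np.searchsorted(unqB, unqA, 'right')  # index just past any equal entries
left = np.searchsorted(unqB, unqA, 'left')    # index of the first entry >= value
print(right - left)          # [1 1 0 0]  number of copies of each unqA value in unqB
print((right - left) == 1)   # [ True  True False False]
```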

Runtime test and verification of the results -

In [233]: def org_app(A,B):
     ...:     return np.in1d(A,B), np.in1d(B,A)
     ...: 

In [234]: A = np.random.randint(0,10000,(10000))
     ...: B = np.random.randint(0,10000,(10000))
     ...: 

In [235]: np.allclose(org_app(A,B)[0],unq_searchsorted(A,B)[0])
Out[235]: True

In [236]: np.allclose(org_app(A,B)[1],unq_searchsorted(A,B)[1])
Out[236]: True

In [237]: %timeit org_app(A,B)
100 loops, best of 3: 7.69 ms per loop

In [238]: %timeit unq_searchsorted(A,B)
100 loops, best of 3: 5.56 ms per loop
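A self-contained version of the correctness check above, as a sketch assuming NumPy 1.13+ (for np.isin). It compares the masks with np.array_equal, which tests exact equality and is the more natural check for boolean arrays than np.allclose, and it seeds the generator so reruns use the same data. No timings are reproduced here.

```python
import numpy as np

def unq_searchsorted(A, B):
    # Same logic as the function in the answer
    unqA, idx1 = np.unique(A, return_inverse=True)
    unqB, idx2 = np.unique(B, return_inverse=True)
    mask1 = (np.searchsorted(unqB, unqA, 'right') - np.searchsorted(unqB, unqA, 'left')) == 1
    mask2 = (np.searchsorted(unqA, unqB, 'right') - np.searchsorted(unqA, unqB, 'left')) == 1
    return mask1[idx1], mask2[idx2]

rng = np.random.RandomState(0)        # fixed seed for reproducible data
A = rng.randint(0, 10000, 10000)      # contains duplicates
B = rng.randint(0, 10000, 10000)

m1, m2 = unq_searchsorted(A, B)
print(np.array_equal(m1, np.isin(A, B)))  # True
print(np.array_equal(m2, np.isin(B, A)))  # True
```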

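If both input arrays are already sorted and unique, the np.unique calls can be skipped entirely and the performance gain is larger. The solution function then simplifies to -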

def unq_searchsorted_v1(A,B):
    out1 = (np.searchsorted(B,A,'right') - np.searchsorted(B,A,'left'))==1
    out2 = (np.searchsorted(A,B,'right') - np.searchsorted(A,B,'left'))==1  
    return out1,out2
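A possible further variant for sorted inputs, sketched here and not benchmarked: instead of two searchsorted calls per direction, one call finds each value's leftmost insertion point and a direct comparison checks whether the value actually sits there. The helper name sorted_membership is hypothetical and not from the original answer; it assumes the second argument is sorted.

```python
import numpy as np

def sorted_membership(X, Y):
    """Hypothetical helper: boolean mask of which elements of X occur in Y.
    Y must be sorted in ascending order."""
    if Y.size == 0:
        return np.zeros(X.shape, dtype=bool)
    pos = np.searchsorted(Y, X)         # leftmost insertion point of each X value in Y
    pos = np.minimum(pos, Y.size - 1)   # values larger than Y[-1] get pos == Y.size; clip into range
    return Y[pos] == X                  # clipped entries compare against Y[-1] < X, so they stay False

A = np.array([1, 3, 5, 7])   # sorted, unique
B = np.array([1, 3, 9])      # sorted, unique

in_B = sorted_membership(A, B)   # same result as np.in1d(A, B)
in_A = sorted_membership(B, A)   # same result as np.in1d(B, A)
print(in_B)  # [ True  True False False]
print(in_A)  # [ True  True False]
```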

The corresponding runtime test -

In [275]: A = np.random.randint(0,100000,(20000))
     ...: B = np.random.randint(0,100000,(20000))
     ...: A = np.unique(A)
     ...: B = np.unique(B)
     ...: 

In [276]: np.allclose(org_app(A,B)[0],unq_searchsorted_v1(A,B)[0])
Out[276]: True

In [277]: np.allclose(org_app(A,B)[1],unq_searchsorted_v1(A,B)[1])
Out[277]: True

In [278]: %timeit org_app(A,B)
100 loops, best of 3: 8.83 ms per loop

In [279]: %timeit unq_searchsorted_v1(A,B)
100 loops, best of 3: 4.94 ms per loop