How to select the inverse of numpy array indexes

b10*_*ard 15 python numpy scipy

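I have a large amount of data, and I need to compute the distances between a set of samples from this array and all the other elements of the array. Below is a very simple example of my dataset.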

import numpy as np
import scipy.spatial.distance as sd

data = np.array(
    [[ 0.93825827,  0.26701143],
     [ 0.99121108,  0.35582816],
     [ 0.90154837,  0.86254049],
     [ 0.83149103,  0.42222948],
     [ 0.27309625,  0.38925281],
     [ 0.06510739,  0.58445673],
     [ 0.61469637,  0.05420098],
     [ 0.92685408,  0.62715114],
     [ 0.22587817,  0.56819403],
     [ 0.28400409,  0.21112043]]
)


sample_indexes = [1,2,3]

# I'd rather not make this
other_indexes = list(set(range(len(data))) - set(sample_indexes))

sample_data = data[sample_indexes]
other_data = data[other_indexes]

# compare them
dists = sd.cdist(sample_data, other_data)

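Is there a way to index a numpy array with the indexes that are not in the sample indexes? In the example above, I created a list called other_indexes. I would rather not have to do this, for various reasons (a large dataset, threading, very little memory on the system this runs on, etc.). Is there a way to do something like...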

other_data = data[ indexes not in sample_indexes]

I read that numpy masks can do this, but when I tried...

other_data = data[~sample_indexes]

...it gave me an error. Do I have to create a mask?
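The error comes from applying `~` to a plain Python list, which does not support that operator (it raises `TypeError`). Even on an integer NumPy array, `~` is bitwise NOT rather than "every index except these": `~i == -i - 1`, so `~np.array([1, 2, 3])` yields negative indexes that count from the end of the array. `~` only means logical "not" on a boolean array, which is why a mask is needed. A small sketch illustrating this, using a toy `data` array as a stand-in for the real dataset:

```python
import numpy as np

data = np.arange(10) * 10  # toy stand-in for the real dataset: [0, 10, ..., 90]
sample_indexes = [1, 2, 3]

# ~ is not defined for a plain Python list and raises TypeError
try:
    ~sample_indexes
except TypeError as exc:
    print("TypeError:", exc)

# On an integer array, ~ is bitwise NOT: ~1 == -2, ~2 == -3, ~3 == -4
inverted = ~np.array(sample_indexes)
print(inverted)        # [-2 -3 -4]
print(data[inverted])  # rows counted from the end: [80 70 60]

# ~ only acts as "logical not" on a boolean array, i.e. a mask
mask = np.zeros(len(data), dtype=bool)
mask[sample_indexes] = True
print(data[~mask])     # every row except 1, 2 and 3
```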

Eel*_*orn 19

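# Start with every row selected, then switch off the sample rows.
# dtype=bool: the np.bool alias is unavailable in some NumPy versions.
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False
other_data = data[mask]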

It may not be the most elegant one-liner, but it is fairly efficient, and the memory overhead is minimal.
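A sketch of the mask approach applied end to end to the question's actual goal (distances from the samples to every other row). The random `data` array is an assumption standing in for the real dataset. The boolean mask costs one byte per row, which is typically much smaller than the Python `set` and `list` of integers built in the question.

```python
import numpy as np
import scipy.spatial.distance as sd

rng = np.random.default_rng(0)
data = rng.random((10, 2))  # random stand-in for the real dataset
sample_indexes = [1, 2, 3]

# Boolean mask: True means "not a sample row"
mask = np.ones(len(data), dtype=bool)
mask[sample_indexes] = False

sample_data = data[sample_indexes]  # integer (fancy) indexing: copy of 3 rows
other_data = data[mask]             # boolean indexing: copy of the other 7 rows

# Distance from each sample row to each non-sample row
dists = sd.cdist(sample_data, other_data)
print(dists.shape)  # (3, 7)
```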

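If memory is your main concern, np.delete spares you from building the mask yourself (NumPy may still build an equivalent mask internally), and fancy indexing creates a copy of the selected rows either way.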

On second thought: np.delete does not modify the existing array, so it is pretty much exactly the one-liner you are looking for.
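A minimal sketch of the np.delete one-liner, using a small toy array (an assumption) where row `i` is `[2i, 2i+1]`. Passing `axis=0` matters: with the default `axis=None`, np.delete flattens the array before deleting.

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # toy stand-in: row i is [2i, 2i+1]
sample_indexes = [1, 2, 3]

# Returns a NEW array without rows 1, 2 and 3; data itself is unchanged.
other_data = np.delete(data, sample_indexes, axis=0)

print(other_data.shape)                    # (7, 2)
print(data.shape)                          # (10, 2): original untouched
print(np.shares_memory(data, other_data))  # False: a copy, not a view
```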

  • This is exactly what the official documentation recommends: https://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html (2 upvotes)

CT *_*Zhu 6

You might want to try in1d:

In [5]: select = np.in1d(range(data.shape[0]), sample_indexes)

In [6]: print(data[select])
[[ 0.99121108  0.35582816]
 [ 0.90154837  0.86254049]
 [ 0.83149103  0.42222948]]

In [7]: print(data[~select])
[[ 0.93825827  0.26701143]
 [ 0.27309625  0.38925281]
 [ 0.06510739  0.58445673]
 [ 0.61469637  0.05420098]
 [ 0.92685408  0.62715114]
 [ 0.22587817  0.56819403]
 [ 0.28400409  0.21112043]]
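On newer NumPy releases, np.in1d is deprecated (as of NumPy 2.0) in favor of np.isin, which returns a boolean array shaped like its first argument. A sketch of the same idea with np.isin, on a toy array assumed for illustration:

```python
import numpy as np

data = np.arange(20.0).reshape(10, 2)  # toy stand-in: row i is [2i, 2i+1]
sample_indexes = [1, 2, 3]

# True where the row index is one of the sample indexes
select = np.isin(np.arange(data.shape[0]), sample_indexes)

sample_data = data[select]   # rows 1, 2, 3
other_data = data[~select]   # every other row
```

One design note: a boolean mask always returns rows in array order, so `data[select]` can differ from `data[sample_indexes]` when `sample_indexes` is unsorted or contains duplicates.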


Pol*_*eer 5

You can also use setdiff1d:

In [11]: data[np.setdiff1d(np.arange(data.shape[0]), sample_indexes)]
Out[11]: 
array([[ 0.93825827,  0.26701143],
       [ 0.27309625,  0.38925281],
       [ 0.06510739,  0.58445673],
       [ 0.61469637,  0.05420098],
       [ 0.92685408,  0.62715114],
       [ 0.22587817,  0.56819403],
       [ 0.28400409,  0.21112043]])
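np.setdiff1d returns the sorted unique values that are in its first argument but not its second, so the resulting rows come out in ascending index order, just like the mask-based answers. A sketch checking that all the approaches in this thread select the same rows, assuming random data and a deliberately unsorted `sample_indexes`:

```python
import numpy as np

data = np.random.default_rng(1).random((10, 2))  # random stand-in for the real dataset
sample_indexes = [3, 1, 2]  # deliberately unsorted

n = data.shape[0]
other_idx = np.setdiff1d(np.arange(n), sample_indexes)  # sorted: 0, 4, 5, ..., 9

mask = np.ones(n, dtype=bool)
mask[sample_indexes] = False

results = [
    data[other_idx],                               # setdiff1d
    data[mask],                                    # boolean mask
    np.delete(data, sample_indexes, axis=0),       # np.delete
    data[~np.isin(np.arange(n), sample_indexes)],  # isin (successor of in1d)
]

# All approaches select the same rows in the same (ascending) order
all_equal = all(np.array_equal(results[0], r) for r in results[1:])
print(all_equal)  # True
```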