How to remove multiple values from an array at once

dbl*_*iss 6 python arrays numpy

Can someone give me a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than the following:

import numpy as np

# The array.
x = np.linspace(0, 360, 37)

# The values to be removed.
a = 0
b = 180
c = 360

new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))

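A good answer to this question would produce the same result as the code above (i.e. new_array), but might handle equality between floats better than the code above does.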
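A minimal sketch of one more option, not taken from the thread: np.isin tests membership against all the unwanted values in one call, and inverting the resulting mask keeps everything else. It compares with exact ==, so it has the same floating-point caveat as the code in the question; the list name `values` is just for illustration.

```python
import numpy as np

x = np.linspace(0, 360, 37)
values = [0, 180, 360]  # the a, b, c from the question

# np.isin marks every element of x that equals one of `values`;
# ~ inverts the mask so boolean indexing keeps the remaining elements.
new_array = x[~np.isin(x, values)]

print(new_array.size)  # 34
```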

Bonus

Can someone explain to me why this produces the wrong result?

In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
  "of casting it to integer", FutureWarning)
Out[5]: 
array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
        110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

Both 0 and 10 have been removed, instead of only 0 (a).

Note that x == a behaves as expected (the problem lies inside np.delete):

In [6]: x == a
Out[6]: 
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Note that np.delete(x, np.where(x == a)) produces the correct result. So it seems to me that np.delete cannot handle boolean indices.
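The difference becomes visible once the mask is turned into integer positions explicitly. This sketch uses np.flatnonzero, which returns the same positions as np.where(mask)[0] as a plain 1-D array, and also shows that indexing with the inverted mask gives the same result without calling np.delete at all.

```python
import numpy as np

x = np.linspace(0, 360, 37)
mask = x == 0            # boolean array, True only at position 0

# Convert the mask into the integer positions of its True entries.
idx = np.flatnonzero(mask)
print(idx)               # [0]

# Integer positions make np.delete remove exactly the matching element...
via_delete = np.delete(x, idx)
# ...and the inverted mask selects the same elements directly.
via_mask = x[~mask]

print(np.array_equal(via_delete, via_mask))  # True
```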

sty*_*ane 5

You can also use np.ravel to collect the indices of the values, and then remove them with np.delete:

In [32]: r =  [a,b,c]

In [33]: indx = np.ravel([np.where(x == i) for i in r])

In [34]: indx
Out[34]: array([ 0, 18, 36])

In [35]: np.delete(x, indx)
Out[35]: 
array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
        100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.])
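The np.ravel step appears to rely on every value matching the same number of elements (here, exactly one); if a value is missing or repeated, the list of np.where results is ragged and recent NumPy versions refuse to turn it into an array. A sketch of a variant that concatenates the per-value positions instead; the extra value 999 is hypothetical and chosen because it does not occur in x.

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360, 999]   # 999: hypothetical value with no match in x

# Gather the matching positions for each value; np.concatenate copes with
# values that match zero times or several times.
indx = np.concatenate([np.flatnonzero(x == v) for v in r])

# indx holds the positions 0, 18 and 36
new_array = np.delete(x, indx)
```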


hol*_*web 5

Your code seems a bit complicated. I wonder whether you have considered numpy's boolean vector indexing.

After the same setup as yours, I timed your code:

In [175]: %%timeit
   .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
   .....:
10000 loops, best of 3: 32.9 µs per loop

Next, I timed chained applications of boolean indexing, one for each value to exclude:

In [176]: %%timeit
   .....: x1 = x[x != a]
   .....: x2 = x1[x1 != b]
   .....: new_array = x2[x2 != c]
   .....:
100000 loops, best of 3: 6.56 µs per loop
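A variant not timed in the answer: combine the "not equal" masks with & and index the array only once, which avoids the intermediate arrays x1 and x2. Whether this is faster on a given machine has not been measured here.

```python
import numpy as np

x = np.linspace(0, 360, 37)
a, b, c = 0, 180, 360

# Element-wise AND of the three masks: True where x differs from a, b and c.
new_array = x[(x != a) & (x != b) & (x != c)]
```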

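Finally, for ease of programming and to extend the technique to any number of excluded values, I rewrote the same code as a loop. This is a little slower because a copy has to be made first, but it is still quite respectable.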

In [177]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[new_array != val]
   .....:
100000 loops, best of 3: 7.61 µs per loop
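The loop can also be folded into a single mask that still scales to any number of values. A sketch using np.logical_and.reduce, which ANDs a list of boolean arrays together; the tuple name `exclude` is illustrative, and it must not be empty (reducing an empty list yields the scalar True rather than a mask).

```python
import numpy as np

x = np.linspace(0, 360, 37)
exclude = (0, 180, 360)   # any (non-empty) collection of values

# One "not equal" mask per value, combined into a single mask:
# True only where x differs from every excluded value.
keep = np.logical_and.reduce([x != v for v in exclude])
new_array = x[keep]
```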

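I think the real gain, though, is in clarity of programming. Finally, I thought it best to verify that the three algorithms give the same results...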

In [179]: new_array1 = np.delete(x,
   .....:                 np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

In [180]: x1 = x[x != a]

In [181]: x2 = x1[x1 != b]

In [182]: new_array2 = x2[x2 != c]

In [183]: new_array3 = x.copy()

In [184]: for val in (a, b, c):
   .....:         new_array3 = new_array3[new_array3 != val]
   .....:

In [185]: all(new_array1 == new_array2)
Out[185]: True

In [186]: all(new_array1 == new_array3)
Out[186]: True
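For this kind of check, np.array_equal may be a safer choice than all(new_array1 == new_array2): it compares shapes as well as values, whereas the element-wise == only works when the two arrays have broadcast-compatible shapes. A short sketch comparing two of the approaches above:

```python
import numpy as np

x = np.linspace(0, 360, 37)
a, b, c = 0, 180, 360

new_array1 = np.delete(x, np.where((x == a) | (x == b) | (x == c))[0])
new_array2 = x[(x != a) & (x != b) & (x != c)]

# Returns a plain bool; arrays of different lengths just compare as False.
print(np.array_equal(new_array1, new_array2))  # True
```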

To deal with floating-point comparison, you can use numpy's isclose() function. As expected, this sends the timing through the roof:

In [188]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[~np.isclose(new_array, val)]
   .....:
10000 loops, best of 3: 126 µs per loop
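A sketch of a single-call alternative to the isclose loop, not timed in the answer: broadcasting x against all the values builds a len(x) by len(values) table of closeness tests at once, at the cost of that much temporary memory. It relies on np.isclose's default tolerances (rtol=1e-05, atol=1e-08); the `noisy` array is an illustrative stand-in for values that picked up rounding error.

```python
import numpy as np

x = np.linspace(0, 360, 37)
values = np.array([0, 180, 360])

# x[:, None] has shape (37, 1) and values has shape (3,), so broadcasting
# yields a (37, 3) boolean table of pairwise "is close" tests.
close = np.isclose(x[:, None], values)

# Drop every element that is close to at least one of the values.
new_array = x[~close.any(axis=1)]

# The same expression still removes values that are only approximately equal.
noisy = x + 1e-12
new_noisy = noisy[~np.isclose(noisy[:, None], values).any(axis=1)]
```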

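The answer to your bonus question is in the warning, but the warning isn't very helpful unless you know that False and True are numerically equal to zero and one, respectively. So your code is equivalent to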

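np.delete(x, (x == a).astype(int))   # index array [1, 0, 0, ...]: deletes positions 1 and 0, i.e. the values 10 and 0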

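As the warning indicates, the numpy developers intend to change how np.delete() treats boolean arguments at some point, but for now it only works with integer indices.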
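The cast can be reproduced explicitly, independent of how a given NumPy version treats boolean arguments to np.delete: converting the mask to integers produces an index array whose entries are 1 at the matching position and 0 everywhere else, so positions 1 and 0 are the ones removed. A short sketch:

```python
import numpy as np

x = np.linspace(0, 360, 37)
mask = x == 0

# What the older behaviour effectively did: bool -> int.
as_int = mask.astype(int)
print(as_int[:4])  # [1 0 0 0]

# The index array only contains 1 and 0, so the elements at positions
# 1 and 0 (the values 10 and 0) are deleted; duplicate 0s are harmless.
result = np.delete(x, as_int)
```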