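I'm working with numpy arrays of various data types. For any given array, I'd like to know which elements are NaN. Normally, this is exactly what np.isnan is for.
However, np.isnan isn't friendly to arrays of dtype object (or of any string dtype):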
>>> import numpy as np
>>> str_arr = np.array(["A", "B", "C"])
>>> np.isnan(str_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type
>>> obj_arr = np.array([1, 2, "A"], dtype=object)
>>> np.isnan(obj_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
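All I want back from these two calls is a plain np.array([False, False, False]). I can't just wrap my call to np.isnan in try and except TypeError and assume that any array raising a TypeError contains no NaNs: after all, I'd like np.isnan(np.array([1, np.nan, "A"])) to return np.array([False, True, False]).
My current solution is to create a new array of dtype np.float64, loop over the elements of the original array, try to put each element into the new array (leaving it as zero if that fails), and then call np.isnan on the new array. This is, of course, rather slow (at least for large object arrays).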
import numpy as np


def isnan(arr):
    if isinstance(arr, np.ndarray) and (arr.dtype == object):
        # Create a new array of dtype float64, fill it with the same values as the input array (where possible), and
        # then call np.isnan on the new array. This way, np.isnan is only called once. (Much faster than calling it on
        # every element in the input array.)
        new_arr = np.zeros((len(arr),), dtype=np.float64)
        for idx in xrange(len(arr)):  # Python 2; use range on Python 3
            try:
                new_arr[idx] = arr[idx]
            except Exception:
                # Element can't be converted to float: leave it as zero (i.e. not NaN).
                pass
        return np.isnan(new_arr)
    else:
        try:
            return np.isnan(arr)
        except TypeError:
            return False
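This particular implementation also only works for one-dimensional arrays, and I can't think of a decent way to make the for loop run over an arbitrary number of dimensions.
Is there a more efficient way to figure out which elements of an object-dtype array are NaN?
Edit: I'm running Python 2.7.10.
Note that [x is np.nan for x in np.array([np.nan])] returns [False]: np.nan is not always the same object in memory as another np.nan.
I do not want the string "nan" to be considered equivalent to np.nan: I want isnan(np.array(["nan"], dtype=object)) to return np.array([False]).
Multi-dimensionality isn't a big issue. (It's nothing that a little ravel-and-reshape-ing won't fix. :P)
Any function that relies on the is operator to test the equivalence of two NaNs isn't always going to work. (If you think it should, ask yourself what the is operator actually does!)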
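The point about is can be seen in a short sketch. The helper name scalar_is_nan is hypothetical (it is not part of NumPy), and the check relies only on the fact that NaN is the one float value that compares unequal to itself:

```python
import numpy as np

# An element pulled out of a float array is a fresh np.float64 object,
# so it is not the same object in memory as the np.nan singleton.
from_array = np.array([np.nan])[0]
identity_check = from_array is np.nan        # compares object identity
self_inequality = from_array != from_array   # compares values: NaN != NaN


def scalar_is_nan(x):
    """Hypothetical helper: True only for a float NaN, never for a string like "nan"."""
    # np.float64 subclasses Python's float, so isinstance covers both.
    return isinstance(x, float) and bool(x != x)


print(identity_check, bool(self_inequality))
print([scalar_is_nan(x) for x in [1, np.nan, "A", "nan", from_array]])
```

The identity test misses the NaN that came out of the array, while the self-inequality test catches it and still leaves the string "nan" alone.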
If you are willing to use the pandas library, pd.isnull handles this case:
pandas.isnull(obj): Detect missing values (NaN in numeric arrays, None/NaN in object arrays)
Here is an example:
$ python
>>> import numpy
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False, True])
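One thing to check before adopting this: as the docstring quoted above says, pd.isnull also reports None as missing in object arrays, which is slightly broader than "is NaN". The sketch below (the array contents are made up for illustration) shows that behaviour on a 2-D object array, along with the fact that the string "nan" is not flagged:

```python
import numpy as np
import pandas as pd

# A 2-D object array mixing a string, None, a real NaN and the string "nan".
arr = np.array([["a", None], [np.nan, "nan"]], dtype=object)

# pd.isnull returns a boolean array with the same shape as the input.
mask = pd.isnull(arr)
print(mask.tolist())
```

If None must not count as NaN for your use case, the result would need an extra filtering step, or a hand-written check instead.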
Here is what I ended up building for myself:
import numpy as np

FLOAT_TYPES = (float, np.float64, np.float32, complex, np.complex64, np.complex128)


def isnan(arr):
    """Equivalent of np.isnan, except made to also be friendly towards arrays of object/string dtype."""
    if isinstance(arr, np.ndarray):
        if arr.dtype == object:
            # An element can only be NaN if it's a float, and is not equal to itself. (NaN != NaN, by definition.)
            # NaN is the only float that doesn't equal itself, so "(x != x) and isinstance(x, float)" tests for NaN-ity.
            # Numpy's == checks identity for object arrays, so "x != x" will always return False, so can't vectorize.
            is_nan = np.array([((x != x) and isinstance(x, FLOAT_TYPES)) for x in arr.ravel()], dtype=bool)
            return is_nan.reshape(arr.shape)
        if arr.dtype.kind in "fc":  # Only [f]loats and [c]omplex numbers can be NaN
            return np.isnan(arr)
        return np.zeros(arr.shape, dtype=bool)
    if isinstance(arr, FLOAT_TYPES):
        return np.isnan(arr)
    return False
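For comparison, here is a more compact variant of the same "float and not equal to itself" idea for object arrays. It is only a sketch: the names _scalar_is_nan and isnan_object are hypothetical. It uses np.frompyfunc, which applies a Python function element by element over an array of any shape, so no explicit ravel and reshape is needed. It still calls a Python function per element, and I have not benchmarked it, so don't assume it is faster.

```python
import numpy as np


def _scalar_is_nan(x):
    """Hypothetical helper: True only for float or complex NaN values."""
    # Check the type first so that strings, None, etc. are never compared.
    is_float_like = isinstance(x, (float, complex, np.floating, np.complexfloating))
    return is_float_like and bool(x != x)


# Wrap the scalar check as a ufunc-like callable: 1 input, 1 output.
_isnan_elementwise = np.frompyfunc(_scalar_is_nan, 1, 1)


def isnan_object(arr):
    """Return a boolean mask, same shape as arr, marking NaN elements of an object array."""
    # frompyfunc yields an object array of Python bools; convert it to a bool array.
    return _isnan_elementwise(arr).astype(bool)


example = np.array([[1, np.nan], ["nan", None]], dtype=object)
print(isnan_object(example).tolist())
```

Unlike pd.isnull, this sketch treats None as not-NaN, and like the function above it does not flag the string "nan".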