Question by tim*_*321 (81 votes) · Tags: python, arrays, numpy, pandas
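I have an array of floats (some ordinary numbers, some NaNs) that comes out of an apply on a pandas DataFrame.
For some reason, numpy.isnan fails on this array. Yet, as shown below, every element is a float, numpy.isnan works correctly on each element individually, and the variable is definitely a NumPy array.
What's going on?!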
set([type(x) for x in tester])
Out[59]: {float}
tester
Out[60]:
array([-0.7000000000000001, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan], dtype=object)
set([type(x) for x in tester])
Out[61]: {float}
np.isnan(tester)
Traceback (most recent call last):
File "<ipython-input-62-e3638605b43c>", line 1, in <module>
np.isnan(tester)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
set([np.isnan(x) for x in tester])
Out[65]: {False, True}
type(tester)
Out[66]: numpy.ndarray
Answer by unu*_*tbu (123 votes)
np.isnan can be applied to NumPy arrays of a native dtype (such as np.float64):
In [99]: np.isnan(np.array([np.nan, 0], dtype=np.float64))
Out[99]: array([ True, False], dtype=bool)
but it raises a TypeError when applied to an object array:
In [96]: np.isnan(np.array([np.nan, 0], dtype=object))
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
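A minimal sketch tying this back to the question, rebuilding an array like tester by hand (the values are illustrative, not the original data). np.isnan does not look at the Python types of the elements; it dispatches on the array's dtype, which here is object. Casting with astype(np.float64), assuming every element is float-convertible, gives the ufunc a native dtype it supports.

```python
import numpy as np

# Object-dtype array whose elements are all Python floats, like `tester`.
tester = np.array([-0.7, np.nan, np.nan], dtype=object)

print(tester.dtype)               # object
print({type(x) for x in tester})  # {<class 'float'>}

# The ufunc has no loop for object dtype, so this raises TypeError.
try:
    np.isnan(tester)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Cast to a native float dtype; np.isnan now has a loop it can use.
as_float = tester.astype(np.float64)
print(np.isnan(as_float))         # [False  True  True]
```

If the object array contains non-numeric strings, astype raises a ValueError instead, in which case pd.to_numeric(..., errors="coerce") is the more forgiving conversion.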
Since you have pandas, you can use pd.isnull instead; it accepts both NumPy object arrays and arrays of native dtypes:
In [97]: pd.isnull(np.array([np.nan, 0], dtype=float))
Out[97]: array([ True, False], dtype=bool)
In [98]: pd.isnull(np.array([np.nan, 0], dtype=object))
Out[98]: array([ True, False], dtype=bool)
Note that None in an object array is also treated as a null value.
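A short sketch of that behavior, using a hypothetical object array that mixes NaN, None, a float and a string. pd.isnull flags both NaN and None, while the float and the string count as present values. In current pandas, pd.isna is the same function under a newer name.

```python
import numpy as np
import pandas as pd

# Hypothetical object array mixing missing markers with ordinary values.
mixed = np.array([np.nan, None, 0.5, "text"], dtype=object)

mask = pd.isnull(mixed)
print(mask)  # [ True  True False False]

# pd.isna gives the same result.
print((pd.isna(mixed) == mask).all())  # True
```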
Another answer: building on @unutbu's answer, you can coerce the pandas object column to the native float64 dtype, and then it works:
import pandas as pd
pd.to_numeric(df['tester'], errors='coerce')
Specify errors='coerce' so that strings which cannot be parsed as numbers become NaN. The column dtype will then be float64, and the isnan check should work.
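A self-contained sketch of that conversion, using a made-up DataFrame whose tester column mixes floats, NaN and strings. The mix of types is what leaves the column with object dtype in the first place.

```python
import numpy as np
import pandas as pd

# Hypothetical data: floats, NaN and strings in one column -> object dtype.
df = pd.DataFrame({"tester": [-0.7, np.nan, "n/a", "1.5"]})
print(df["tester"].dtype)  # object

# Unparseable strings ("n/a") become NaN; "1.5" is parsed as 1.5.
numeric = pd.to_numeric(df["tester"], errors="coerce")
print(numeric.dtype)       # float64

# np.isnan now works on the float64 result.
print(np.isnan(numeric).tolist())  # [False, True, True, False]

# pd.to_numeric also accepts a plain object ndarray and returns an ndarray.
arr = pd.to_numeric(np.array([-0.7, np.nan], dtype=object), errors="coerce")
print(arr.dtype)           # float64
```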
Another answer: a great alternative to np.isnan() and pd.isnull() is:
for i in range(a.shape[0]):
    if a[i] != a[i]:
        # a[i] is NaN: do something here
        pass
This works because NaN is the only floating-point value that does not compare equal to itself.
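The same self-comparison idea can be wrapped in a helper that returns a boolean mask instead of looping with a side effect; the function name below is hypothetical. Unlike pd.isnull, this trick does not flag None, because None compares equal to itself. On a native float64 array the comparison can be vectorized as a != a.

```python
import numpy as np


def nan_mask_by_self_comparison(a):
    """Hypothetical helper: the self-comparison loop, collecting a mask.

    x != x is True only for NaN, whether x is a Python float (object
    arrays) or an np.float64 (native float arrays). None, strings and
    ordinary numbers compare equal to themselves, so they give False.
    """
    return np.array([x != x for x in a], dtype=bool)


obj = np.array([-0.7, np.nan, 2.0, None], dtype=object)
mask = nan_mask_by_self_comparison(obj)
print(mask)        # [False  True False False]

# On a native float64 array the same test is a single vectorized comparison.
flt = np.array([-0.7, np.nan, 2.0, np.nan])
print(flt != flt)  # [False  True False  True]
```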