cls*_*udt 10 python pandas data-cleaning data-science
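Applying pandas.to_numeric to a DataFrame column that contains strings representing numbers (and possibly other, unparseable strings) results in an error message like this: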
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-66-07383316d7b6> in <module>()
1 for column in shouldBeNumericColumns:
----> 2 trainData[column] = pandas.to_numeric(trainData[column])
/usr/local/lib/python3.5/site-packages/pandas/tools/util.py in to_numeric(arg, errors)
113 try:
114 values = lib.maybe_convert_numeric(values, set(),
--> 115 coerce_numeric=coerce_numeric)
116 except:
117 if errors == 'raise':
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53558)()
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:53344)()
ValueError: Unable to parse string
Wouldn't it be helpful to see which value could not be parsed?
jez*_*ael 19
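I think you can add the parameter errors='coerce' to convert the bad, non-numeric values to NaN, then test the result with isnull and use boolean indexing to select the offending rows: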
print (df[pd.to_numeric(df.col, errors='coerce').isnull()])
Sample:
df = pd.DataFrame({'B':['a','7','8'],
                   'C':[7,8,9]})
print (df)
   B  C
0  a  7
1  7  8
2  8  9
print (df[pd.to_numeric(df.B, errors='coerce').isnull()])
   B  C
0  a  7
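To tie this back to the loop in the question, here is a minimal sketch that reports the unparseable values for several columns at once. The helper name find_unparseable and the sample frame train_data are made up for illustration. It also assumes you want to ignore values that were already missing before the conversion, since those become NaN too and would otherwise show up in the result.

```python
import pandas as pd

def find_unparseable(df, columns):
    """Hypothetical helper: map each column to the values pd.to_numeric cannot parse."""
    bad = {}
    for column in columns:
        converted = pd.to_numeric(df[column], errors='coerce')
        # NaN after coercion but not missing before: the original value failed to parse
        mask = converted.isna() & df[column].notna()
        if mask.any():
            bad[column] = df.loc[mask, column]
    return bad

# Illustrative data only; None stands for a value that was missing from the start
train_data = pd.DataFrame({'B': ['a', '7', None, '8'],
                           'C': ['1', '2', '3', 'x']})
bad_values = find_unparseable(train_data, ['B', 'C'])
for column, values in bad_values.items():
    print(column, values.tolist())
```

Once the reported values have been fixed or accepted as missing, the columns can be converted with errors='coerce' or the default errors='raise'.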
Or, if you need to find all strings in a mixed column (numeric values together with string values), check whether the type of each value is str:
df = pd.DataFrame({'B':['a',7, 8],
                   'C':[7,8,9]})
print (df)
   B  C
0  a  7
1  7  8
2  8  9
print (df[df.B.apply(lambda x: isinstance(x, str))])
   B  C
0  a  7
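The two checks are not interchangeable, which a small comparison may make clearer. The sketch below uses a made-up column that holds an unparseable string, a numeric string, and a real integer: the isinstance test flags every str, including '7', while the to_numeric test flags only values that cannot be parsed.

```python
import pandas as pd

# Illustrative data: 'a' is unparseable, '7' is a numeric string, 8 is an int
df = pd.DataFrame({'B': ['a', '7', 8],
                   'C': [7, 8, 9]})

# True for every value stored as a Python str, parseable or not
is_str = df.B.apply(lambda x: isinstance(x, str))

# True only where conversion to a number fails
not_numeric = pd.to_numeric(df.B, errors='coerce').isna()

print(df[is_str])       # rows 0 and 1
print(df[not_numeric])  # row 0 only
```

So the first check is probably the one you want before calling to_numeric, and the second is more useful for auditing which cells are stored as strings at all.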