ValueError: Unable to parse string "32,5" in a pandas DataFrame

Mar*_*man 1 python dataframe pandas

I am currently using a Jupyter notebook to analyze company data. My first step is to clean and format the data. My code so far is:

%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)

Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create one pandas DataFrame per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)

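After loading the file, I try to replace the NaN values in the ContractHours column with zeros and convert the column to float. Replacing NaN with 0 works, but the conversion raises this error: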

ValueError                                Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()

ValueError: Unable to parse string "32,5"

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
     20 #Users = Users['ContractHours'].replace(',', '.')
     21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
     23 
     24 #print(Customers.head(10))

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()

C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
    124             coerce_numeric = False if errors in ('ignore', 'raise') else True
    125             values = lib.maybe_convert_numeric(values, set(),
--> 126                                                coerce_numeric=coerce_numeric)
    127 
    128     except Exception:

pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()

ValueError: Unable to parse string "32,5" at position 0

How can I parse the string "32,5" in the ContractHours column as a float?

I also tried replacing "," with "." beforehand, but that makes all the other columns disappear, and the comma is still a comma:

Users = Users['ContractHours'].replace(',', '.')

The result is:

0       34
1       24
2       40
3       35
4       40
5       24
6       32
7       32
8       32
9       24
10      24
11      24
12      24
13       0
14      32
15      28
16      32
17      32
18      28
19      24
20      40
21      40
22      36
23      24
24    32,5
25      36
26      36
27      24
28      40
29      40
30      28
31      32
32      32
33      40
34      32
35      24
36      24
37      40
38      25
39      24
Name: ContractHours, dtype: object

All the other columns are gone, and 32,5 needs to be 32.5.

jez*_*ael 5

Use the decimal parameter of read_csv so that floats are parsed correctly:

Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
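
To try the decimal approach without the real file, here is a minimal sketch that reads a small in-memory CSV. The sample rows and the Name column are made up for illustration; only the ContractHours column, the ";" separator and the "32,5" value come from the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for Users.csv (not the asker's real data)
csv_text = "Name;ContractHours\nA;34\nB;32,5\nC;\n"

# decimal="," tells the parser that "32,5" means 32.5
users = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# The column is already numeric, so only the missing value needs filling
users["ContractHours"] = users["ContractHours"].fillna(0)

print(users["ContractHours"].tolist())
print(users["ContractHours"].dtype)
```

Because the column is parsed as float at load time, no separate conversion step should be needed afterwards.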

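Your own solution needs regex=True so that the comma is replaced as a substring rather than only when it is the whole cell value. Also assign the result back to the column, not to Users, otherwise the DataFrame is overwritten by a single Series and the other columns are lost:

Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)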
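
If the DataFrame is already loaded, the following sketch shows why the plain replace left the comma untouched, and one way to convert the column while keeping the rest of the frame. It swaps in .str.replace plus pd.to_numeric as an alternative to replace(..., regex=True); the small frame and its Name column are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical frame mimicking the column after fillna(0):
# strings from the CSV plus an integer 0 for the filled value
users = pd.DataFrame({"Name": ["A", "B", "C"],
                      "ContractHours": ["34", "32,5", 0]})

# Without regex=True, replace() only matches cells that are exactly ","
unchanged = users["ContractHours"].replace(",", ".")

# .str.replace works on substrings; regex=False treats "," literally
hours = users["ContractHours"].astype(str).str.replace(",", ".", regex=False)

# Assign back to the column so the other columns are kept
users["ContractHours"] = pd.to_numeric(hours)

print(unchanged.tolist())
print(users)
```

Assigning to users["ContractHours"] instead of users is what preserves the remaining columns.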