Mar*_*man 1 python dataframe pandas
I am currently using a Jupyter notebook to analyze company data. My first step is cleaning and formatting the data. My code so far is:
%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)
Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create a pandas dataframe per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
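After that, I try to replace the NaN values in the ContractHours column with zeros and convert the column to float. Replacing NaN with 0 works, but then I get this error: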
ValueError Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()
ValueError: Unable to parse string "32,5"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
20 #Users = Users['ContractHours'].replace(',', '.')
21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
23
24 #print(Customers.head(10))
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
124 coerce_numeric = False if errors in ('ignore', 'raise') else True
125 values = lib.maybe_convert_numeric(values, set(),
--> 126 coerce_numeric=coerce_numeric)
127
128 except Exception:
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()
ValueError: Unable to parse string "32,5" at position 0
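To see why the conversion fails, here is a minimal sketch on a small hypothetical Series (the name hours and its values are made up to mimic ContractHours): pd.to_numeric accepts dot-decimal strings, but a single comma-decimal string such as "32,5" makes the whole conversion raise ValueError.

```python
import pandas as pd

# Hypothetical sample mimicking the ContractHours strings.
hours = pd.Series(["34", "32,5", "40"])

# Dot-decimal strings convert without trouble.
print(pd.to_numeric(pd.Series(["34", "32.5"])).tolist())  # [34.0, 32.5]

# A comma-decimal string is not recognised as a number, so the
# conversion raises ValueError, the same failure as in the traceback above.
try:
    pd.to_numeric(hours)
    parse_failed = False
except ValueError as exc:
    parse_failed = True
    print("ValueError:", exc)
```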
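How can I parse the string "32,5" in the ContractHours column as a float?
I also tried replacing "," with "." beforehand, but that makes all the other columns disappear, and the commas are still commas: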
Users = Users['ContractHours'].replace(',', '.')
The result is:
0 34
1 24
2 40
3 35
4 40
5 24
6 32
7 32
8 32
9 24
10 24
11 24
12 24
13 0
14 32
15 28
16 32
17 32
18 28
19 24
20 40
21 40
22 36
23 24
24 32,5
25 36
26 36
27 24
28 40
29 40
30 28
31 32
32 32
33 40
34 32
35 24
36 24
37 40
38 25
39 24
Name: ContractHours, dtype: object
All the other columns are gone, and 32,5 still needs to become 32.5.
Use the decimal parameter of read_csv so that the floats are parsed correctly:
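Both symptoms of that attempt can be reproduced with a hypothetical two-column DataFrame (users, Name and the sample values are invented for illustration). Without regex=True, Series.replace only replaces cells whose whole value equals ",", so "32,5" is left alone; and the right-hand side is a single Series, so assigning it to the DataFrame's name discards every other column.

```python
import pandas as pd

# Hypothetical stand-in for Users.csv.
users = pd.DataFrame({"Name": ["a", "b"], "ContractHours": ["34", "32,5"]})

replaced = users["ContractHours"].replace(",", ".")

# 1) Whole-value matching: only a cell equal to exactly "," would change.
print(replaced.tolist())        # ['34', '32,5']

# 2) The result is one column (a Series), so writing `users = ...`
#    would throw away the rest of the DataFrame.
print(type(replaced).__name__)  # Series
```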
Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
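A minimal sketch of the read_csv approach, assuming a small semicolon-separated sample held in memory via io.StringIO instead of the real Users.csv (the Name column and all values are hypothetical). With decimal=",", "32,5" is read directly as the float 32.5 and the column gets dtype float64, so no later to_numeric call is needed. The sketch assigns the fillna result back to the column rather than calling fillna with inplace=True on the column, a chained in-place pattern that recent pandas releases warn about.

```python
import io

import pandas as pd

# Hypothetical sample in the same style as Users.csv (third row is empty).
csv_text = "Name;ContractHours\na;34\nb;32,5\nc;\n"

users = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# The empty cell was read as NaN; replace it with 0.
users["ContractHours"] = users["ContractHours"].fillna(0)

print(users["ContractHours"].dtype)     # float64
print(users["ContractHours"].tolist())  # [34.0, 32.5, 0.0]
```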
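Your own solution needs regex=True so that the comma is replaced as a substring, and the result should be assigned back to the column rather than to Users (otherwise Users becomes a single Series and the other columns are lost):
Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)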
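If the file has already been read with the commas left inside the strings, the regex route can be sketched on a hypothetical frame like this (users, Name and the values are illustrative only). The replacement and float conversion run before fillna here, so the missing value simply stays NaN until the column is numeric; the regex replacement leaves non-string cells such as NaN unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical frame as read without decimal=",": the column holds strings.
users = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "ContractHours": ["34", "32,5", np.nan],
})

# Turn every "," inside a string into ".", convert to float, then fill NaN.
# Assigning back to the column (not to `users`) keeps the other columns.
users["ContractHours"] = (
    users["ContractHours"]
    .replace(",", ".", regex=True)
    .astype(float)
    .fillna(0)
)

print(users["ContractHours"].dtype)     # float64
print(users["ContractHours"].tolist())  # [34.0, 32.5, 0.0]
```

Unlike decimal="," in read_csv, which applies to every column read from the file, this approach touches only ContractHours.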