I have a file containing data like this:
2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan
I want to read this file. The desired output:
2.10.2014 23:30:00 25.1 nan nan nan
2.10.2014 23:30:00 nan 15.2 nan nan
2.10.2014 23:30:00 nan nan 125.14 nan
2.10.2014 23:45:00 nan 0 nan nan
Note that when the value is 0, the quotes disappear.
At the moment, my code looks like this:
import pandas as pd
import csv
df=pd.read_csv("file.csv",
sep=',\s+',
quoting=csv.QUOTE_NONE,
header=None,
encoding="mbcs")
The result is:
"2.10.2014 23:30:00,""25,1"",nan,nan,nan"
Instead of quoting=csv.QUOTE_NONE I also tried escapechar='"'.
Pass decimal=',' to read_csv:
In [28]:
import io
import pandas as pd
t="""2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""
pd.read_csv(io.StringIO(t), decimal=',', header=None)
Out[28]:
0 1 2 3 4
0 2.10.2014 23:30:00 25.1 NaN NaN NaN
1 2.10.2014 23:30:00 NaN 15.2 NaN NaN
2 2.10.2014 23:30:00 NaN NaN 125.14 NaN
3 2.10.2014 23:45:00 NaN 0.0 NaN NaN
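No quoting option is needed here because the default quote character is already `"`, so the comma inside `"25,1"` is not treated as a field separator. The sketch below illustrates that splitting step with the standard library `csv` module. This is only an illustration of the quoting rule, not pandas' own parser, and the variable names are made up for the example.

```python
import csv
import io

# One line from the sample data; the quoted field contains a comma.
line = '2.10.2014 23:30:00,"25,1",nan,nan,nan\n'

# With the default quotechar ('"'), the quoted comma stays inside the field.
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['2.10.2014 23:30:00', '25,1', 'nan', 'nan', 'nan']
```

Once the field arrives as `25,1`, `decimal=','` is what lets read_csv convert it to the float 25.1.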
Additionally, you can pass parse_dates=[0] to have the first column interpreted as datetime:
In [31]:
pd.read_csv(io.StringIO(t), decimal=',', header=None, parse_dates=[0])
Out[31]:
0 1 2 3 4
0 2014-02-10 23:30:00 25.1 NaN NaN NaN
1 2014-02-10 23:30:00 NaN 15.2 NaN NaN
2 2014-02-10 23:30:00 NaN NaN 125.14 NaN
3 2014-02-10 23:45:00 NaN 0.0 NaN NaN
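Note that the output above reads 2.10.2014 as 10 February 2014 (month first). If the file actually uses day.month.year, which is an assumption about your data, one option is to pass dayfirst=True to read_csv. Another is to convert the column afterwards with an explicit format, as in this minimal sketch:

```python
import io
import pandas as pd

t = """2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""

# Read the numbers with a decimal comma, leaving column 0 as text for now.
df = pd.read_csv(io.StringIO(t), decimal=',', header=None)

# Assumption: the dates are day.month.year, so spell the format out explicitly.
df[0] = pd.to_datetime(df[0], format="%d.%m.%Y %H:%M:%S")

print(df[0].iloc[0])
```

Spelling out the format avoids any guessing about which number is the day and which is the month.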
In your case, ignore the io.StringIO bit; that is only there so I can load your data from a text string. Just do:
df = pd.read_csv("file.csv", header=None, decimal=',', parse_dates=[0], encoding="mbcs")
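To confirm that the numeric columns really were parsed as floats rather than left as strings, you can inspect the dtypes. This is a small sketch that uses the sample text in place of file.csv; the name `numeric_ok` is just for illustration.

```python
import io
import pandas as pd

sample = """2.10.2014 23:30:00,"25,1",nan,nan,nan
2.10.2014 23:30:00,nan,"15,2",nan,nan
2.10.2014 23:30:00,nan,nan,"125,14",nan
2.10.2014 23:45:00,nan,0,nan,nan"""

df = pd.read_csv(io.StringIO(sample), decimal=',', header=None)

# Columns 1 to 4 should be floating point if the decimal comma was understood.
numeric_ok = all(pd.api.types.is_float_dtype(df[c]) for c in [1, 2, 3, 4])
print(df.dtypes)
print(numeric_ok)
```

If any of those columns shows up as object instead, the values were kept as text and the decimal or quoting settings need another look.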