Reading a CSV (comma-separated file) with quoted values and a comma as the decimal separator

Mic*_*hal 1 python pandas

I have a file containing data like this:

    2.10.2014 23:30:00,"25,1",nan,nan,nan
    2.10.2014 23:30:00,nan,"15,2",nan,nan
    2.10.2014 23:30:00,nan,nan,"125,14",nan
    2.10.2014 23:45:00,nan,0,nan,nan

I want to read this file. Desired output:

    2.10.2014 23:30:00 25.1  nan   nan     nan
    2.10.2014 23:30:00 nan   15.2  nan     nan
    2.10.2014 23:30:00 nan   nan   125.14  nan
    2.10.2014 23:45:00 nan   0     nan     nan

Note that when the value is 0, it is written without quotes.

At the moment, my code looks like this:

     import pandas as pd
     import csv

     df=pd.read_csv("file.csv",
                    sep=',\s+',
                    quoting=csv.QUOTE_NONE, 
                    header=None, 
                    encoding="mbcs")

The result is:

     "2.10.2014 23:30:00,""25,1"",nan,nan,nan"

Instead of quoting=csv.QUOTE_NONE, I have also tried escapechar='"'.

EdC*_*ica 5

Pass decimal=',' to read_csv:

In [28]:
import io
import pandas as pd
t="""2.10.2014 23:30:00,"25,1",nan,nan,nan
    2.10.2014 23:30:00,nan,"15,2",nan,nan
    2.10.2014 23:30:00,nan,nan,"125,14",nan
    2.10.2014 23:45:00,nan,0,nan,nan"""
pd.read_csv(io.StringIO(t), decimal=',', header=None)

Out[28]:
                        0     1     2       3   4
0      2.10.2014 23:30:00  25.1   NaN     NaN NaN
1      2.10.2014 23:30:00   NaN  15.2     NaN NaN
2      2.10.2014 23:30:00   NaN   NaN  125.14 NaN
3      2.10.2014 23:45:00   NaN   0.0     NaN NaN
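As far as I can tell, this works because with the default sep=',' and the default quote handling, read_csv keeps "25,1" together as one quoted field, and decimal=',' then converts it to 25.1. A minimal sketch to check the resulting column types on the sample data (the names raw and df are mine, not from the question):

```python
import io
import pandas as pd

# Sample rows copied from the question; `raw` is an illustrative name.
raw = (
    '2.10.2014 23:30:00,"25,1",nan,nan,nan\n'
    '2.10.2014 23:30:00,nan,"15,2",nan,nan\n'
    '2.10.2014 23:30:00,nan,nan,"125,14",nan\n'
    '2.10.2014 23:45:00,nan,0,nan,nan\n'
)

# Default separator and quoting; only the decimal mark is overridden.
df = pd.read_csv(io.StringIO(raw), decimal=',', header=None)

# Columns 1 to 4 should come back as floats rather than strings.
print(df.dtypes)
```

If any of the numeric columns showed up as object instead of a float type, that would suggest the quoted fields were not being parsed as numbers.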

Additionally, you can pass parse_dates=[0] so that the first column is interpreted as datetime:

In [31]:
pd.read_csv(io.StringIO(t), decimal=',', header=None, parse_dates=[0])

Out[31]:
                    0     1     2       3   4
0 2014-02-10 23:30:00  25.1   NaN     NaN NaN
1 2014-02-10 23:30:00   NaN  15.2     NaN NaN
2 2014-02-10 23:30:00   NaN   NaN  125.14 NaN
3 2014-02-10 23:45:00   NaN   0.0     NaN NaN
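One thing to watch: in the output above, 2.10.2014 was read month-first, as 10 February 2014. If the timestamps are actually day-first (2 October 2014), which is my assumption and not something the question states, passing dayfirst=True to read_csv should give the intended dates. A small sketch on the same sample data (raw and df are illustrative names):

```python
import io
import pandas as pd

# Sample rows copied from the question; `raw` is an illustrative name.
raw = (
    '2.10.2014 23:30:00,"25,1",nan,nan,nan\n'
    '2.10.2014 23:30:00,nan,"15,2",nan,nan\n'
    '2.10.2014 23:30:00,nan,nan,"125,14",nan\n'
    '2.10.2014 23:45:00,nan,0,nan,nan\n'
)

# dayfirst=True asks the date parser to read 2.10.2014 as day.month.year.
# Assumption: the file uses day-first dates.
df = pd.read_csv(
    io.StringIO(raw),
    decimal=',',
    header=None,
    parse_dates=[0],
    dayfirst=True,
)

print(df[0])
```

If the dates really are month-first, leave dayfirst at its default.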

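In your case, ignore the io.StringIO bit; it is only there so that I can load your data from a text string. Drop sep=',\s+' and quoting=csv.QUOTE_NONE as well, since the default comma separator and default quote handling are what keep "25,1" together as a single field. Just do:

df = pd.read_csv("file.csv", header=None, decimal=',', parse_dates=[0], encoding="mbcs")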