pandas read_csv: ignoring the trailing semicolon in the last column

Question

My data file looks like this:

data.txt
user,activity,timestamp,x-axis,y-axis,z-axis
0,33,Jogging,49105962326000,-0.6946376999999999,12.680544,0.50395286;
1,33,Jogging,49106062271000,5.012288,11.264028,0.95342433;
2,33,Jogging,49106112167000,4.903325,10.882658000000001,-0.08172209;
3,33,Jogging,49106222305000,-0.61291564,18.496431,3.0237172;


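As you can see, every value in the last column ends with a semicolon, so when I read the file into pandas, that column is inferred as dtype object (because of the trailing semicolon).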

import pandas as pd

df = pd.read_csv('data.txt')
df
    user    activity    timestamp   x-axis  y-axis  z-axis
0   33  Jogging     49105962326000  -0.694638   12.680544   0.50395286;
1   33  Jogging     49106062271000  5.012288    11.264028   0.95342433;
2   33  Jogging     49106112167000  4.903325    10.882658   -0.08172209;
3   33  Jogging     49106222305000  -0.612916   18.496431   3.0237172;


How can I make pandas ignore that semicolon?

Answer 1

Nik*_*ido 13

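The problem with your txt file is that its content is mixed: as far as I can see, the header line does not end with a semicolon, while the data lines do.

If you change the first line so that it also ends with a semicolon, it becomes very simple: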

pd.read_csv("data.txt", lineterminator=";")


Answer 2

pol*_*ist 8

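This may not apply to your real data, but it works for the example.

In the documentation you can find the following description of the comment parameter:

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Like empty lines (as long as skip_blank_lines=True), fully commented lines are ignored by the parameter header but not by skiprows. For example, if comment='#', parsing #empty\na,b,c\n1,2,3 with header=0 will result in 'a,b,c' being treated as the header.

So, if ; can only be found at the end of the last column: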

>>> df = pd.read_csv("data.txt", comment=";")
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   user       4 non-null      int64  
 1   activity   4 non-null      object 
 2   timestamp  4 non-null      int64  
 3   x-axis     4 non-null      float64
 4   y-axis     4 non-null      float64
 5   z-axis     4 non-null      float64
dtypes: float64(3), int64(2), object(1)
memory usage: 224.0+ bytes
>>> df
   user activity       timestamp    x-axis     y-axis    z-axis
0    33  Jogging  49105962326000 -0.694638  12.680544  0.503953
1    33  Jogging  49106062271000  5.012288  11.264028  0.953424
2    33  Jogging  49106112167000  4.903325  10.882658 -0.081722
3    33  Jogging  49106222305000 -0.612916  18.496431  3.023717

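For anyone who wants to try this without creating data.txt, here is a minimal self-contained sketch. It inlines the sample rows from the question through io.StringIO (an assumption made only so the snippet runs on its own) and applies the comment=";" approach from this answer. It also shows one possible alternative that is not part of either answer: reading the file unchanged and stripping the semicolon afterwards with str.rstrip. The variable names raw, df_comment and df_strip are illustrative.

```python
import io

import pandas as pd

# Inline copy of the sample from the question, so the sketch runs without data.txt.
raw = """user,activity,timestamp,x-axis,y-axis,z-axis
0,33,Jogging,49105962326000,-0.6946376999999999,12.680544,0.50395286;
1,33,Jogging,49106062271000,5.012288,11.264028,0.95342433;
2,33,Jogging,49106112167000,4.903325,10.882658000000001,-0.08172209;
3,33,Jogging,49106222305000,-0.61291564,18.496431,3.0237172;
"""

# Approach from this answer: treat ';' as a comment character, so the parser
# drops it (and anything that follows it) on every line.
df_comment = pd.read_csv(io.StringIO(raw), comment=";")

# Alternative (an assumption, not from the answers): read the file as-is,
# then strip the trailing ';' from the last column and convert it to float.
df_strip = pd.read_csv(io.StringIO(raw))
df_strip["z-axis"] = df_strip["z-axis"].str.rstrip(";").astype(float)

print(df_comment.dtypes)
print(df_strip.dtypes)
```

Note that comment=";" discards everything after the first semicolon on a line, so it is only safe when no other field can contain a semicolon. The rstrip variant touches only the last column, at the cost of an extra pass over it.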
