Reading a CSV with pandas when the header line starts with #

Question

I have CSV files whose header line starts with #:

s = '#one two three\n1 2 3'

If I use pd.read_csv, the # sign ends up in the first header:

import pandas as pd
from io import StringIO
pd.read_csv(StringIO(s), delim_whitespace=True)
     #one  two  three
0     1    2      3

If I set the argument comment='#', pandas ignores that line completely.
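A minimal sketch of that behavior, assuming a recent pandas release (sep=r"\s+" is used here in place of delim_whitespace, which newer pandas versions deprecate). Because the commented line is skipped before the header is read, the data row appears to be promoted to the header:

```python
from io import StringIO

import pandas as pd

s = '#one two three\n1 2 3'

# comment='#' makes pandas skip the whole first line, so the only
# remaining line ("1 2 3") is taken as the header and no data rows are left.
df = pd.read_csv(StringIO(s), sep=r"\s+", comment="#")
print(list(df.columns))
print(len(df))
```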

Is there an easy way to handle this case?

A second, related issue is how to handle quoting in this case. Without the #, it works:

s = '"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   one one  two  three
0        1    2      3

With the #, it does not:

s = '#"one one" two three\n1 2 3'
print(pd.read_csv(StringIO(s), delim_whitespace=True))
   #"one  one"  two  three
0      1     2    3    NaN

Thanks!

++++++++++ UPDATE

Here is a test for the second example.

s = '#"one one" two three\n1 2 3'
# here I am cheating slicing the string
wanted_result = pd.read_csv(StringIO(s[1:]), delim_whitespace=True)
# is there a way to achieve the same result configuring somehow read_csv?
assert wanted_result.equals(pd.read_csv(StringIO(s), delim_whitespace=True))

Answer 1

far*_*awa 1

You can rename the first header of the read_csv() output like this:

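import pandas as pd
from io import StringIO

s = '#one two three\n1 2 3'
df = pd.read_csv(StringIO(s), delim_whitespace=True)
# Strip the leading "#" from the first column name
# (split("#")[0] would return an empty string for "#one")
new_name = df.columns[0].lstrip("#")
# rename() returns a new DataFrame, so assign the result
df = df.rename(columns={df.columns[0]: new_name})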

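Renaming the first column does not help with the quoted header from the second example, because the columns are already split incorrectly by the time read_csv returns. One possible workaround, sketched here as an assumption rather than a built-in read_csv option, is to parse the header line separately and pass the result through names=. The helper name read_hash_header is hypothetical, and sep=r"\s+" stands in for delim_whitespace, which newer pandas versions deprecate:

```python
import shlex
from io import StringIO

import pandas as pd


def read_hash_header(text, **kwargs):
    """Parse whitespace-delimited text whose header line starts with '#'.

    Hypothetical helper, not part of pandas.
    """
    # Separate the header line from the data rows.
    header_line, _, body = text.partition("\n")
    # Drop the leading '#' and split with shell-like quoting rules,
    # so that "one one" stays a single column name.
    names = shlex.split(header_line.lstrip("#"))
    # The body no longer contains a header, so supply the names explicitly.
    return pd.read_csv(StringIO(body), sep=r"\s+", header=None, names=names, **kwargs)


s = '#"one one" two three\n1 2 3'
df = read_hash_header(s)
print(df)
```

This keeps the quoting logic outside read_csv, so it relies on the header being on the first line and on shlex-style quoting matching the file's quoting.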

Archived:	10 years, 5 months ago
Views:	1193
Last activity:	10 years, 5 months ago