use*_*149 (67 votes) · tags: python, macos, import, tab-delimited, pandas
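I have been using pandas/Python on Windows to read tab-delimited data files without any problems. The data files contain comments in the first three lines, followed by a header.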
import pandas as pd
df = pd.read_csv(myfile, sep='\t', skiprows=(0, 1, 2), header=0)
I am now trying to read the same file on my Mac. (This is my first time using Python on a Mac.) I get the following error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 39
If I set the error_bad_lines argument of read_csv to False, I get the following output, which continues all the way to the last line of the file:
Skipping line 8: expected 1 fields, saw 39
Skipping line 9: expected 1 fields, saw 125
Skipping line 10: expected 1 fields, saw 125
Skipping line 11: expected 1 fields, saw 125
Skipping line 12: expected 1 fields, saw 125
Skipping line 13: expected 1 fields, saw 125
Skipping line 14: expected 1 fields, saw 125
Skipping line 15: expected 1 fields, saw 125
Skipping line 16: expected 1 fields, saw 125
Skipping line 17: expected 1 fields, saw 125
...
Do I need to specify a value for the encoding argument? It seems like I shouldn't have to, since reading the file on Windows works fine.
bra*_*ers (92 votes)
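The biggest clue is that the rows are all being returned as one line. This indicates that the line terminators are being ignored or are not present.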
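You can specify the line terminator for read_csv. Files written with the classic Mac OS convention end each line with \r, rather than the Linux standard \n or, better still, the Windows belt-and-suspenders approach of \r\n. (Modern macOS itself uses \n, but some Mac software can still produce bare \r line endings.)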
import pandas
df = pandas.read_csv(filename, sep='\t', lineterminator='\r')
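If you are not sure which terminator a file uses, one option is to check before choosing `lineterminator`. The sketch below assumes the text has already been decoded (for example read with `open(path, encoding=..., newline='')`, since `newline=''` keeps the raw terminators); `detect_line_terminator` and `read_tab_text` are hypothetical helpers, and the sample text is invented to mirror the question's layout.

```python
import io

import pandas as pd


def detect_line_terminator(text: str) -> str:
    """Return the most common line terminator in already-decoded text.

    Hypothetical diagnostic helper: counts CRLF pairs, lone CRs and lone LFs.
    Ties (including empty text) fall back to LF.
    """
    crlf = text.count("\r\n")
    counts = {
        "\n": text.count("\n") - crlf,   # Unix/Linux and modern macOS
        "\r\n": crlf,                    # Windows
        "\r": text.count("\r") - crlf,   # classic Mac OS
    }
    return max(counts, key=counts.get)  # max() keeps the first key on ties


def read_tab_text(text: str, skiprows: int = 3) -> pd.DataFrame:
    """Parse tab-separated text, passing lineterminator only for bare-CR data."""
    # lineterminator must be a single character, so it cannot express CRLF;
    # the C parser's default handling already covers LF and CRLF.
    extra = {"lineterminator": "\r"} if detect_line_terminator(text) == "\r" else {}
    return pd.read_csv(io.StringIO(text), sep="\t", skiprows=skiprows, **extra)


# Invented sample: three comment lines, a header and two data rows, all ending in bare CR.
mac_text = "# comment 1\r# comment 2\r# comment 3\rname\tvalue\ra\t1\rb\t2\r"
df = read_tab_text(mac_text)
print(df)
```

`io.StringIO` is used here only so the example needs no file on disk; with its default `newline` setting it hands the `\r` characters to pandas untouched.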
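You can also open the file yourself with an explicit encoding (the codecs module works for this, as does the built-in open() in Python 3) and pass the file object to read_csv. This can add robustness, possibly at the cost of slower loading.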
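import pandas as pd
# Open for reading as UTF-16 text. Python 3's built-in open() uses universal newlines
# by default, so \r, \n and \r\n are all recognized as line endings
# (the Python 2 'rU' mode flag is not needed and is rejected by Python 3.11+).
with open('document', encoding='utf-16') as doc:
    df = pd.read_csv(doc, sep='\t')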
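read_csv also accepts the encoding directly through its `encoding` parameter. As a rough sketch of how you might decide on a value, the code below sniffs a leading byte-order mark and falls back to UTF-8 when there is none; `guess_bom_encoding` and `read_commented_tsv` are hypothetical helpers, the UTF-8 fallback is an assumption, and the sample file is invented and written to a temporary directory.

```python
import codecs
import os
import tempfile

import pandas as pd


def guess_bom_encoding(path):
    """Return an encoding name based on a leading byte-order mark, or None.

    Hypothetical helper. Only UTF-8 and UTF-16 BOMs are checked; a UTF-32-LE BOM
    also begins with the UTF-16-LE BOM bytes and would be reported as UTF-16.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec uses the BOM to pick the byte order
    return None


def read_commented_tsv(path, skiprows=3):
    """Read a tab-separated file with leading comment lines, honouring any BOM."""
    encoding = guess_bom_encoding(path) or "utf-8"  # assumption: UTF-8 when no BOM
    return pd.read_csv(path, sep="\t", skiprows=skiprows, encoding=encoding)


# Invented sample: Python's utf-16 encoder prepends a BOM when writing.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-16", newline="") as fh:
        fh.write("# c1\n# c2\n# c3\nname\tvalue\na\t1\nb\t2\n")
    detected = guess_bom_encoding(sample_path)
    df = read_commented_tsv(sample_path)

print(detected, df.shape)
```

A UTF-16 file read as UTF-8 typically shows up as NUL bytes between characters, so peeking at the first few raw bytes is a cheap way to tell an encoding problem from a line-terminator problem.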