UnicodeError: UTF-16 stream does not start with BOM

Py1*_*y11 5 python csv error-handling

I'm having trouble reading a csv file with Python. My csv file contains Korean text and numbers.

Below is my python code.

import csv
import codecs
csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))
for row in csvreader:
    print(row)

First, I got a UnicodeDecodeError when I entered the "for row in csvreader" line of the code above.

So I used the code below, and the problem seemed to be solved:

csvreader = csv.reader(codecs.open('1.csv', 'rU', 'utf-16'))

Then I ran into a NULL byte error, and now I can't figure out what's wrong with the csv file.

[update] I don't think I changed anything from the previous code, but my program now shows "UnicodeError: UTF-16 stream does not start with BOM"

When I open the csv in Excel I can see the table in the proper format (image attached at the bottom), but when I open it in Sublime Text, below is a snippet of what I get.

504b 0304 1400 0600 0800 0000 2100 6322
f979 7701 0000 d405 0000 1300 0802 5b43
6f6e 7465 6e74 5f54 7970 6573 5d2e 786d
6c20 a204 0228 a000 0200 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000

If you need more information about my file, let me know!

I appreciate your help. Thanks in advance :)

[Image: csv file shown in Excel]

[Image: csv file shown in Sublime Text]

aba*_*ert 6

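The problem is that your input file apparently doesn't start with a BOM (a special character that is encoded recognizably differently in little-endian and big-endian UTF-16), so you can't just use "utf-16" as the encoding; you have to explicitly use "utf-16-le" or "utf-16-be".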

If you don't do that, codecs will guess, and if it guesses wrong, it will try to read each code point backward and get illegal values.
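To make the BOM point concrete, here is a minimal sketch for Python 3 using only the standard library. The helper name `decode_utf16` and the sample data are made up for illustration, and the little-endian default is an assumption, not something the file in the question confirms.

```python
import codecs
import csv
import io


def decode_utf16(raw, default="utf-16-le"):
    """Decode UTF-16 bytes, using the BOM if present, else `default`.

    `default` is an assumption the caller has to supply: BOM-less data
    does not state its own byte order.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM and strips it.
        return raw.decode("utf-16")
    return raw.decode(default)


# Illustrative data: Korean text and numbers, encoded WITHOUT a BOM.
raw = "이름,값\r\n홍길동,42\r\n".encode("utf-16-le")
text = decode_utf16(raw)

# newline="" leaves line endings alone so the csv module can handle them.
rows = list(csv.reader(io.StringIO(text, newline="")))
```

For a file on disk, the same idea should work by passing `open(path, encoding="utf-16-le", newline="")` to `csv.reader`.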

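If the sample you posted starts at an even offset and contains a bunch of ASCII, it's little-endian, so use the -le version. (But of course it's better to look at what the file actually is than to guess.)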
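The "bunch of ASCII" heuristic can be sketched roughly as follows: an ASCII character in UTF-16 has a zero high byte, which lands on odd offsets in little-endian data and on even offsets in big-endian data. The function name `guess_utf16_byte_order` is hypothetical, and this is only a guess-maker; it can be fooled by text with little ASCII in it.

```python
def guess_utf16_byte_order(raw):
    """Guess 'utf-16-le' or 'utf-16-be' from where the zero bytes fall.

    Heuristic only: it assumes the text contains a fair amount of ASCII.
    Returns None when there is no evidence or the evidence is balanced.
    """
    even_zeros = raw[0::2].count(0)  # zero high bytes if big-endian
    odd_zeros = raw[1::2].count(0)   # zero high bytes if little-endian
    if odd_zeros > even_zeros:
        return "utf-16-le"
    if even_zeros > odd_zeros:
        return "utf-16-be"
    return None


le_guess = guess_utf16_byte_order("id,1\r\n".encode("utf-16-le"))
be_guess = guess_utf16_byte_order("id,1\r\n".encode("utf-16-be"))
```

The returned name can then be used as the explicit encoding when opening the file.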


aba*_*ert 6

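Now that you've included more of the file in your question, that's not a CSV file at all. My guess is that it's an old-style binary XLS file, but that's just a guess. If you just renamed spam.xls to spam.csv, you can't do that; you need to export it to CSV format. (If you need help with that, ask on another site that offers help with Excel rather than with programming.)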
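One way to check the guess rather than rely on it is to look at the file's first bytes. The dump in the question starts with 50 4b 03 04 ("PK\x03\x04", the ZIP local file header signature) and contains the text "[Content_Types].xml", which would be consistent with a ZIP-based Office Open XML file such as .xlsx rather than a legacy binary .xls. The sketch below is a rough illustration; `sniff_container` is a made-up name and it only recognizes these two signatures.

```python
ZIP_MAGIC = b"PK\x03\x04"                           # ZIP local file header
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")      # OLE2 compound file (legacy Office)


def sniff_container(head):
    """Classify the leading bytes of a file as 'zip', 'ole2' or 'unknown'."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


# First line of the hex dump posted in the question.
head = bytes.fromhex("504b 0304 1400 0600 0800 0000 2100 6322")
kind = sniff_container(head)
```

For a real file, read the first few bytes with `open(path, "rb").read(8)` and pass them in. Either result means the file is a spreadsheet container, not text, so no choice of text encoding will make `csv.reader` parse it.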


If you can't do that for some reason, there are libraries on PyPI that can parse XLS files, but if you want CSV, and you can export CSV, that's the better idea.
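If exporting from Excel really isn't an option, the library route might look roughly like this. This is a sketch under assumptions: it uses the third-party openpyxl package, which reads .xlsx files only (a legacy .xls file needs a different library), and the function names `rows_to_csv` and `xlsx_to_csv` are made up for illustration.

```python
import csv


def rows_to_csv(rows, csv_path):
    """Write an iterable of rows to a UTF-8 CSV file; None becomes an empty cell."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def xlsx_to_csv(xlsx_path, csv_path):
    """Convert the active sheet of an .xlsx workbook to CSV.

    Assumes openpyxl is installed and that the input really is .xlsx.
    """
    from openpyxl import load_workbook  # imported lazily: optional dependency

    workbook = load_workbook(xlsx_path, read_only=True)
    try:
        sheet = workbook.active
        # values_only=True yields plain cell values instead of Cell objects.
        rows_to_csv(sheet.iter_rows(values_only=True), csv_path)
    finally:
        workbook.close()
```

A call such as `xlsx_to_csv("1.xlsx", "1.csv")` would then produce a real CSV that `csv.reader` can read with `encoding="utf-8"`.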
