Error: "'ascii' codec can't decode byte 0xd8" in Python 3

Question

lea*_*erX 0 python unicode utf-8 python-3.3

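I wrote a program that recursively searches a folder for files with a particular extension and does some processing on them. Oddly, the program runs fine for about 85 files and then crashes on the same file every time. I don't see anything different about that file or its name. Since it works for 85 files, I assume the error is not in my code itself but has more to do with, perhaps, a faulty compiler?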

Operating system: Linux arctic 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux

Error details (full traceback):

Traceback (most recent call last):
  File "scoretotal.py", line 98, in <module>
    main()   
  File "scoretotal.py", line 96, in main
    find_score_files()
  File "scoretotal.py", line 89, in find_score_files
    total = calculate_total((os.path.join(root,filename)))
  File "scoretotal.py", line 14, in calculate_total
    lines = file_object_read.read()
  File "/soft/linux/bin/../python3.3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 17: ordinal not in range(128)

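I'm running Python 3.3.3. From what I've found online, it may have something to do with Unicode or the UTF-8 format, but I can't for the life of me figure it out. What is going wrong?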

Answer 1

Mar*_*som 5

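When you open a file without specifying an encoding, Python picks one for you, based on the locale settings of the environment it runs in. In your case it picked ascii, which is fairly safe in that it is unlikely to give you back the wrong characters, but it fails easily: any byte above 0x7f, such as 0xd8, raises an error. You need to check the source of those files to find out their encoding and include it in the call to open. For example, if you are certain the files were written using the ISO-8859-1 encoding: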

file_object_read = open(path, 'r', encoding='iso-8859-1')
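
To see why only some files fail, it can help to reproduce the error on a few bytes. The following is a minimal sketch, not the asker's code: the byte string raw is made up for illustration, and it assumes that open() falls back to locale.getpreferredencoding(False) when no encoding is given, as documented for Python 3.3.

```python
import locale

# The encoding open() falls back to when none is given; it depends on the
# locale of the environment the script runs in.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Hypothetical file content containing the byte 0xd8 from the traceback.
raw = b"score: \xd8"

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # ASCII only covers bytes 0x00-0x7f, so 0xd8 is rejected.
    print("ascii failed:", exc)

# ISO-8859-1 maps every possible byte to a character, so decoding never
# fails, but the result is only meaningful if the file really uses it.
text = raw.decode("iso-8859-1")
print("decoded", len(text), "characters")
```

Files that contain only ASCII bytes decode without trouble under any of these encodings, which would explain why the first 85 files were processed before the crash.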

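If you don't know what encoding to use, you will have to guess, and accept that your guess will sometimes be wrong. On Linux you could try 'utf-8' and on Windows you could try 'mbcs', since those are the defaults used by other programs on those systems. There are utilities that examine the file contents and try to make an educated guess, including the chardet package.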
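
One way to turn that guessing into code is to try a short list of candidate encodings in order. This is only a sketch under assumptions: read_text_with_fallback is a hypothetical helper, not part of the standard library, and the candidate list is a guess you would adapt to your files. Because ISO-8859-1 accepts every byte, it has to come last, and a "successful" decode with it does not prove the guess was right.

```python
def read_text_with_fallback(path, encodings=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper for illustration; the default candidates are guesses.
    """
    # Read raw bytes so that no implicit decoding happens in open().
    # Note that, unlike text mode, this does not translate line endings.
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this guess was wrong, try the next candidate
    raise ValueError("no candidate encoding could decode %r" % (path,))
```

In the asker's program, a call such as lines, used = read_text_with_fallback(os.path.join(root, filename)) could stand in for the open() and read() pair, and logging used would show which files fell through to the last-resort encoding.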
