UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Question

Ede*_*row 449 windows unicode file-io decode python-3.x

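I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

Traceback (most recent call last):
  File "SCRIPT LOCATION", line NUMBER, in <module>
    text = file.read()
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>

If anyone can give me any help in resolving this problem, I would be most grateful.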

Answer 1

Len*_*bro 777

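The file in question is not using the CP1252 encoding. It is using another encoding, and you have to figure out which one. Common ones are Latin-1 and UTF-8. Since 0x90 is only an unprintable control code in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file: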

file = open(filename, encoding="utf8")
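
To see why the error appears and why naming the encoding fixes it, here is a minimal sketch that works on bytes in memory instead of a file. It assumes the data really is UTF-8; the sample character and the variable names are made up for illustration.

```python
# Cyrillic capital A (U+0410) is stored in UTF-8 as the two bytes D0 90,
# so it contains the 0x90 byte from the error message.
raw = "\u0410".encode("utf-8")

# cp1252 has no character assigned to 0x90, so decoding fails there.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print(exc)

# Decoding with the encoding the data was written in succeeds.
text = raw.decode("utf-8")
print(ascii(text))
```

open(filename, encoding="utf8") applies the same decoding step to the file's contents as raw.decode("utf-8") does here.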

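Cool, I had this problem with some Python 2.7 code I was trying to run in Python 3.4. Latin-1 worked for me! (18 upvotes)
@1vand1ng0: of course Latin-1 works; it will work for any file regardless of what the file's actual encoding is. That is because all 256 possible byte values in a file have a Latin-1 code point to map to, but that doesn't mean you get legible results! If you don't know the encoding, even opening the file in binary mode might be better than assuming Latin-1. (7 upvotes)
Thanks @1vand1ng0, utf-8 didn't work for me, but Latin-1 did. (5 upvotes)
I am getting the OP's error even though the encoding is already correctly specified as UTF-8 (as shown above) in open(). Any ideas? (3 upvotes)
If you are using Python 2.7 and getting the same error, try the `io` module: `io.open(filename, encoding="utf8")` (2 upvotes)
`filename = "C:\Report.txt" with open(filename,encoding ="utf8") as my_file: text = my_file.read() print(text)` Even after using this I am getting the same error. I have also tried other encodings, but all in vain. In this code I am also using `from geotext import GeoText`. Please suggest a solution. (2 upvotes)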
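
The point made in the comments, that Latin-1 never raises but can still give the wrong text, can be checked with a short sketch. The sample string is an assumption chosen for illustration.

```python
original = "café"
raw = original.encode("utf-8")      # b'caf\xc3\xa9'

# Latin-1 assigns a code point to every byte value, so this cannot raise...
wrong = raw.decode("latin-1")
# ...but the two UTF-8 bytes of "é" come out as two separate characters.
print(ascii(wrong))

# Every one of the 256 byte values decodes under Latin-1.
all_bytes = bytes(range(256))
decoded_count = len(all_bytes.decode("latin-1"))
print(decoded_count)
```

A decode that succeeds therefore says nothing about whether Latin-1 was the right choice; it only means no exception was raised.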

Answer 2

Dec*_*zie 36

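Just adding this in case file = open(filename, encoding="utf8") doesn't work: try file = open(filename, errors='ignore')

All good.

Warning: this will result in data loss when unknown characters are encountered (which may be fine depending on your situation). (4 upvotes)
The suggested encoding string should have a dash, so it should be: open(csv_file, encoding='utf-8') (tested on Python 3) (2 upvotes)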
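
On the spelling question raised in the last comment, one way to check how Python resolves an encoding name is `codecs.lookup()`. The sketch below only inspects the names; the list of spellings is an illustrative assumption.

```python
import codecs

# codecs.lookup() resolves an encoding name to a codec; aliases share one codec.
spellings = ["utf8", "utf-8", "UTF-8", "utf_8"]
canonical = {name: codecs.lookup(name).name for name in spellings}
print(canonical)
```

On CPython all four spellings resolve to the same codec, so either form should behave identically in open().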

Answer 3

Mat*_*ius 34

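As an extension to @LennartRegebro's answer:

If you can't tell what encoding your file uses, the solution above does not work (it's not utf8), and you find yourself merely guessing, there are online tools you can use to identify the encoding. They aren't perfect but usually work just fine. Once you have figured out the encoding, you should be able to use the solution above.

EDIT: (copied from a comment)

Sublime Text, a quite popular text editor, has a command to display the encoding if it has been set...

Go to View -> Show Console (or Ctrl+`)

Type view.encoding() into the field at the bottom and hope for the best (I was unable to get anything but Undefined, but maybe you will have better luck...)

Sublime Text, also: open the console and type `view.encoding()`. (3 upvotes)
Some text editors will provide this information as well. I know that in vim you can get it via `:set fileencoding` ([from this link](http://superuser.com/questions/28779/how-do-i-find-the-encoding-of-the-current-buffer-in-vim)) (2 upvotes)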
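
If no tool is at hand, a rough programmatic version of the same guessing can be sketched with the standard library alone. The helper name `guess_encoding` and the candidate list are hypothetical, not part of any answer above, and a successful decode is only a hint, not proof.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None

# Strict encodings go first; Latin-1 accepts anything, so it is the last resort.
print(guess_encoding("naïve".encode("utf-8")))
print(guess_encoding("naïve".encode("cp1252")))
print(guess_encoding(b"\x90"))
```

The order of candidates matters: UTF-8 rejects most non-UTF-8 data, which makes it a useful first test, while single-byte encodings that define every byte will match anything.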

Answer 4

小智 20

TLDR? Try: file = open(filename, encoding='cp437')

Why? When one uses:

file = open(filename)
text = file.read()

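Python assumes the file uses the same code page as the current environment (cp1252 in the case of the opening post) and tries to decode it into its own Unicode text (str). If the file contains byte values that are not defined in this code page (like 0x90), we get a UnicodeDecodeError. Sometimes we don't know the encoding of the file, sometimes the file's encoding may not be handled by Python (e.g. cp790), and sometimes the file may contain mixed encodings.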
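
To check which encoding open() would fall back to on a given machine, the standard `locale` module can be queried. This is a small sketch; the printed value depends on the system and on whether Python's UTF-8 mode is enabled.

```python
import locale

# With no encoding argument, open() in text mode uses the locale's
# preferred encoding (cp1252 on many Western-European Windows setups).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```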

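If such characters are not needed, you can have them replaced with a placeholder (the Unicode replacement character U+FFFD, usually displayed as a question mark in a diamond), like this: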

file = open(filename, errors='replace')

Another workaround is to use:

file = open(filename, errors='ignore')

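The undecodable bytes are then silently dropped and the remaining characters are left intact, but other errors will be masked too.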
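
The difference between the two error handlers can be seen on a small in-memory example. The byte string below is an assumption made up for illustration: valid UTF-8 with one stray 0x90 byte.

```python
raw = b"caf\xc3\xa9 \x90 ok"   # valid UTF-8 plus one stray 0x90 byte

# 'replace' substitutes U+FFFD for the byte that cannot be decoded.
replaced = raw.decode("utf-8", errors="replace")
# 'ignore' drops the byte without leaving any trace.
ignored = raw.decode("utf-8", errors="ignore")

print(ascii(replaced))
print(ascii(ignored))
```

With 'replace' the damage stays visible in the text; with 'ignore' there is no sign afterwards that anything was lost.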

A very good solution is to specify the encoding, yet not just any encoding (like cp1252), but one that has all characters defined (like cp437):

file = open(filename, encoding='cp437')

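Code page 437 is the original DOS encoding. All codes are defined, so there are no errors while reading the file, no errors are masked, and the characters are preserved (not quite left intact, but still distinguishable).

You should probably emphasize even more that randomly guessing at the encoding is likely to produce garbage. You have to _know_ the encoding of the data. (5 upvotes)
There are many encodings that "have all characters defined" (what you really mean is "map every single-byte value to a character"). CP437 is very specifically associated with the Windows/DOS ecosystem. In most places, Latin-1 (ISO-8859-1) would be a better starting guess. (2 upvotes)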
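
Which byte values a single-byte codec leaves undefined can be listed by brute force. The helper `undefined_bytes` is a hypothetical name introduced for this sketch, and the approach only makes sense for single-byte encodings.

```python
def undefined_bytes(encoding):
    """List the single-byte values that the given codec refuses to decode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing

for name in ("cp1252", "cp437", "latin-1"):
    print(name, [hex(v) for v in undefined_bytes(name)])
```

In Python's codec tables cp1252 has a handful of gaps, including 0x90, while cp437 and Latin-1 have none, which is why both of them decode any input without raising.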

Answer 5

小智 9

Don't waste your time; just add encoding="cp437" and errors='ignore' to your code for both reading and writing:

open('filename.csv', encoding="cp437", errors='ignore')
open(file_name, 'w', newline='', encoding="cp437", errors='ignore')

Godspeed

Before you apply this, make sure that you want your `0x90` to be decoded to `É`. Check `b'\x90'.decode('cp437')`. (2 upvotes)
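
The check suggested in the comment can be extended a little to show what cp437 does to data that was really written as UTF-8. The sample character is an illustrative assumption.

```python
# What cp437 assigns to the byte from the error message.
single = b"\x90".decode("cp437")
print(ascii(single))

# UTF-8 bytes for Cyrillic capital A (D0 90), read as if they were cp437.
raw = "\u0410".encode("utf-8")
as_cp437 = raw.decode("cp437")
print(ascii(as_cp437))          # two unrelated characters, no error raised

# The original bytes can still be recovered by re-encoding with cp437.
recovered = as_cp437.encode("cp437")
print(recovered == raw)
```

No exception is raised, but one character has turned into two different ones, so the text is only usable if cp437 was the file's real encoding.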

Answer 6

Kyl*_*isi 5

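Or, if you don't need to decode the file, for example when uploading it to a website, use open(filename, 'rb'), where r = reading and b = binary.

Thanks, that was exactly my problem. (2 upvotes)
Perhaps emphasize that the `b` will produce `bytes` instead of `str` data. As you note, this is suitable if you don't need to process the bytes in any way. (2 upvotes)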
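
A small sketch of binary mode, using a throwaway temporary file so that it runs on its own. The file contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file containing the problematic byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc\x90def")
    path = tmp.name

# Binary mode does no decoding, so no UnicodeDecodeError can occur here.
with open(path, "rb") as f:
    data = f.read()

print(type(data), data)
os.remove(path)
```

The result is a `bytes` object, so string methods that expect `str` arguments will not work on it until it is decoded.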

Answer 7

han*_*frc 5

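Before you apply the suggested solutions, you can check which Unicode character appears in your file (and in the error log), in this case 0x90: https://unicodelookup.com/#0x90/1 (or directly at the Unicode Consortium site http://www.unicode.org/charts/ by searching for 0x0090)

Then consider removing it from the file.

I have a web page at https://tripleee.github.io/8bit/#90 where you can look up the character's value in the various 8-bit encodings supported by Python. With enough data points, you can often infer a suitable encoding (though some of them are quite similar, so establishing exactly which encoding the original writer used will often involve some guesswork, too). (2 upvotes)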
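
A similar lookup can be done locally. The sketch below decodes one byte value under a few 8-bit encodings; the helper name `describe_byte` and the selection of encodings are assumptions made for illustration.

```python
import unicodedata

def describe_byte(value, encodings=("cp1252", "latin-1", "cp437", "cp850", "mac_roman")):
    """Map each encoding name to the character it assigns to one byte value."""
    result = {}
    for name in encodings:
        try:
            result[name] = bytes([value]).decode(name)
        except UnicodeDecodeError:
            result[name] = None  # the byte is undefined in this encoding
    return result

for name, char in describe_byte(0x90).items():
    if char is None:
        print(f"{name}: undefined")
    else:
        # Control characters have no Unicode name, hence the default label.
        label = unicodedata.name(char, "unnamed control character")
        print(f"{name}: U+{ord(char):04X} {label}")
```

Comparing the candidates against what the text is supposed to say is usually the quickest way to rule encodings out.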

Answer 8

Jus*_* Me 5

def read_files(file_path):

    with open(file_path, encoding='utf8') as f:
        text = f.read()
        return text

OR (AND)

def write_to_file(text, file_path):
    # Open in binary write mode ('wb'); 'rb' is read-only, so f.write() would fail.
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

OR

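from docx import Document  # assumption: Document comes from the python-docx package

# file_path is assumed to be a pathlib.Path pointing at the text file
document = Document()
document.add_heading(file_path.name, 0)
file_content = file_path.read_text(encoding='UTF-8')
document.add_paragraph(file_content)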

OR

def read_text_from_file(cale_fisier):
    # cale_fisier ("file path") must be a pathlib.Path for read_text() to exist
    text = cale_fisier.read_text(encoding='UTF-8')
    print("what I read: ", text)
    return text  # return the text that was read

def save_text_into_file(cale_fisier, text):
    # the with block closes the file even if the write fails
    with open(cale_fisier, "w", encoding='utf-8') as f:
        print("what I wrote: ", text)
        f.write(text)  # write the content to the file

OR

def read_text_from_file(file_path):
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
        return text  # return the text that was read


def write_to_file(text, file_path):
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore')) # write the content to the file

OR

import os
import glob

def change_encoding(fname, from_encoding, to_encoding='utf-8') -> None:
    '''
    Read the file at path fname using its original encoding (from_encoding)
    and rewrite it in place using to_encoding.
    '''
    with open(fname, encoding=from_encoding) as f:
        text = f.read()

    with open(fname, 'w', encoding=to_encoding) as f:
        f.write(text)
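
A possible way to try out a function like this is sketched below. It repeats a compact copy of change_encoding so that it runs on its own, and the sample file and its contents are assumptions made up for illustration.

```python
import os
import tempfile

def change_encoding(fname, from_encoding, to_encoding="utf-8"):
    # Same idea as the function above, repeated so this sketch is self-contained.
    with open(fname, encoding=from_encoding) as f:
        text = f.read()
    with open(fname, "w", encoding=to_encoding) as f:
        f.write(text)

# Write a cp1252 sample file, convert it, and look at the resulting bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("naïve".encode("cp1252"))
    path = tmp.name

change_encoding(path, "cp1252")
with open(path, "rb") as f:
    converted = f.read()
print(converted)
os.remove(path)
```

Because the file is overwritten in place, it is worth keeping a backup when the source encoding is only a guess: a wrong from_encoding will be written back as mojibake.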

Asked: 13 years, 8 months ago
Viewed: 510342 times
Last active: 5 years, 11 months ago