如何将 .write() 用于外语字符（ã、à、ê、ó、...）

Question

如何将 .write() 用于外语字符（ã、à、ê、ó、...）

Ped*_*ade 2 python encoding glob python-3.x

我正在 Python 3 中处理一个小项目，我必须扫描一个装满文件的驱动器并输出一个 .txt 文件，其中包含驱动器内所有文件的路径。问题是一些文件是巴西葡萄牙语，其中包含“重音字母”，例如“não”、“você”等，而这些特殊字母在最终的 .txt 中被错误地输出。

代码只是下面这几行：

import glob

path = r'path/path'

files = [f for f in glob.glob(path + "**/**", recursive=True)]

with open("file.txt", 'w') as output:
    for row in files:
        output.write(str(row.encode('utf-8') )+ '\n')

Run Code Online (Sandbox Code Playgroud)

输出示例

path\folder1\Treino_2.doc
path\folder1\Treino_1.doc
path\folder1\\xc3\x81gua de Produ\xc3\xa7\xc3\xa3o.doc

Run Code Online (Sandbox Code Playgroud)

最后一行显示了一些输出是如何错误的，因为x81gua de Produ\xc3\xa7\xc3\xa3o应该是Régua de Produção

Answer 1

Mar*_*ers 5

Python文件处理Unicode文本（包括巴西重音字符）直接。您需要做的就是在文本模式下使用该文件，这是默认设置，除非您明确要求open()提供二进制文件。"w"给你一个可写的文本文件。

但是，您可能希望通过使用函数的encoding参数open()来明确编码：

with open("file.txt", "w", encoding="utf-8") as output:
    for row in files:
        output.write(row + "\n")

Run Code Online (Sandbox Code Playgroud)

如果您没有明确设置编码，则会选择特定于系统的默认值。并非所有编码都可以编码所有可能的 Unicode 代码点。这种情况在 Windows 上比在其他操作系统上发生得更多，在其他操作系统上，默认的 ANSI 代码页会导致charmap codepage can't encode character错误，但如果当前区域设置配置为使用非 Unicode 编码，则在其他操作系统上也可能发生这种情况。

难道没有编码的字节，然后转换所产生的bytes与再次对象返回一个字符串str()。这只会使字符串表示和转义以及b那里的前缀变得一团糟：

>>> path = r"path\folder1\Água de Produção.doc"
>>> v.encode("utf8")  # bytes are represented with the "b'...'" syntax
b'path\\folder1\\\xc3\x81gua de Produ\xc3\xa7\xc3\xa3o.doc'
>>> str(v.encode("utf8"))  # converting back with `str()` includes that syntax
"b'path\\\\folder1\\\\\\xc3\\x81gua de Produ\\xc3\\xa7\\xc3\\xa3o.doc'"

Run Code Online (Sandbox Code Playgroud)

请参见python 字符串前的 ab 前缀是什么意思？有关此处发生的情况的更多详细信息。

归档时间：	5 年，6 月前
查看次数：	67 次
最近记录：	5 年，6 月前