Reading lines as UTF-8 using .encode

Question

I read lines from a file, for example:

The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)
Die virtuelle Katastrophe: So führen Sie Teams über Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)

I read and encode them with:

title = line.encode('utf8')

But the output is:

b'Die virtuelle Katastrophe: So f\xc3\xbchren Sie Teams \xc3\xbcber Distanz zur Spitzenleistung (German Edition) (Thomas, Gary)'
b'The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben) (German Edition) (Peters, Tom)'

Why is the "b'" always added? How do I read the file properly so that the umlauts are preserved?

Here is the complete relevant code snippet:

# Parse the clippings.txt file
lines = [line.strip() for line in codecs.open(config['CLIPPINGS_FILE'], 'r', 'utf-8-sig')]
for line in lines:
    line_count = line_count + 1
    if (line_count == 1 or is_title == 1):
        # ASSERT: this is a title line
        #title = line.encode('ascii', 'ignore')
        title = line.encode('utf8')
        prev_title = 1
        is_title = 0
        note_type_result = note_type = l = l_result = location = ""
        continue

Thanks

Answer 1

Max*_*Noe 5

The method str.encode converts a unicode string into a bytes object:

str.encode(encoding="utf-8", errors="strict")
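Return an encoded version of the string as a bytes object. Default encoding is 'utf-8'. errors may be given to set a different error handling scheme. The default for errors is 'strict', meaning that encoding errors raise a UnicodeError. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace' and any other name registered via codecs.register_error(), see section Error Handlers. For a list of possible encodings, see section Standard Encodings.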

So you are getting exactly what is expected: the "b'" prefix is simply how Python displays a bytes object.
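As a minimal sketch of this behaviour (the sample string is shortened from the question, and the variable names are only illustrative), the following shows that encoding a str yields bytes, that decoding reverses it, and that the commented-out 'ascii'/'ignore' variant from the question would drop the umlauts entirely:

```python
# A str holds Unicode text; encode() turns it into raw UTF-8 bytes.
title = "So führen Sie Teams über Distanz"
encoded = title.encode("utf8")

print(type(title).__name__)    # str
print(type(encoded).__name__)  # bytes
print(encoded)                 # b'So f\xc3\xbchren Sie Teams \xc3\xbcber Distanz'

# Decoding the bytes with the same codec gives back the original text.
assert encoded.decode("utf8") == title

# With the 'ascii' codec, the error handler decides what happens to "ü".
print("über".encode("ascii", errors="ignore"))   # b'ber'
print("über".encode("ascii", errors="replace"))  # b'?ber'
```

If the goal is to keep the titles as readable text, the str should be left as it is and not encoded at all.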

On most machines you can simply use open to read the file. If the file's encoding is not the system default, you can pass it as a keyword argument:

with open(filename, encoding='utf8') as f:
    line = f.readline()
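Applied to the question's situation, this might look roughly like the sketch below. It assumes the clippings file is UTF-8 with a byte order mark (the question uses 'utf-8-sig'); the sample text and the temporary file are hypothetical stand-ins so that the example runs on its own:

```python
import os
import tempfile

# Hypothetical sample data standing in for the clippings file.
sample = (
    "The Little Big Things: 163 Wege zur Spitzenleistung (Dein Leben)\n"
    "So führen Sie Teams über Distanz zur Spitzenleistung\n"
)

# Write the sample as UTF-8 with a BOM so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Text mode plus an explicit encoding yields str lines; no .encode() needed.
    with open(path, encoding="utf-8-sig") as f:
        titles = [line.strip() for line in f]
finally:
    os.remove(path)

print(type(titles[1]).__name__)  # str
print("ü" in titles[1])          # True
```

Because the lines are already decoded while reading, the titles stay ordinary str objects with the umlauts intact, and the 'utf-8-sig' codec strips the byte order mark from the first line.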


Archived:	10 years ago
Views:	21148
Last recorded:	10 years ago