Garbled characters when reading apostrophes from a .txt file

viG*_*027 0 python character-encoding

I'm having trouble reading lines from a .txt file. My file contains sentences with words such as

hadn’t, couldn’t, didn’t

and so on. The problem is that when I use the read() method, instead of

’

I get something like:

â€™

So the word I read is hadnâ€™t instead of hadn’t.

My input:

    Love at First Sight

    One <adjective> afternoon, I was walking by the <place> when
    accidentally I bumped into a <adjective> boy.
    At first I blushed and apologized for bumping into him, but when he flashed his
    <adjective> smile I just couldn’t help falling in love. His
    <adjective> voice telling me that it was ok sounded like music to myears.
    I could have stayed there staring at him for <period_of_time>.
    He had <adjective> <color> eyes and <adjective>
    <color> hair. I thought he was perfect for me. Before I noticed,
    <number> <period_of_time> had passed by after I apologized,
    and I hadn’t said anything else since!
    That’s when I noticed thathe was looking at me
    <adverb>. I didn’t know what tosay, so I just <past_verb>.
    I noticed him giving me astrange look when he started walking to his
    <noun>.I looked back at him <number> more time(s), but hewas already out of sight.
    It wasn’t love after all

Expected output: identical to the input file

My code:

    f = open('loveatfirstsight.txt','r')
    for i in f.readlines():
        print(i)

My OS: Windows 10

use*_*170 5

The file is encoded in UTF-8, but you are reading it as if it were (I assume) windows-1252 (or some other Windows-specific encoding). The apostrophe in this file is not the typical ASCII ‘typewriter apostrophe’ (' U+0027 APOSTROPHE) but a ‘typographer’s apostrophe’ (’ U+2019 RIGHT SINGLE QUOTATION MARK), which lies outside the Basic Latin (‘ASCII’) block. UTF-8 stores it as three bytes, and the mismatched encoding decodes each byte as a separate character, so the apostrophe appears corrupted.

    >>> 'hadn’t'.encode('utf-8').decode('cp1252')
    'hadnâ€™t'
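The same round trip can be run in both directions to show that the bytes themselves are intact and only the decoding is wrong. This is a minimal sketch, assuming the wrong codec really was cp1252 as guessed above; the variable names are illustrative only.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, written as an escape so the
# source file's own encoding does not matter.
original = "hadn\u2019t"

# UTF-8 stores U+2019 as the three bytes E2 80 99.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'hadn\xe2\x80\x99t'

# Decoding those bytes as cp1252 turns each byte into its own character.
garbled = utf8_bytes.decode("cp1252")

# Reversing the mistake: re-encode as cp1252, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

Reversing the decode like this only works when every garbled character maps back to a cp1252 byte, so it is a diagnostic aid rather than a substitute for opening the file with the right encoding.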

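To fix this, specify the correct encoding via the encoding parameter of the open function:

    # Name the encoding explicitly instead of relying on the platform default
    with open('loveatfirstsight.txt', 'r', encoding='utf-8') as f:
        for line in f:
            print(line, end='')  # each line already ends with a newline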
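To reproduce the problem and the fix without the original file, the sketch below writes a one-line UTF-8 file into a temporary directory and reads it back with both codecs. The file name `sample.txt` and the sample sentence are stand-ins, and cp1252 is assumed to be the codec that was picked up by default.

```python
import os
import tempfile

# Hypothetical sample line; U+2019 is the typographic apostrophe.
text = "I hadn\u2019t said anything else since!\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # stand-in for loveatfirstsight.txt

    # Create the file as UTF-8, as the original file is assumed to be.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Wrong codec: each UTF-8 byte of U+2019 becomes a separate character.
    with open(path, "r", encoding="cp1252") as f:
        read_as_cp1252 = f.read()

    # Correct codec: the three bytes are decoded back into one character.
    with open(path, "r", encoding="utf-8") as f:
        read_as_utf8 = f.read()

print(read_as_cp1252 == text)  # False
print(read_as_utf8 == text)    # True
```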

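As help(open) explains:

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)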
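To see which encoding open would fall back to on a given machine, the call named in the help text can be run directly. This is a small sketch; the value printed depends on the platform and locale settings, so no particular output is implied.

```python
import locale

# The encoding open() uses in text mode when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# A simple check against the encoding the file is assumed to use.
if default_encoding.lower().replace("-", "") != "utf8":
    print("Default is not UTF-8; pass encoding='utf-8' to open() explicitly.")
```

On a Windows installation with a Western European locale this typically reports cp1252, which would match the garbling shown in the question.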