Python 2.7 encoding/decoding

Question


I have a problem involving encoding/decoding. I read text from a file and compare it with text from a database (Postgres). The comparison is done between two lists.


From the file I get 'jo\x9a' for "još", and from the database I get 'jo\xc5\xa1' for the same value.


common = [a for a in codes_from_file if a in kode_prfoksov]

# Items in one but not the other
only1 = [a for a in codes_from_file if a not in kode_prfoksov]

# Items only in the other
only2 = [a for a in kode_prfoksov if a not in codes_from_file]


How can I solve this? Which encoding should I use when comparing these two strings?


Thanks


Answer 1

str*_*nac 5

The first one appears to be windows-1250, and the second appears to be UTF-8.


>>> print 'jo\x9a'.decode('windows-1250')
još
>>> print 'jo\xc5\xa1'.decode('utf-8')
još
>>> 'jo\x9a'.decode('windows-1250') == 'jo\xc5\xa1'.decode('utf-8')
True
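To see why the raw values never match, here is a minimal sketch using byte literals, which should behave the same on Python 2.7 and Python 3. The variable names are illustrative and not taken from the question.

```python
# The same word as raw bytes in two encodings (names are illustrative).
from_file = b'jo\x9a'      # windows-1250: s-caron (U+0161) is the single byte 0x9a
from_db = b'jo\xc5\xa1'    # UTF-8: s-caron (U+0161) is the two bytes 0xc5 0xa1

# Comparing the undecoded byte strings compares bytes, not characters.
raw_equal = (from_file == from_db)

# Decoding each value with its own encoding yields the same text.
decoded_file = from_file.decode('windows-1250')
decoded_db = from_db.decode('utf-8')
decoded_equal = (decoded_file == decoded_db)

print(raw_equal)      # False
print(decoded_equal)  # True
```

The comparison only succeeds once both sides are text (or both are bytes in one agreed encoding).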


Answer 2

jof*_*fel 4

Your file strings appear to be Windows-1250 encoded. Your database appears to contain UTF-8 strings.

So you can first convert all your strings to unicode:

codes_from_file = [a.decode("windows-1250") for a in codes_from_file]
kode_prfoksov = [a.decode("utf-8") for a in kode_prfoksov]
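Putting the decoding step together with the comparison from the question, one possible sketch follows. The sample values and the `to_unicode` helper are made up for illustration, and the use of sets is a substitution for the question's list comprehensions: it assumes duplicates and ordering do not matter.

```python
def to_unicode(values, encoding):
    """Decode every byte string in `values` using `encoding` (hypothetical helper)."""
    return [v.decode(encoding) for v in values]

# Made-up sample data standing in for the question's two lists.
codes_from_file = to_unicode([b'jo\x9a', b'abc'], 'windows-1250')
kode_prfoksov = to_unicode([b'jo\xc5\xa1', b'xyz'], 'utf-8')

# Sets make membership tests cheap, but drop duplicates and ordering.
file_set = set(codes_from_file)
db_set = set(kode_prfoksov)

common = sorted(file_set & db_set)   # in both
only1 = sorted(file_set - db_set)    # only in the file
only2 = sorted(db_set - file_set)    # only in the database
```

If order or duplicates are significant, the original list comprehensions work unchanged on the decoded lists.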


Or, if you do not want unicode strings, just convert the file strings to UTF-8:

codes_from_file = [a.decode("windows-1250").encode("utf-8") for a in codes_from_file]
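As a rough sketch of this byte-string variant, assuming the database values really are UTF-8 already: the `recode_to_utf8` helper and the sample values are hypothetical.

```python
def recode_to_utf8(raw, source_encoding='windows-1250'):
    """Re-encode a byte string from `source_encoding` to UTF-8 (hypothetical helper)."""
    return raw.decode(source_encoding).encode('utf-8')

# Made-up sample data; the database list is assumed to hold UTF-8 bytes.
codes_from_file = [recode_to_utf8(a) for a in [b'jo\x9a', b'abc']]
kode_prfoksov = [b'jo\xc5\xa1', b'xyz']

# Both lists now hold UTF-8 byte strings, so the original comparison works.
common = [a for a in codes_from_file if a in kode_prfoksov]
only1 = [a for a in codes_from_file if a not in kode_prfoksov]
```

This keeps everything as byte strings, which can be convenient when the values are written straight back to a UTF-8 database, at the cost of having to remember which encoding the bytes are in.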


Archived:	13 years, 11 months ago
Views:	10372
Last activity:	13 years, 11 months ago