Python: converting gibberish to Hebrew

ita*_*345 -1 python unicode character-encoding hebrew

Here is my code:

    # -*- coding: utf-8-*-
    array=["à","á","â","ã","ä","å","æ","ç","è","é","ê","ë","ì","í","î","ï","ð","ñ","ó","ô","õ","ö","ø","ù","ú","û","ü","ý","þ","ÿ"]
    array1=["א","ב","ג","ד","ה","ו","ז","ח","ט","י","ך","כ","ל","ם","מ","ן","נ","ס","ע","ף","פ","ץ","צ","ק","ר","ש","ת"]
    str="áï éäåãä"
    message=""
    for i in range(0,len(str)):
       s=str[i]
       index=-1
       for j in range(0,len(array)):
           if(array[j]==s):
               index=j
               break
       if(index!=-1):
       message+=array1[index]
       print array1[index]
    print message

The error is:

    SyntaxError: EOL while scanning string literal

on line 2.


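I have a Hebrew text file, but no matter which encoding I choose, it always displays as gibberish. This is a Python program meant to convert it to Hebrew. The original file is in ISO-8859-1.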


Mar*_*nen 6

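As @Martijn suggested, decoding the original file correctly would be a better solution. If your file is Hebrew but shows the characters in array, it is probably being displayed as latin1 or cp1252. cp1255 looks like a close match. Maybe your array1 isn't quite correct. Also note that strings are iterable, so you can simplify your arrays: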

    # coding: utf8
    array  = u'àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ'
    array1 = u'אבגדהוזחטיךכלםמןנסעףפץצקרשת'
    print(array)
    print(array1)
    print(array.encode('cp1252').decode('cp1255',errors='replace'))

The last line above reverses the "incorrect" encoding and decodes the result with cp1255 (a Hebrew encoding). Output:

    àáâãäåæçèéêëìíîïðñóôõöøùúûüýþÿ
    אבגדהוזחטיךכלםמןנסעףפץצקרשת
    אבגדהוזחטיךכלםמןנסףפץצרשת��‎‏�

It's not a perfect match, but it is close enough that I think your original file is encoded in cp1255.
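
If that guess is right, the same idea can be applied to the sample string from the question, or better, to the file itself at read time. The following is only a minimal sketch under that assumption: fix_mojibake is a hypothetical helper name, the temporary file stands in for your real file, and cp1255 is the guess made above rather than a confirmed fact about your data.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def fix_mojibake(text, wrong="cp1252", right="cp1255"):
    """Hypothetical helper: undo a wrong decode, then decode as Hebrew.

    Assumes the text was cp1255 bytes that were mistakenly decoded as cp1252.
    """
    return text.encode(wrong).decode(right, errors="replace")


# Sample string taken from the question.
sample = u"áï éäåãä"
fixed = fix_mojibake(sample)
print(fixed)

# Better: decode the bytes with the right codec when reading the file.
# These bytes stand in for the content of the real file.
raw = sample.encode("cp1252")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name
try:
    with io.open(path, encoding="cp1255", errors="replace") as f:
        text = f.read()
finally:
    os.remove(path)
print(text)
```

Both print calls should show בן יהודה. With io.open and the right encoding, the lookup tables from the question are not needed at all.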
