Python Arabic encoding problem

Ami*_*sen 3 python encoding

I have text encoded in Windows-1256. I want to convert this text from Arabic (Windows-1256) to UTF-8.

Sample text:

Óæí Ïæã ÈíåÞí

Expected result:

سوي دوم بيهقي

I use this code to decode the text and encode it as UTF-8:

# -*- coding: utf-8 -*-

data = "Óæí Ïæã ÈíåÞí"
print data.decode("windows-1256", "replace")
print data.encode("windows-1256")

The code returns the following result:

?“?¦?­ ???¦?£ ?ˆ?­?¥???­
Traceback (most recent call last):
  File "mohmal2.py", line 5, in <module>
    print data.encode("windows-1256")
  File "/usr/lib/python2.7/encodings/cp1256.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I found a website that can convert this text:

http://www.iosart.com

Jos*_*Lee 5

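It looks like your input was accidentally decoded as Windows-1252. Encoding it back to cp1252 recovers the original bytes, which you can then decode as cp1256: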

>>> "Óæí Ïæã ÈíåÞí".encode('cp1252').decode('cp1256')
'سوي دوم بيهقي'
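To make the round trip a bit more explicit, here is a minimal sketch for Python 3. The helper names `fix_mojibake` and `to_utf8_bytes` are hypothetical, invented for illustration, and the sketch assumes every character in the garbled string exists in cp1252 (otherwise `encode` raises `UnicodeEncodeError`).

```python
# Sketch for Python 3. fix_mojibake and to_utf8_bytes are hypothetical
# helper names, not part of any library.

def fix_mojibake(text, wrong="cp1252", right="cp1256"):
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return text.encode(wrong).decode(right)


def to_utf8_bytes(raw, source_encoding="cp1256"):
    """Transcode raw bytes from source_encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


garbled = "Óæí Ïæã ÈíåÞí"
fixed = fix_mojibake(garbled)
print(fixed)

# The bytes as they would appear in a Windows-1256 file.
raw = fixed.encode("cp1256")
utf8 = to_utf8_bytes(raw)
print(utf8.decode("utf-8") == fixed)  # True
```

In Python 2.7, as used in the question, a plain `"..."` literal is a byte string, so calling `.encode()` on it first decodes it implicitly as ASCII. That implicit step is what raises the `UnicodeDecodeError` in the traceback. With a `u"..."` literal the same `encode('cp1252').decode('cp1256')` chain should work there too.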