UnicodeDecodeError: 'ascii' codec can't decode

Question


lil*_*ood 8 python encoding file decoding representation

I am reading a file that contains Romanian words in Python with file.readline(). Because of the encoding, I run into problems with many characters.

Example:

>>> a = "aberație"  # type 'str'
>>> a
'abera\xc8\x9bie'
>>> print sys.stdin.encoding
UTF-8


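I have tried encode() with utf-8, cp500 and other encodings, but it doesn't work.

I can't work out which character encoding I have to use.

Thanks in advance.

Edit: the goal is to store the words from the file in a dictionary and, when printing them, to get aberație instead of 'abera\xc8\x9bie'.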

Answer 1

Cla*_*diu 15

What are you trying to do?

This is a sequence of bytes:

BYTES = 'abera\xc8\x9bie'


It is a sequence of bytes that represents the UTF-8 encoding of the string "aberație". You decode the bytes to get your unicode string:

>>> BYTES 
'abera\xc8\x9bie'
>>> print BYTES 
aberaÈ›ie
>>> abberation = BYTES.decode('utf-8')
>>> abberation 
u'abera\u021bie'
>>> print abberation 
aberație

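For readers on Python 3, where bytes and text are separate types, the same decode step can be sketched as follows. This is an illustrative translation of the session above, not part of the original answer. The names raw, text, wrong and error_message are made up for the example, and cp1252 is only an assumed stand-in for "a wrong codec".

```python
# Python 3 sketch: bytes literals need the b prefix.
raw = b'abera\xc8\x9bie'           # the UTF-8 bytes from the session above

text = raw.decode('utf-8')         # bytes -> str (Unicode text)
print(len(raw), len(text))         # 9 bytes, but only 8 characters

# Decoding with the wrong codec does not fail here; it silently
# produces mojibake, one wrong character per byte.
wrong = raw.decode('cp1252')

# The ASCII codec cannot represent byte 0xc8 at all, so it raises
# the error named in the question title.
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

On a terminal that uses UTF-8, print(text) displays aberație. The value of wrong is the same garbled form that print BYTES shows in the session above.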

If you want to store the unicode string in a file, you have to encode it into a specific byte format of your choice:

>>> abberation.encode('utf-8')
'abera\xc8\x9bie'
>>> abberation.encode('utf-16')
'\xff\xfea\x00b\x00e\x00r\x00a\x00\x1b\x02i\x00e\x00'

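Applied to the goal stated in the question (words from a file stored in a dictionary), one possible approach is to let io.open do the decoding and encoding at the file boundary. The sketch below assumes the file is UTF-8 with one word per line. The sample words, the temporary path and the lengths dictionary are hypothetical and not taken from the original post. io.open is available in Python 2.6+ and is the same function as the built-in open in Python 3.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for the asker's file.
words = [u'abera\u021bie', u'\u0219coal\u0103']
path = os.path.join(tempfile.mkdtemp(), 'words.txt')

# Encode on write: io.open converts text to UTF-8 bytes.
with io.open(path, 'w', encoding='utf-8') as f:
    for word in words:
        f.write(word + u'\n')

# Decode on read: each line arrives as Unicode text, not raw bytes.
lengths = {}
with io.open(path, 'r', encoding='utf-8') as f:
    for line in f:
        word = line.rstrip(u'\n')
        lengths[word] = len(word)   # map each word to its character count
```

Printing an individual key shows the readable word on a UTF-8 terminal. In Python 2, printing the whole dictionary still shows escaped forms such as u'abera\u021bie', because containers display the repr of their items.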
