I have the following code:
a = "\u0432"
b = u"\u0432"
c = b"\u0432"
d = c.decode('utf8')
print(type(a), a)
print(type(b), b)
print(type(c), c)
print(type(d), d)
and it outputs:
<class 'str'> в
<class 'str'> в
<class 'bytes'> b'\\u0432'
<class 'str'> \u0432
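Why do I see the character code in the last case instead of the character itself? How can I convert a byte string to a Unicode string so that printing it shows the character rather than its code?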
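A minimal sketch of what is going on, assuming Python 3: in a `str` literal `\u0432` is an escape sequence, but in a `bytes` literal it is not, so `c` holds six ASCII bytes, and decoding them as UTF-8 gives back the same six characters (backslash included). The `unicode_escape` codec is one standard-library way to interpret those escapes; the cleaner alternative is to keep real UTF-8 bytes rather than escape text. The names `e`, `raw` and `f` are illustrative only.

```python
# In a str literal, \u0432 is an escape sequence: the string holds ONE character,
# the Cyrillic letter 'в' (U+0432).
a = "\u0432"
assert len(a) == 1

# In a bytes literal, \u is not an escape, so the question's b"\u0432" stores six
# ASCII bytes: backslash, 'u', '0', '4', '3', '2'. The doubled backslash below
# builds the same bytes without an invalid-escape warning on newer Pythons.
c = b"\\u0432"
assert len(c) == 6

# Decoding those bytes as UTF-8 keeps the backslash, so d is the six-character
# text \u0432, which is what print(d) showed in the question.
d = c.decode("utf-8")

# Option 1: let the 'unicode_escape' codec interpret the \uXXXX sequence.
e = c.decode("unicode_escape")

# Option 2: keep real UTF-8 bytes instead of escape text, then decode them.
raw = "\u0432".encode("utf-8")  # b'\xd0\xb2'
f = raw.decode("utf-8")

print(d == "\\u0432", e == "\u0432", f == "\u0432")  # prints: True True True
```

One caveat on option 1: `unicode_escape` decodes every non-escape byte as Latin-1, so it is only safe for pure-ASCII bytes containing backslash escapes; for genuine UTF-8 data, decode with `'utf-8'`.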
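It raises UnicodeDecodeError when Cyrillic text is involved. I have the same problem with Python 3.3 and PyCharm 2.7.2. I tried hard-coding the encoding in the code and manually specifying the encoding in the PyCharm settings, but it had no effect: it still tries to open the UTF-8 file with the cp1251 codec.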
Connected to pydev debugger (build 129.314)
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\pydevd.py", line 1481, in <module>
debugger.run(setup['file'], None, None)
File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\pydevd.py", line 1124, in run
pydev_imports.execfile(file, globals, locals) #execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 2.7.2\helpers\pydev\_pydev_execfile.py", line 33, in execfile
contents = stream.read()
File "C:\Python33\lib\encodings\cp1251.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 2839: character maps to <undefined> …
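The traceback suggests that pydev's `execfile` helper read the UTF-8 script through the cp1251 codec (`stream.read()` ends up in `cp1251.py`). Below is a minimal sketch of that mechanism outside PyCharm, assuming Python 3: `open()` without `encoding=` falls back to the locale's preferred encoding, and UTF-8 Cyrillic text can contain byte 0x98 (for example in `И`, U+0418), which Python's cp1251 codec cannot decode. The file name `sample.txt` and its contents are hypothetical; this only illustrates the decoding failure and is not a fix for the PyCharm helper itself.

```python
import locale
import os
import tempfile

# Hypothetical file content: Cyrillic text. 'И' (U+0418) encodes in UTF-8 as
# b'\xd0\x98', and 0x98 has no mapping in Python's cp1251 codec.
text = "\u0418\u043c\u044f"  # 'Имя'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)

    # Reading the UTF-8 bytes as cp1251 reproduces the same kind of error
    # as the traceback ('charmap' codec, byte 0x98).
    try:
        with open(path, encoding="cp1251") as fh:
            fh.read()
        cp1251_failed = False
    except UnicodeDecodeError as exc:
        cp1251_failed = True
        print("cp1251 failed:", exc)

    # Naming the real encoding explicitly reads the file correctly.
    with open(path, encoding="utf-8") as fh:
        decoded = fh.read()

# open() without encoding= uses this locale-dependent default
# (typically cp1251 on Windows with a Russian locale).
print("default text encoding:", locale.getpreferredencoding(False))
print("utf-8 read matches:", decoded == text)
```

In your own code, always passing `encoding="utf-8"` to `open()` avoids depending on the machine's locale.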