Mar*_*kum 103 python unicode utf-8
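I'm having some trouble encoding a string as UTF-8. I've tried a number of things, including string.encode('utf-8') and unicode(string), but I get the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
This is my string: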
(｡･ω･｡)ﾉ
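I don't know what is going wrong. Any ideas?
EDIT: The problem is that the string does not display correctly when printed. Also, I get this error when I try to convert it: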
Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
Nic*_*ood 70
This is because your terminal's encoding is not set to UTF-8. Here is my terminal:
$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(｡･ω･｡)ﾉ
>>>
On my terminal the example works as shown above, but if I get rid of the LANG setting it stops working:
$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>>
Consult the documentation for your Linux variant to find out how to make this change permanent.
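To check which encoding Python has picked up from the environment, something along these lines may help. It is only a sketch: `describe_output_encoding` is a made-up helper name, and the last part simulates an ASCII-only terminal by encoding by hand, which is roughly what `print` does with a Unicode string.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def describe_output_encoding():
    """Return the encodings Python derived from the environment (hypothetical helper)."""
    return {
        'stdout': getattr(sys.stdout, 'encoding', None),
        'preferred': locale.getpreferredencoding(),
    }


s1 = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8')

print(describe_output_encoding())

# Simulate an ASCII-only terminal: encoding by hand fails the same way print did.
try:
    s1.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    failed_range = (exc.start, exc.end)  # half-open range of offending characters
```

If changing the locale is not an option, setting the PYTHONIOENCODING environment variable (for example `PYTHONIOENCODING=utf-8 python`) is another way to tell Python which encoding to use for stdin, stdout and stderr.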
mat*_*ata 24
Try:
string.decode('utf-8') # or:
unicode(string, 'utf-8')
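EDIT:
'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.
So your problem must be somewhere else, possibly where you do something with the string that triggers an implicit conversion (printing, writing to a stream...).
To say more, we'd need to see some code.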
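For what it's worth, the UnicodeDecodeError from the question most likely comes from exactly such an implicit conversion: in Python 2, calling .encode('utf-8') on a byte string first decodes it with the default 'ascii' codec. A small sketch that makes the hidden step explicit (the variable names are mine, not from the question):

```python
raw = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'

# The hidden step: decoding as ASCII fails on the first non-ASCII byte.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the real encoding works, and encoding back round-trips.
text = raw.decode('utf-8')
round_trip = text.encode('utf-8')
```

Here `error.start` is 1 and the offending byte is 0xef, matching the message in the question.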
pep*_*epr 21
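I agree with the comment at /sf/answers/739338561/ and with Nick Craig-Wood's demonstration. You have decoded the string correctly. The problem is with the print command, because it converts the Unicode string to the console encoding, and the console is not able to display the string. Try writing the string to a file and look at the result in a decent editor that supports Unicode: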
import codecs
s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')
f = codecs.open('out.txt', 'w', encoding='utf-8')
f.write(s1)
f.close()
Then you will see (｡･ω･｡)ﾉ.
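The same check can be done without leaving Python by reading the file back. A sketch using io.open, which accepts an encoding argument on both Python 2.6+ and Python 3; the temporary directory is only there to keep the example self-contained:

```python
import io
import os
import tempfile

s = b'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Text mode with an explicit encoding: unicode in, UTF-8 bytes on disk.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(s1)

# Binary mode shows the bytes that were actually written.
with io.open(path, 'rb') as f:
    on_disk = f.read()

# Reading with the same encoding gives the unicode string back.
with io.open(path, 'r', encoding='utf-8') as f:
    read_back = f.read()
```

If `on_disk` equals the original bytes and `read_back` equals `s1`, the data is fine and only the console display is at fault.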
Try setting the system default encoding to utf-8 at the start of the script, so that all strings are encoded using it.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
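Keep in mind that reload(sys) with sys.setdefaultencoding exists only in Python 2 and changes behaviour for every library in the process, so it is often discouraged. A narrower alternative, sketched here under the assumption that only output is the problem, is to wrap the byte stream in a UTF-8 writer. io.BytesIO stands in for the real stream so the result can be inspected:

```python
import codecs
import io

s1 = u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'

# Stand-in for a byte stream such as sys.stdout (Python 2) or sys.stdout.buffer (Python 3).
stream = io.BytesIO()

# The writer encodes unicode to UTF-8 on every write, whatever the locale says.
writer = codecs.getwriter('utf-8')(stream)
writer.write(s1)
writer.write(u'\n')

written = stream.getvalue()
```

In a Python 2 script the same idea is usually written as `sys.stdout = codecs.getwriter('utf-8')(sys.stdout)`.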
If you are working on a remote host, look at /etc/ssh/ssh_config on your local PC.
When this file contains the line:
SendEnv LANG LC_*
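comment it out by adding # at the start of the line. It might help.
With this line, ssh sends the language-related environment variables of your PC to the remote host. It causes a lot of problems.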
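To see what actually arrived on the remote side, you can inspect the locale variables from Python. On POSIX systems LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG; the helper below (a hypothetical name, just a sketch) applies that order to a dict such as os.environ:

```python
import os


def effective_ctype(environ):
    """Return (variable, value) for the first non-empty locale variable, or None."""
    for name in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = environ.get(name)
        if value:
            return name, value
    return None


print(effective_ctype(os.environ))
```

If this prints a locale the remote host does not have installed, or None, Python is likely to fall back to ASCII for the terminal.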
You can use the following code at the top of your script, as suggested by Andrei Krasutski.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
But I suggest you also add a # -*- coding: utf-8 -*- line at the very top of the script.
Without that line, I got the following error when I tried to execute basic.py:
$ python basic.py
File "01_basic.py", line 14
SyntaxError: Non-ASCII character '\xd9' in file basic.py on line 14, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Here is the code of basic.py that raises the above error.
from pylatex import Document, Section, Subsection, Command, Package
from pylatex.utils import italic, NoEscape
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def fill_document(doc):
    with doc.create(Section('?? ???????')):
        doc.append('??? ?????? ?????? ????? ??? ?????')
        doc.append(italic('????? ???????? ??? ???? ????'))
        with doc.create(Subsection('??? ???????????')):
            doc.append('?????? ????? ??????????: $&#{}')

if __name__ == '__main__':
    # Basic document
    doc = Document('basic')
    fill_document(doc)
Then I added a # -*- coding: utf-8 -*- line at the very top and ran it again. It works.
# -*- coding: utf-8 -*-
from pylatex import Document, Section, Subsection, Command, Package
from pylatex.utils import italic, NoEscape
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def fill_document(doc):
    with doc.create(Section('?? ???????')):
        doc.append('??? ?????? ?????? ????? ??? ?????')
        doc.append(italic('????? ???????? ??? ???? ????'))
        with doc.create(Subsection('??? ???????????')):
            doc.append('?????? ????? ??????????: $&#{}')

if __name__ == '__main__':
    # Basic document
    doc = Document('basic')
    fill_document(doc)
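The declaration only tells Python how to read the bytes of the source file; it does not change how strings are printed or converted. A minimal sketch, assuming the file is saved as UTF-8 (on Python 3, UTF-8 is already the default source encoding, so the line is optional there):

```python
# -*- coding: utf-8 -*-
# With the declaration above, non-ASCII characters may appear in the source.
greeting = u'(｡･ω･｡)ﾉ'

# A unicode literal holds code points; encode explicitly when bytes are needed.
encoded = greeting.encode('utf-8')
```

Using u'...' literals (rather than plain '...' byte strings) in Python 2 keeps the text as Unicode until you decide where to encode it.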
Thanks.