Why do IPython and Python handle my string differently?

Question

Why do IPython and Python handle my string differently?

wim*_*wim 4 python string unicode encoding ipython

In Python (2.7.1):

>>> x = u'$€%'
>>> x.find('%')
2
>>> len(x)
3


Whereas in IPython:

>>> x = u'$€%'
>>> x.find('%')
4
>>> len(x)
5


What is going on here?

Edit: adding the further information requested in the comments below.

In IPython:

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\xe2\x82\xac%'
>>> print x
$â¬%
>>> len(x)
5


In Python:

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\u20ac%'
>>> print x
$€%
>>> len(x)
3


Answer 1

min*_*nrk 5

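@nye17 It is never a good idea to call setdefaultencoding() (it is removed from sys during interpreter startup for a reason, which is why the reload(sys) trick is needed to get it back). A common culprit is gtk, which causes all sorts of problems: if IPython has imported gtk, sys.getdefaultencoding() will return utf8. IPython does not set the default encoding itself.

@wim May I ask which version of IPython you are using? Part of the major overhaul in 0.11 was fixing many unicode bugs, but more keep cropping up (mostly on Windows now).

I ran your test case in IPython 0.11, and IPython and Python appear to behave the same, so I think this bug is fixed.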

Relevant values:

sys.stdin.encoding = utf8
sys.getdefaultencoding() = ascii
Platforms tested: Ubuntu 10.04 + Python 2.6.5, OS X 10.7 + Python 2.7.1
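
If you want to compare these values between the two shells yourself, something along these lines should do. This is only a sketch: `encoding_report` is a name made up for this example, not part of Python or IPython, and the values it reports depend on your terminal and interpreter.

```python
import sys

def encoding_report():
    """Collect the encoding-related values discussed above (hypothetical helper)."""
    return {
        # Encoding used for terminal input; may be None if stdin is not attached
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        # Encoding used for implicit str/unicode conversions
        "default_encoding": sys.getdefaultencoding(),
        # True only if something has put setdefaultencoding back on sys
        "has_setdefaultencoding": hasattr(sys, "setdefaultencoding"),
    }

for name, value in sorted(encoding_report().items()):
    print("%s = %s" % (name, value))
```

Run it once in plain Python and once in IPython; any difference in the output points at what the shell (or something it imported) changed.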

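As for an explanation: essentially, IPython did not recognize that input could be unicode. In IPython 0.10, multibyte utf8 input was not respected, so each byte was treated as one character, as you can see here: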

In [1]: x = '$€%'

In [2]: x
Out[2]: '$\xe2\x82\xac%'

In [3]: y = u'$€%'

In [4]: y
Out[4]: u'$\xe2\x82\xac%'  # wrong!


What should happen, however, and what does happen in 0.11, is y == x.decode(sys.stdin.encoding), not repr(y) == 'u'+repr(x).
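
To make the difference concrete, here is a small sketch in Python 3 syntax (the session above is Python 2). It assumes the terminal sends UTF-8, and it uses a Latin-1 decode purely as a stand-in for the "one byte = one character" behaviour, since Latin-1 maps each byte to the code point of the same value. The variable names are illustrative only.

```python
# The UTF-8 bytes a terminal would send for the three characters $, EURO SIGN, %
raw = b"$\xe2\x82\xac%"

# Correct handling: decode the bytes with the input encoding (assumed UTF-8 here)
decoded = raw.decode("utf-8")

# Faulty handling: every byte becomes its own character.
# Latin-1 maps bytes 0-255 onto code points 0-255, which reproduces that effect.
byte_per_char = raw.decode("latin-1")

print(len(decoded), decoded.find("%"))              # 3 2
print(len(byte_per_char), byte_per_char.find("%"))  # 5 4
```

The first line of output matches what plain Python gave in the question, and the second matches the IPython 0.10 result.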
