Related questions (0)

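Do Unicode strings in Python 3 depend on "narrow"/"wide" builds?

Since Python 2.2 and PEP 261, Python can be built in "narrow" or "wide" mode, which affects the definition of a "character", i.e. "the addressable unit of a Python Unicode string".

Characters in a narrow build look like UTF-16 code units: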

>>> a = u'\N{MAHJONG TILE GREEN DRAGON}'
>>> a
u'\U0001f005'
>>> len(a)
2
>>> a[0], a[1]
(u'\ud83c', u'\udc05')
>>> [hex(ord(c)) for c in a.encode('utf-16be')]
['0xd8', '0x3c', '0xdc', '0x5']


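(The output above seems to contradict some sources, which insist that narrow builds use UCS-2 rather than UTF-16. Very intriguing indeed.)

Does Python 3.0 keep this distinction? Or are all Python 3 builds wide?

(I've heard that PEP 393 changed the internal representation of strings in 3.3, but that is not relevant to 3.0 through 3.2.)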
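One way to check which kind of build a given interpreter is, sketched here as an illustration rather than as an answer to the question: `sys.maxunicode` is 0xFFFF on a narrow build and 0x10FFFF on a wide build, and from Python 3.3 on it is always 0x10FFFF. The helper name `build_kind` is hypothetical, not part of any library.

```python
import sys


def build_kind(maxunicode=None):
    """Classify a build by its sys.maxunicode value (hypothetical helper)."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow builds report 0xFFFF; wide builds, and every build
    # from Python 3.3 on, report 0x10FFFF.
    return "narrow" if maxunicode == 0xFFFF else "wide"


print(build_kind())
```

Passing the value explicitly, as in `build_kind(0xFFFF)`, makes it possible to see both answers without access to a narrow interpreter.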

python unicode python-3.x

Kos*_*Kos

2017 05-23

8 votes

1 answer

1183 views

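Why does Mac OS X Python interpret \U escapes in strings differently from CentOS Linux Python?

Two Python interpreter sessions. The first is from Python on CentOS. The second is from the built-in Python on Mac OS X 10.7. Why does the second session create a string of length 2 from the \U escape sequence and then raise an error?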

$ python
Python 2.6.6 (r266:84292, Dec  7 2011, 20:48:22) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> u'\U0000FFFF'
u'\uffff'
>>> u'\U00010000'
u'\U00010000'
>>> len(u'\U00010000')
1
>>> ord(u'\U00010000')
65536


$ python
Python 2.6.7 (r267:88850, Jul 31 2011, 19:30:54) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
>>> u'\U00000020'
u' '
>>> u'\U00000065'
u'e'
>>> …


python unicode macos centos

aud*_*ude

lucky-day

5 votes

1 answer

862 views

Platform-specific Unicode semantics in Python 2.7

Ubuntu 11.10:

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = u'\U0001f44d'
>>> len(x)
1
>>> ord(x[0])
128077


Windows 7:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = u'\U0001f44d'
>>> len(x)
2
>>> ord(x[0])
55357


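On Ubuntu I used the distribution's default interpreter. For Windows 7, I downloaded and installed the recommended version linked from python.org. I did not compile either of them myself.

The nature of the difference is clear to me. (On Ubuntu, a string is a sequence of code points; on Windows 7, it is a sequence of UTF-16 code units.) My questions are: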
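The two `ord(x[0])` results in the sessions above can be related with the standard UTF-16 surrogate arithmetic. The sketch below is only an illustration; `to_surrogate_pair` is a hypothetical helper name, and the code runs on any Python version regardless of build type because it works on integers rather than strings.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits select the low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F44D)  # 0x1F44D == 128077
print(high, low)                        # 55357 56397
```

The first value, 55357, is what the Windows session reports for `ord(x[0])`, while 128077 is the code point the Ubuntu session reports.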