Related questions (0)

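What are Unicode, UTF-8, and UTF-16?

What is the basis for Unicode, and why is there a need for UTF-8 or UTF-16? I have researched this on Google and searched here as well, but it is still not clear to me.

In VSS, when doing a file comparison, there is sometimes a message saying that the two files have different UTFs. Why would this be the case?

Please explain in simple terms.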
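
As a small illustration of the distinction being asked about: Unicode assigns each character a number (a code point), while UTF-8, UTF-16, and UTF-32 are different ways of turning that number into bytes. The sketch below encodes one sample character, the euro sign (chosen here only as an example, it does not come from the question), and shows that the same code point yields a different byte sequence in each encoding. Two files with identical text can therefore differ byte for byte when saved in different encodings, which is one plausible reading of the comparison message, although the question does not say how VSS decides this.

```python
# One code point, three byte-level encodings.
text = u'\u20ac'  # EURO SIGN, U+20AC (sample character, an assumption for illustration)

utf8_bytes = text.encode('utf-8')       # variable width: 1 to 4 bytes per code point
utf16_bytes = text.encode('utf-16-be')  # 2 or 4 bytes per code point
utf32_bytes = text.encode('utf-32-be')  # always 4 bytes per code point

for name, data in [('UTF-8', utf8_bytes),
                   ('UTF-16', utf16_bytes),
                   ('UTF-32', utf32_bytes)]:
    # Same character, different number of bytes in each encoding.
    print('%s: %d bytes' % (name, len(data)))
```

The `-be` suffix fixes the byte order so that no byte order mark is prepended; plain `'utf-16'` would add one.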

unicode encoding utf-8 utf-16

Sof*_*eek

2010 02-11

368
votes

8
answers

280,000
views

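Correctly extract emojis from a Unicode string

I am working in Python 2 and I have a string containing emojis as well as other Unicode characters. I need to convert it to a list where each entry in the list is a single character or emoji.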

x = u'😘😘xyz😊😊'
char_list = [c for c in x]

The desired output is:

['😘', '😘', 'x', 'y', 'z', '😊', '😊']

The actual output is:

[u'\ud83d', u'\ude18', u'\ud83d', u'\ude18', u'x', u'y', u'z', u'\ud83d', u'\ude0a', u'\ud83d', u'\ude0a']

How can I achieve the desired output?
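
The actual output shows every emoji split into a high surrogate (`\ud83d`) followed by a low surrogate (`\ude18` or `\ude0a`), which is how a narrow Python 2 build stores code points above U+FFFF. A minimal sketch of one possible approach is to walk the string and keep each surrogate pair together as one list entry. The helper name `split_chars` is hypothetical, and the sketch only groups surrogate pairs: emojis built from several code points (skin-tone modifiers, joined sequences) would still be split.

```python
def split_chars(s):
    """Split s into a list, keeping each UTF-16 surrogate pair as one entry."""
    chars = []
    i = 0
    while i < len(s):
        c = s[i]
        # A high surrogate immediately followed by a low surrogate is one character.
        if (u'\ud800' <= c <= u'\udbff' and i + 1 < len(s)
                and u'\udc00' <= s[i + 1] <= u'\udfff'):
            chars.append(s[i:i + 2])
            i += 2
        else:
            chars.append(c)
            i += 1
    return chars


# Surrogates written as escapes so the input matches the output in the question.
sample = u'\ud83d\ude18xyz\ud83d\ude0a'
print(len(split_chars(sample)))  # 5 entries: emoji, x, y, z, emoji
```

On a build where the string is already a sequence of code points, no surrogates are present and the function returns the same list as `list(s)`.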

python unicode python-2.x emoji

Aar*_*ron

2016 02-20

21
votes

2
answers

5512
views

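Google App Engine uses Python 2.5.2, apparently with UCS4 enabled, but the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) in the datastore, you get '\U0001d10c' (length 1) when you retrieve it. I am trying to count the Unicode characters in a string in a way that gives the same result before and after storing it. So I am trying to normalize the string (from u'\ud834\udd0c' to '\U0001d10c') as soon as I receive it, before calculating its length and putting it in the datastore. I know I can encode it to UTF-8 and then decode it again, but is there a more straightforward or efficient way?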
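
One way the normalization described above might be done without an encode and decode round trip is to replace each surrogate pair with the code point it stands for, using the standard UTF-16 arithmetic. This is only a sketch: the helper name `join_surrogates` is hypothetical, it assumes an interpreter whose strings can hold code points above U+FFFF (as the question says GAE's does), and it has not been measured against the round trip, so no claim is made that it is faster.

```python
import re

try:
    unichr           # Python 2
except NameError:
    unichr = chr     # Python 3 has no unichr; chr covers all code points

# A high surrogate followed by a low surrogate.
_SURROGATE_PAIR = re.compile(u'([\ud800-\udbff])([\udc00-\udfff])')


def join_surrogates(s):
    """Replace each UTF-16 surrogate pair in s with a single code point."""
    def combine(match):
        high = ord(match.group(1)) - 0xD800
        low = ord(match.group(2)) - 0xDC00
        return unichr(0x10000 + (high << 10) + low)
    return _SURROGATE_PAIR.sub(combine, s)


normalized = join_surrogates(u'\ud834\udd0c')
print(len(normalized))
```

Unpaired surrogates are left untouched, so the length of such a string is unchanged by the call.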

python unicode google-app-engine utf-16 utf-32

Tra*_*vis

lucky-day

8
votes

1
answer

1646
views

Platform-specific Unicode semantics in Python 2.7

Ubuntu 11.10:

$ python
Python 2.7.2+ (default, Oct  4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = u'\U0001f44d'
>>> len(x)
1
>>> ord(x[0])
128077

Windows 7:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = u'\U0001f44d'
>>> len(x)
2
>>> ord(x[0])
55357

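My experience on Ubuntu is with the default interpreter in the distribution. For Windows 7, I downloaded and installed the recommended version linked from python.org. I did not compile either of them myself.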
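
The two values of ord(x[0]) in the sessions above can be reproduced by hand. UTF-16 represents a code point above U+FFFF as two 16-bit code units (a surrogate pair), and an interpreter that stores strings as UTF-16 code units exposes those units as separate elements. The sketch below, using a hypothetical helper `to_utf16_units`, applies the standard UTF-16 arithmetic to U+1F44D.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (as integers) that encode one code point."""
    if code_point < 0x10000:
        # Code points up to U+FFFF fit in a single code unit.
        return [code_point]
    offset = code_point - 0x10000           # a 20-bit value
    high = 0xD800 + (offset >> 10)          # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits -> low surrogate
    return [high, low]


units = to_utf16_units(0x1F44D)   # 0x1F44D == 128077, the value seen on Ubuntu
print(units)                      # [55357, 56397]; 55357 is the value seen on Windows
```

On Python 2, `sys.maxunicode` is 0xFFFF on a build that stores UTF-16 code units and 0x10FFFF on one that stores whole code points, which is one way to tell which kind of interpreter is running.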

The nature of the difference is clear to me. (On Ubuntu, the string is a sequence of code points; on Windows 7, a sequence of UTF-16 code units.) My questions are: