Convert a Unicode code point to a Unicode character in Python

Question


Car*_*ers 3 python unicode utf-8 python-3.x

I'm parsing hex/Unicode escapes in text.

So I'll have an input string like

\x{abcd}

That part is easy: I end up with an array ["ab", "cd"], which I call digits, and do the following with it:

return bytes(int(d, 16) for d in digits).decode("utf-8")

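A minimal sketch of this \x{...} direction, assuming the braces hold an even number of hex digits (two per byte). decode_x_escape and decode_x_escape_fromhex are hypothetical helper names; the second one shows that the built-in bytes.fromhex does the same pair-splitting and base-16 parsing in a single call.

```python
def decode_x_escape(hex_digits):
    # Split e.g. "e18892" into ["e1", "88", "92"], i.e. the `digits` list above.
    digits = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    # Each two-digit pair becomes one byte; the bytes are then decoded as UTF-8.
    return bytes(int(d, 16) for d in digits).decode("utf-8")


def decode_x_escape_fromhex(hex_digits):
    # bytes.fromhex splits and parses the hex pairs itself
    # (it raises ValueError for an odd number of digits).
    return bytes.fromhex(hex_digits).decode("utf-8")


print(decode_x_escape("e18892") == "\u1212")          # True
print(decode_x_escape_fromhex("e18892") == "\u1212")  # True
```

Note that the placeholder \x{abcd} is not valid UTF-8 (0xab is a continuation byte, not a start byte), so decoding it raises UnicodeDecodeError; real input needs a well-formed byte sequence such as e1 88 92.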

So I basically treat everything between the {} as UTF-8-encoded bytes and turn it into a character. Simple.

>>> bytes(int(d, 16) for d in ["e1", "88", "92"]).decode("utf-8")
'ሒ'

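The two notations name the same character in different ways: 1212 is the code point, while e1 88 92 is its UTF-8 encoding. A quick check with the built-ins ord and str.encode makes that concrete (the variable name ch is only for illustration):

```python
# Decode the three UTF-8 bytes from the example above into a one-character string.
ch = bytes.fromhex("e18892").decode("utf-8")

# The code point is a single integer, independent of any encoding.
print(hex(ord(ch)))        # 0x1212
# Encoding the character back to UTF-8 gives the original three bytes.
print(ch.encode("utf-8"))  # b'\xe1\x88\x92'
# So "1212" and "e1 88 92" describe the same character.
print(ch == "\u1212")      # True
```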

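But I also want to go the other way: \u{1212} should produce the same character. The problem is that I don't know how to treat the resulting ["12", "12"] as a Unicode code point rather than as UTF-8 bytes, so that I get the ሒ character again.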

How can I do this in Python 3?

Answer 1

Ble*_*der 5

You can use chr after parsing the number as base 16:

>>> chr(int('1212', 16))
'ሒ'
>>> '\u1212'
'ሒ'

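Applied to the digits list from the question, the idea is to join the pairs back into one hex number before calling chr, because for \u{...} the pairs are pieces of a single integer rather than separate bytes. A sketch, with decode_u_escape as a hypothetical name:

```python
def decode_u_escape(digits):
    # ["12", "12"] -> "1212": for \u{...} the pairs are pieces of one hex
    # number, so join them instead of turning each pair into a byte.
    code_point = int("".join(digits), 16)
    # chr raises ValueError for anything outside range(0x110000).
    return chr(code_point)


print(decode_u_escape(["12", "12"]) == "\u1212")      # True
print(hex(ord(decode_u_escape(["01", "f6", "00"]))))  # 0x1f600
```

Since chr accepts any integer in range(0x110000), the same function also covers code points outside the Basic Multilingual Plane, such as U+1F600; larger values raise ValueError.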

If you are replacing these throughout a string, re.sub with a replacement function makes this easy:

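import re

def replacer(match):
    # group(2) is the escape letter, group(3) the text between the braces
    if match.group(2) == 'u':
        # \u{...}: parse the digits as one base-16 code point
        return chr(int(match.group(3), 16))
    elif match.group(2) == 'x':
        return  # ... handle \x{...} as UTF-8 bytes, as in the question

re.sub(r'(\\(x|u)\{(.*?)\})', replacer, r'\x{abcd} foo \u{1212}')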

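A filled-in version of the re.sub approach, as a sketch: it completes the \x branch with the question's own UTF-8 decoding (via bytes.fromhex) and tightens the pattern to hex digits only. ESCAPE_RE and unescape are hypothetical names, and the example input uses \x{e18892} instead of \x{abcd}, because 0xab on its own is not valid UTF-8.

```python
import re

# Matches \x{...} or \u{...}; group 1 is the escape letter, group 2 the hex digits.
ESCAPE_RE = re.compile(r"\\([xu])\{([0-9A-Fa-f]+)\}")


def replacer(match):
    kind, hex_digits = match.group(1), match.group(2)
    if kind == "u":
        # \u{...}: the digits form a single code point.
        return chr(int(hex_digits, 16))
    # \x{...}: the digits are UTF-8 bytes, two hex digits per byte
    # (bytes.fromhex raises ValueError for an odd number of digits).
    return bytes.fromhex(hex_digits).decode("utf-8")


def unescape(text):
    return ESCAPE_RE.sub(replacer, text)


print(ascii(unescape(r"\x{e18892} foo \u{1212}")))  # '\u1212 foo \u1212'
```

Restricting the pattern to hex digits means malformed escapes such as \u{zz} are left untouched instead of raising inside replacer.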

Archived:	11 years, 9 months ago
Views:	430
Last activity:	11 years, 9 months ago