How do I convert an int representing a UTF-8 character into a Unicode code point?

Question

I have an int that holds the UTF-8 encoded form of a character:

my_int = 0xC484
# Decimal: `50308`
# Binary: `0b1100010010000100`

If I use the function unichr I get: \uC484 or 쒄 (U+C484)

However, I need the output to be: Ą

How do I convert my_int to a Unicode code point?

Answer 1

To convert the integer 0xC484 to the bytestring '\xc4\x84' (the UTF-8 representation of the Unicode character Ą), you can use struct.pack():


>>> import struct
>>> struct.pack(">H", 0xC484)
'\xc4\x84'

...where > in the format string means big-endian and H means unsigned short int (2 bytes).
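To illustrate why the byte order matters here, the following is a small sketch comparing big-endian and little-endian packing. It assumes Python 3 (so the results are shown with a b prefix), whereas the session above is Python 2.

```python
import struct

# Big-endian: most significant byte first, matching the UTF-8 byte order.
big = struct.pack(">H", 0xC484)

# Little-endian: least significant byte first, which swaps the two bytes.
little = struct.pack("<H", 0xC484)

print(big)     # b'\xc4\x84'
print(little)  # b'\x84\xc4'
```

Only the big-endian result is a valid UTF-8 sequence. The little-endian one starts with 0x84, a continuation byte, so decoding it as UTF-8 raises UnicodeDecodeError.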


Once you have the UTF-8 bytestring, you can decode it to Unicode as usual:


>>> struct.pack(">H", 0xC484).decode("utf8")
u'\u0104'

>>> print struct.pack(">H", 0xC484).decode("utf8")
Ą
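The ">H" format only covers two-byte sequences, while UTF-8 characters can take one to four bytes. As a rough sketch for the general case, assuming Python 3 (where int.to_bytes is available), something like the following might work. The helper name utf8_int_to_codepoint is hypothetical and not part of any library.

```python
def utf8_int_to_codepoint(value):
    """Hypothetical helper: decode an int holding UTF-8 bytes to a code point."""
    # Number of bytes needed to hold the integer (at least one, so 0 works).
    length = max(1, (value.bit_length() + 7) // 8)
    # Big-endian keeps the bytes in the same order as the hex literal.
    raw = value.to_bytes(length, "big")
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    text = raw.decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return ord(text)


cp = utf8_int_to_codepoint(0xC484)
print(hex(cp), chr(cp))  # 0x104 Ą
```

The same function should handle a three-byte value such as 0xE282AC (the euro sign, U+20AC) without changing the format string.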