How do I remove escape characters (escaped unicode characters) from a unicode string in Python 2.x?

Question

How do I remove the escaping from escaped unicode characters in a string in Python 2.x?

>>> test
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> print test
"Hello," he said??.
        "I am nine years oldâ"
>>> print test2
"Hello," he\u200b said\u200f\u200e.
        "I\u200b am\u200b nine years old"

So how do I convert test2 into test (that is, get the unicode characters themselves instead of the literal \uXXXX sequences)? .decode('utf-8') does not do it.

Answer 1

fal*_*tru 5

You can use the unicode-escape codec to decode '\\u200b' into u'\u200b'.

>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> test2.decode('unicode-escape')
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
>>> print test2.decode('unicode-escape')
"Hello," he? said??.
    "I? am? nine years old"
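
The difference between the two decoders can be seen side by side in the following minimal sketch. It is not part of the original answer: it uses codecs.decode on a bytes literal so that the same code should run on both Python 2.6+ and Python 3, and the names raw, as_utf8 and unescaped are made up for illustration.

```python
import codecs

# Byte string holding literal backslash-u sequences, same shape as test2.
raw = b'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'

# UTF-8 decoding only maps bytes to code points. The six ASCII characters
# backslash, u, 2, 0, 0, b are valid UTF-8, so they come through unchanged.
as_utf8 = raw.decode('utf-8')

# unicode_escape interprets the escape sequences themselves, turning each
# backslash-u sequence into the single code point it names.
unescaped = codecs.decode(raw, 'unicode_escape')

print(repr(as_utf8))
print(repr(unescaped))
```

One caveat: unicode_escape treats any non-ASCII bytes in the input as Latin-1, so this sketch assumes the input is plain ASCII apart from the escape sequences.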

Note: even so, test2 cannot be decoded to match test1 exactly, because test1 contains a u'\xe2' just before the closing quote (").

>>> test1 == test2.decode('unicode-escape')
False
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape')
True
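
To see where the two strings diverge rather than only that they do, one option is to walk them in parallel and stop at the first mismatch. This is a rough sketch, not something from the original answer. The names decoded, idx and without_extra are illustrative, and it assumes the strings differ somewhere within their common length.

```python
import codecs

test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
decoded = codecs.decode(
    b'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"',
    'unicode_escape',
)

# Index of the first position where the two strings disagree.
idx = next(i for i, (a, b) in enumerate(zip(test1, decoded)) if a != b)

# The character in test1 that has no counterpart in the decoded string.
print(repr(test1[idx]))

# Dropping that single character makes the strings equal.
without_extra = test1[:idx] + test1[idx + 1:]
print(without_extra == decoded)
```

Unlike test1.replace(u'\xe2', ''), this removes only the one offending position, which matters if the same character legitimately appears elsewhere in the text.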
