How to print non-ASCII characters as \uXXXX

Ala*_*ACK 1 python unicode non-ascii-characters python-3.x python-3.4

# what I currently have

print('你好')

# 你好
# this is what I want

print('你好')

# \uXXXX \uXXXX

How can I do this? I want to print all non-ASCII characters in a string as Unicode escape literals.

Mar*_*ers 9

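You can use the ascii() function to convert a string to a debugging representation in which non-ASCII and non-printable characters are converted to escape sequences: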

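As repr(), return a string containing a printable representation of an object, but escape the non-ASCII characters in the string returned by repr() using \x, \u or \U escapes.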

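For Unicode code points in the range U+0100 to U+FFFF it uses \uhhhh escapes; for the Latin-1 range (U+007F to U+00FF) it uses \xhh escapes instead, and code points above U+FFFF get \Uhhhhhhhh escapes. Note that the output is valid Python syntax for re-creating the string, so it includes the quotes: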

>>> print('你好')
你好
>>> print(ascii('你好'))
'\u4f60\u597d'
>>> print(ascii('ASCII is not changed, Latin-1 (åéîøü) is, as are all higher codepoints, such as 你好'))
'ASCII is not changed, Latin-1 (\xe5\xe9\xee\xf8\xfc) is, as are all higher codepoints, such as \u4f60\u597d'
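
To illustrate the point that ascii() output is valid Python syntax, here is a minimal sketch that feeds the escaped text back through ast.literal_eval() and checks that the original string comes out again. The helper name roundtrip is made up for this example; the three samples cover the \x, \u and \U escape forms.

```python
import ast

def roundtrip(s):
    """Return ascii(s) after checking that it evaluates back to s."""
    escaped = ascii(s)
    # literal_eval parses the escaped literal, quotes included, back into a str
    assert ast.literal_eval(escaped) == s
    return escaped

# One Latin-1 character, two BMP characters, one character outside the BMP
for sample in ('å', '你好', '\U0001f600'):
    print(roundtrip(sample))
# '\xe5'
# '\u4f60\u597d'
# '\U0001f600'
```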

If you must have \uhhhh for everything, you'll have to do the conversion yourself:

import re

# The pattern matches every character outside the ASCII range (U+0080 and up).
def escape_unicode(t, _p=re.compile(r'[\u0080-\U0010ffff]')):
    def escape(match):
        char = ord(match.group())
        # BMP code points fit in \uhhhh; anything above U+FFFF needs \Uhhhhhhhh.
        return '\\u{:04x}'.format(char) if char < 0x10000 else '\\U{:08x}'.format(char)
    return _p.sub(escape, t)

Unlike the ascii() function, the above function does not add quotes:

>>> print(escape_unicode('你好'))
\u4f60\u597d
>>> print(escape_unicode('ASCII is not changed, Latin-1 (åéîøü) is, as are all higher codepoints, such as 你好'))
ASCII is not changed, Latin-1 (\u00e5\u00e9\u00ee\u00f8\u00fc) is, as are all higher codepoints, such as \u4f60\u597d
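
If the consumer of the output only understands four-digit \uXXXX escapes (as JSON or JavaScript do), one possible variant writes characters outside the BMP as a UTF-16 surrogate pair rather than as \Uhhhhhhhh. This is only a sketch under that assumption: the name escape_unicode_utf16 is hypothetical, and Python itself will not read such a pair back as a single character in a string literal.

```python
import re

_NON_ASCII = re.compile(r'[^\x00-\x7f]')

def escape_unicode_utf16(text):
    """Hypothetical variant: write every non-ASCII character as \\uXXXX."""
    def escape(match):
        # UTF-16 (big-endian, no BOM) yields one 16-bit unit for BMP characters
        # and two units (a surrogate pair) for anything above U+FFFF.
        # 'surrogatepass' lets lone surrogates already in the input through.
        units = match.group().encode('utf-16-be', 'surrogatepass')
        return ''.join(
            '\\u{:04x}'.format(int.from_bytes(units[i:i + 2], 'big'))
            for i in range(0, len(units), 2)
        )
    return _NON_ASCII.sub(escape, text)

print(escape_unicode_utf16('你好'))        # \u4f60\u597d
print(escape_unicode_utf16('\U0001f600'))  # \ud83d\ude00
```

For BMP-only text the result matches escape_unicode() above; the two differ only for code points above U+FFFF.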

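  • @Alan: Not with the `ascii()` function. You'd have to do this manually. Also note that for anything outside the BMP (code points above U+FFFF), you have to use the `\Uhhhhhhhh` notation in Python. What problem are you trying to solve? (3 upvotes)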