Ala*_*ACK 1 python unicode non-ascii-characters python-3.x python-3.4
# what I currently have
print('你好')
# 你好
# this is what I want
print('你好')
# \uXXXX \uXXXX
How can I do this? I want to print all non-ASCII characters in a string as Unicode escape literals.
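You can use the ascii() function to convert a string to a debugging representation in which non-ASCII and non-printable characters are replaced by escape sequences. The documentation describes it as follows:
"As repr(), return a string containing a printable representation of an object, but escape the non-ASCII characters in the string returned by repr() using \x, \u, or \U escapes."
For Unicode code points in the range U+0100-U+FFFF it uses \uhhhh escapes; for the Latin-1 range (U+007F-U+00FF), \xhh escapes are used instead; code points above U+FFFF get \Uhhhhhhhh escapes. Note that the output qualifies as valid Python syntax for recreating the string, so the quotes are included: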
>>> print('你好')
你好
>>> print(ascii('你好'))
'\u4f60\u597d'
>>> print(ascii('ASCII is not changed, Latin-1 (åéîøü) is, as are all higher codepoints, such as 你好'))
'ASCII is not changed, Latin-1 (\xe5\xe9\xee\xf8\xfc) is, as are all higher codepoints, such as \u4f60\u597d'
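As a quick check of the ranges described above, the following sketch prints the escape that ascii() picks for one sample character per range, then confirms that the result really is a valid Python literal by feeding it to ast.literal_eval(). The sample characters and variable names are my own choices for illustration, not part of the question.

```python
import ast

# One sample character per escape form that ascii() produces.
samples = ['a', '\x7f', 'é', '你', '😀']
for ch in samples:
    # Prints the code point next to its ascii() representation,
    # e.g. the last sample is shown as '\U0001f600'.
    print('U+{:04X} -> {}'.format(ord(ch), ascii(ch)))

# Because the result is a valid Python string literal,
# ast.literal_eval() turns it back into the original text.
text = 'Latin-1 (åé) and CJK (你好)'
literal = ascii(text)
restored = ast.literal_eval(literal)
print(literal)
print(restored == text)
```

If you only want the escaped text without the surrounding quotes, keep in mind that ascii() also escapes quotes and backslashes where the literal syntax requires it, so simply slicing off the first and last character is not always equivalent to a pure "escape non-ASCII only" conversion.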
If you must have \uhhhh escapes for everything, you will have to do the conversion yourself:
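import re

def escape_unicode(t, _p=re.compile(r'[\u0080-\U0010ffff]')):
    # Replace every non-ASCII character in t with a \uhhhh or \Uhhhhhhhh escape.
    def escape(match):
        char = ord(match.group())
        # Code points above U+FFFF do not fit in four hex digits.
        return '\\u{:04x}'.format(char) if char < 0x10000 else '\\U{:08x}'.format(char)
    return _p.sub(escape, t)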
Unlike ascii(), the function above does not add quotes:
>>> print(escape_unicode('你好'))
\u4f60\u597d
>>> print(escape_unicode('ASCII is not changed, Latin-1 (åéîøü) is, as are all higher codepoints, such as 你好'))
ASCII is not changed, Latin-1 (\u00e5\u00e9\u00ee\u00f8\u00fc) is, as are all higher codepoints, such as \u4f60\u597d
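Two things the examples do not show are what happens to characters outside the Basic Multilingual Plane and how to get the original text back. The sketch below repeats the helper so that it runs on its own (with the compiled pattern moved to a module-level name, which is my rearrangement) and adds unescape_unicode(), a hypothetical inverse that is not part of the answer. It relies on the standard unicode_escape codec and assumes the original text contains no backslashes, since escape_unicode() leaves those untouched and the codec would then misread them.

```python
import re

# Same logic as the helper in the answer, repeated so this snippet is self-contained.
_NON_ASCII = re.compile(r'[\u0080-\U0010ffff]')

def escape_unicode(text):
    def escape(match):
        code = ord(match.group())
        # Four hex digits up to U+FFFF, eight hex digits beyond that.
        return '\\u{:04x}'.format(code) if code < 0x10000 else '\\U{:08x}'.format(code)
    return _NON_ASCII.sub(escape, text)

def unescape_unicode(escaped):
    # Hypothetical inverse (an assumption for illustration): only safe when
    # the original text had no backslashes of its own.
    return escaped.encode('ascii').decode('unicode_escape')

original = 'å 你好 😀'
escaped = escape_unicode(original)
print(escaped)
print(unescape_unicode(escaped) == original)
```

The emoji ends up as a \U escape with eight hex digits, so "\uhhhh for everything" holds only up to U+FFFF. If a consumer needs four-digit escapes throughout, the higher code points would have to be written as surrogate pairs instead, which this helper does not attempt.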