How to control padding of Unicode strings containing East Asian characters

Question

kev*_*kev 7 python unicode string-formatting

I have three UTF-8 strings:

hello, world
hello, 世界
hello, 世rld

I want only the first 10 ASCII character widths of each string, so that the brackets line up in a column:

[hello, wor]
[hello, 世 ]
[hello, 世r]

In the console:

width('世界')==width('worl')
width('世 ')==width('wor')  # a white space after '世'

A Chinese character takes three bytes in UTF-8, but it is only 2 ASCII characters wide when displayed in the console:

>>> bytes("hello, 世界", encoding='utf-8')
b'hello, \xe4\xb8\x96\xe7\x95\x8c'

Python's format() doesn't help when such characters are mixed in, because it counts code points rather than display columns:

>>> for s in ['[{0:<{1}.{1}}]'.format(s, 10) for s in ['hello, world', 'hello, 世界', 'hello, 世rld']]:
...    print(s)
...
[hello, wor]
[hello, 世界 ]
[hello, 世rl]

It isn't pretty:

 -----------Songs-----------
|    1: ??                  |
|    2: ???                 |
|    3: ??????              |
|    4: ?????               |
|    5: ???(CUCURRUCUCU PALO|
|    6: ????                |
|    7: ??                  |
|    8: ????                |
|    9: ?????               |
|   10: ??( ?????????)(INTO |
| X 11: ????                |
| X 12: ????(THE MO RUN AIR |
| X 13: ????                |
| X 14: ??                  |
| X 15: ??????(SERENADE)    |
| X 16: ??????(Sweet Lullaby|
 ---------------------------

So, I wonder if there is a standard way to do this kind of UTF-8 padding?

Answer 1

Mar*_*nen 13

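When trying to line up ASCII text with Chinese in a fixed-width font, you can use the full-width versions of the printable ASCII characters. Below I build a translation table from ASCII to the full-width versions: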

# coding: utf8

# full width versions (SPACE is non-contiguous with ! through ~)
SPACE = '\N{IDEOGRAPHIC SPACE}'
EXCLA = '\N{FULLWIDTH EXCLAMATION MARK}'
TILDE = '\N{FULLWIDTH TILDE}'

# strings of ASCII and full-width characters (same order)
# range() excludes its end point, so add 1 to include '~' and its full-width form
west = ''.join(chr(i) for i in range(ord(' '), ord('~') + 1))
east = SPACE + ''.join(chr(i) for i in range(ord(EXCLA), ord(TILDE) + 1))

# build the translation table
full = str.maketrans(west,east)

data = '''\
??(A song)
???(Another song)
??????(Yet another song)
?????
???(Cucurrucucu palo whatever)
????
??
????
?????
?????????????(Into something)
????
????
????
??
??????(SERENADE)
??????(Sweet Lullaby)
'''

# Replace the ASCII characters with full width, and create a song list.
data = data.translate(full).rstrip().split('\n')

# translate each printable line.
print(' ----------Songs-----------'.translate(full))
for i,song in enumerate(data):
    line = '|{:4}: {:20.20}|'.format(i+1,song)
    print(line.translate(full))
print(' --------------------------'.translate(full))

Output

???????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
????????????????????????????
???????????????????????????

It's not overly pretty, but it lines up.

Answer 2

her*_*h10 6

There seems to be no official support for this, but a built-in package may help:

>>> import unicodedata
>>> print(unicodedata.east_asian_width('世'))

The return value indicates the category of the code point. Specifically:

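W - East Asian Wide
F - East Asian Fullwidth (of narrow)
Na - East Asian Narrow
H - East Asian Halfwidth (of wide)
A - East Asian Ambiguous
N - Neutral (not East Asian)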
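
As a quick check of these categories, here is a minimal sketch that looks up a few sample characters. The sample characters and the helper name `categories` are my own choices for illustration and are not part of the answer.

```python
import unicodedata

# Sample characters chosen for illustration only:
# ASCII 'a', CJK U+4E16, fullwidth Latin A (U+FF21), halfwidth katakana A (U+FF71)
SAMPLES = ["a", "\u4e16", "\uff21", "\uff71"]

def categories(chars):
    """Map each character to its East Asian Width category code."""
    return {ch: unicodedata.east_asian_width(ch) for ch in chars}

if __name__ == "__main__":
    for ch, cat in categories(SAMPLES).items():
        print("U+{:04X} -> {}".format(ord(ch), cat))
```

Characters reported as W or F are the ones that normally occupy two columns in a terminal; the others normally occupy one.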

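This answer to a similar question provides a quick solution. Note, however, that the displayed result depends on the exact monospaced font in use. The default fonts used by ipython and pydev do not work well, while the Windows console is fine.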
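
One way to build on `east_asian_width` is sketched below. It is not the code from the linked answer; the helper names `char_width` and `fit` are hypothetical. It assumes that W and F characters take two columns, that combining marks take none, and that everything else (including the ambiguous category A) takes one. Whether that matches what you see still depends on the terminal and font.

```python
import unicodedata

def char_width(ch):
    """Assumed column count: 0 for combining marks, 2 for Wide/Fullwidth, else 1."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def fit(text, width):
    """Truncate text to at most `width` columns, then right-pad with spaces."""
    out = []
    used = 0
    for ch in text:
        w = char_width(ch)
        if used + w > width:
            break  # the next character would overflow the field
        out.append(ch)
        used += w
    return "".join(out) + " " * (width - used)

if __name__ == "__main__":
    # The three strings from the question, written with escapes
    for s in ["hello, world", "hello, \u4e16\u754c", "hello, \u4e16rld"]:
        print("[" + fit(s, 10) + "]")
```

When a wide character would straddle the right edge, this sketch drops it and pads with a space instead, which is the behaviour the question asks for in its second example.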

Answer 3

Dav*_*one 5

Take a look at kitchen. I think it might have what you want.
