Python str.format with UTF-8 characters that are more than 1 position wide

Question

Dan*_*Dan 2 python string-formatting cjk

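I am trying to print Japanese characters in Python, aligned in columns. A Japanese character seems to be as wide as two spaces, so the alignment doesn't work.

Here is the code: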

def print_kanji(s, k):
    print('{:<20}{:<10}{:<10}{:<10}'
        .format(s, k['reading'][0], k['reading'][1], k['kanji']))

# 's' is an input string and 'k' is a map containing the readings in the three Japanese scripts.

The output I get is the following:

decir               ??        ??        ??        

pequeño             ????      ????      ???       

niño                ???       ???       ??        

ya [ha hecho X]     ??        ??

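The left column is Spanish, but that doesn't matter. What matters is that the three columns on the right are not aligned. I have counted the positions and the count is correct, i.e. the first Japanese column is always 10 "positions" long. The problem is that each Japanese character is 2 positions wide, while a space is only 1.

I have also checked that a space typed with Japanese input is two positions wide as well, so I should be able to fix the problem by replacing the "Latin" space (1 position wide) with the Japanese one.

How can I change the character that format uses to align strings?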

Edit

I found that str.format accepts a fill character. I tried replacing it with the Japanese space (two positions wide), and the result was even worse.
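A minimal sketch of why a wide fill character likely makes things worse, assuming a terminal that draws U+3000 IDEOGRAPHIC SPACE two columns wide. The names `WIDE_SPACE`, `word`, `narrow` and `wide` are made up for illustration. str.format counts code points, not terminal columns, so it adds the same number of padding characters whichever fill is used:

```python
# U+3000 IDEOGRAPHIC SPACE; assumed here to be drawn two columns wide.
WIDE_SPACE = '\u3000'

word = '\u65e5\u672c'  # two kanji: 2 code points, typically 4 terminal columns

# Default fill: pad with ASCII spaces until the result has 10 code points.
narrow = '{:<10}'.format(word)

# Custom fill: the character placed before '<' is used for padding instead.
wide = '{:\u3000<10}'.format(word)

print(len(narrow), len(wide))  # 10 10
# Both results contain 8 padding characters.
print(narrow.count(' '), wide.count(WIDE_SPACE))
```

Under that assumption the first string takes about 4 + 8 = 12 columns on screen and the second about 4 + 16 = 20, so neither lands on the intended 10, and the wide fill overshoots by more.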

Edit 2

I solved it by implementing this function:

def get_formatted_kanji(h, k, kn):
    # Each Japanese character is assumed to be 2 positions wide,
    # so every string is padded according to its own length.
    h2 = h + ' ' * (10 - 2*len(h))
    k2 = k + ' ' * (10 - 2*len(k))
    kn2 = kn + ' ' * (10 - 2*len(kn))
    return h2 + k2 + kn2

# h, k and kn are the three Japanese strings to be formatted in columns

However, is there a better (built-in) way to achieve this?

Answer 1

Die*_*Epp 5

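In a terminal, it is common for some characters to occupy two columns while others occupy one. You can find out which is which with Python's unicodedata module, which provides east_asian_width().

Here is an example of how to use it to pad text: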
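As a quick illustration of what east_asian_width() returns, the sketch below looks up a few hand-picked characters. The sample list and the variable names are only for demonstration:

```python
import unicodedata

samples = [
    'a',        # LATIN SMALL LETTER A
    '\u3042',   # HIRAGANA LETTER A
    '\uff71',   # HALFWIDTH KATAKANA LETTER A
    '\uff21',   # FULLWIDTH LATIN CAPITAL LETTER A
    '\u3000',   # IDEOGRAPHIC SPACE
]

# Look up the East Asian Width class of each sample character.
classes = [unicodedata.east_asian_width(ch) for ch in samples]
print(classes)  # ['Na', 'W', 'H', 'F', 'F']
```

Characters in the 'W' (wide) and 'F' (fullwidth) classes are the ones a terminal usually draws two columns wide.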

import unicodedata
table = [
    ('decir', '??', '??', '??'), 
    ('pequeño', '????', '????', '???'), 
    ('niño', '???', '???', '??'), 
    ('ya [ha hecho X]', '??', '??', ''),
]

WIDTHS = {
    'F': 2,
    'H': 1,
    'W': 2,
    'N': 1,
    'A': 1, # Not really correct...
    'Na': 1,
}

def pad(text, width):
    text_width = 0
    for ch in text:
        width_class = unicodedata.east_asian_width(ch)
        text_width += WIDTHS[width_class]
    if width <= text_width:
        return text
    return text + ' ' * (width - text_width)

for s, reading1, reading2, kanji in table:
    print('{}{}{}{}'.format(
        pad(s, 20),
        pad(reading1, 10),
        pad(reading2, 10),
        pad(kanji, 10),
    ))

Here is a screenshot of how it looks on my system (macOS):

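Limitations

The code above does not handle Unicode combining characters. A more complete implementation would perform Unicode text segmentation and then work out the width of each grapheme cluster. I'm sure there are libraries that can do this for you.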
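As a partial step in that direction, here is a rough sketch that gives combining marks a width of zero by checking unicodedata.combining(). It is not full grapheme cluster segmentation (emoji sequences and control characters, for example, are not handled), and the names `display_width` and `pad_to` are hypothetical helpers introduced only for this example:

```python
import unicodedata

def display_width(text):
    """Estimate how many terminal columns text occupies (approximation)."""
    width = 0
    for ch in text:
        # A non-zero combining class means the mark is drawn on top of the
        # preceding character, so it is assumed to add no width of its own.
        if unicodedata.combining(ch):
            continue
        # Wide and fullwidth characters are assumed to take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ('F', 'W') else 1
    return width

def pad_to(text, width):
    """Right-pad text with ASCII spaces up to the given column count."""
    return text + ' ' * max(0, width - display_width(text))

print(display_width('e\u0301'))       # 'e' + COMBINING ACUTE ACCENT: 1
print(display_width('\u304b\u3099'))  # KA + combining voiced sound mark: 2
```

Counting this way treats a decomposed sequence the same as its precomposed form, which is how most terminals render it.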

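Language note

As a side note, I don't think the word "???" and "pequeño" are equivalent. The Spanish word "pequeño" refers to the size of something, whereas "???" refers to quantity.

I think these are more likely:

poco: ???
pequeño: ???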
