How do I count the occurrences of each character in a Python string?

Lon*_*ner 0 python unicode python-3.x

I wrote this Python program to count the occurrences of each character in a Python string.

def count_chars(s):
    counts = [0] * 65536
    for c in s:
        counts[ord(c)] += 1
    return counts

def print_counts(counts):
    for i, n in enumerate(counts):
        if n > 0:
            print(chr(i), '-', n)

if __name__ == '__main__':
    print_counts(count_chars('hello, world \u2615'))

Output:

  - 2
, - 1
d - 1
e - 1
h - 1
l - 3
o - 2
r - 1
w - 1
☕ - 1

Can this program count the occurrences of any Unicode character? If not, what can be done to ensure that every possible Unicode character is handled?

Mar*_*ers 7

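Your code only handles characters in the Basic Multilingual Plane; characters outside it, such as most emoji, won't be handled. You can solve this by using a dictionary instead of a list with a fixed number of indices, and using the characters themselves as keys.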
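A minimal sketch of that dictionary-based variant, reusing the function names from the question (the sorted output order is my own choice, made to mirror the code-point order of the list-based version):

```python
def count_chars(s):
    """Count occurrences of each character (code point) in s using a dict."""
    counts = {}
    for c in s:
        # dict.get returns 0 for characters that have not been seen yet
        counts[c] = counts.get(c, 0) + 1
    return counts


def print_counts(counts):
    # Iterate in code-point order, like the original list-based version
    for c in sorted(counts):
        print(c, '-', counts[c])


if __name__ == '__main__':
    # Includes U+1F60A, which lies outside the Basic Multilingual Plane
    print_counts(count_chars('hello, world \u2615 \U0001F60A'))
```

Because the keys are the characters themselves, no index range has to be fixed up front, so code points above U+FFFF are counted like any other.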

However, you should just use a collections.Counter() object:

from collections import Counter

counts = Counter(s)

for character, count in counts.most_common():
    print(character, '-', count)

After all, it is designed for exactly this kind of use case.

Demo:

>>> from collections import Counter
>>> s = 'hello, world \u2615 \U0001F60A'
>>> counts = Counter(s)
>>> for character, count in counts.most_common():
...     print(character, '-', count)
...
  - 3
l - 3
o - 2
r - 1
w - 1
e - 1
h - 1
d - 1
☕ - 1
, - 1
😊 - 1