Lon*_*ner 0 python unicode python-3.x
I wrote this Python program to count the number of occurrences of each character in a Python string.
def count_chars(s):
    counts = [0] * 65536
    for c in s:
        counts[ord(c)] += 1
    return counts

def print_counts(counts):
    for i, n in enumerate(counts):
        if n > 0:
            print(chr(i), '-', n)

if __name__ == '__main__':
    print_counts(count_chars('hello, world \u2615'))
Output:
- 2
, - 1
d - 1
e - 1
h - 1
l - 3
o - 2
r - 1
w - 1
☕ - 1
Can this program count the occurrences of any Unicode character? If not, what can be done to ensure that every possible Unicode character is handled?
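Your code only handles characters in the Basic Multilingual Plane (code points U+0000 through U+FFFF); most emoji, for example, will not be handled, because ord(c) for such a character is 65536 or more and indexing the 65536-element list raises an IndexError. You can fix this by using a dictionary instead of a list with a fixed number of indices, and using the characters themselves as keys.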
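A minimal sketch of the dictionary approach, assuming the same function names as in the question; the .get() idiom and the sort by code point are just one way to write it, not necessarily what was intended:

```python
def count_chars(s):
    # Map each character to its count. A dict has no fixed upper
    # bound on its keys, so code points above U+FFFF work too.
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    return counts


def print_counts(counts):
    # Sorting the keys orders them by code point, which mirrors
    # the output order of the list-based version.
    for c in sorted(counts):
        print(c, '-', counts[c])
```

Calling print_counts(count_chars('hello, world \u2615 \U0001F60A')) should then list the emoji alongside the other characters instead of failing.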
However, you should just use a collections.Counter() object:
from collections import Counter
counts = Counter(s)
for character, count in counts.most_common():
    print(character, '-', count)
It is, after all, designed for exactly this kind of use case.
Demo:
>>> from collections import Counter
>>> s = 'hello, world \u2615 \U0001F60A'
>>> counts = Counter(s)
>>> for character, count in counts.most_common():
...     print(character, '-', count)
...
- 3
l - 3
o - 2
r - 1
w - 1
e - 1
h - 1
d - 1
☕ - 1
, - 1
😊 - 1
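If your terminal or font cannot render some of these characters, it can help to label each count with its code point. The following is only a sketch; format_counts is a hypothetical helper name, not something from collections:

```python
from collections import Counter


def format_counts(s):
    # One line per distinct character: code point, ASCII-safe repr, count.
    lines = []
    for character, count in Counter(s).most_common():
        # {!a} applies ascii(), so non-ASCII characters appear as escapes.
        lines.append('U+{:04X} {!a} - {}'.format(ord(character), character, count))
    return lines
```

Each returned line can be printed as-is, since the output contains only ASCII.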