Counting the frequency of letters in a text file

Question

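In Python, how can I iterate over a text file and count the number of occurrences of each letter? I realize I could use a 'for x in file' statement and then set up 26 or so if/elif statements, but surely there is a better way to do this?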

Thanks.

Answer 1

Use collections.Counter():

from collections import Counter

c = Counter()
with open(file) as f:            # file: path to the text file
    for x in f:                  # read one line at a time
        c += Counter(x.strip())  # add this line's character counts

As @mgilson pointed out, if the file isn't that large, you can simply do:

c = Counter(f.read().strip())

Example:

>>> c = Counter()
>>> c += Counter('aaabbbcccddd eee fff ggg')
>>> c
Counter({'a': 3, ' ': 3, 'c': 3, 'b': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
>>> c += Counter('aaabbbccc')
>>> c
Counter({'a': 6, 'c': 6, 'b': 6, ' ': 3, 'e': 3, 'd': 3, 'g': 3, 'f': 3})
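Note that the Counter in the example also counts the space character. Since the question asks about letters only, here is a minimal sketch that filters out non-letters and ignores case. The helper name `letter_frequencies` is made up for illustration and is not part of `collections`.

```python
from collections import Counter

def letter_frequencies(text):
    """Count alphabetic characters only, treating upper and lower case alike."""
    # str.isalpha() is False for spaces, digits and punctuation,
    # so those characters never reach the Counter.
    return Counter(ch for ch in text.lower() if ch.isalpha())

counts = letter_frequencies("Hello, World!")
print(counts.most_common(2))  # [('l', 3), ('o', 2)]
```

For a file, you could pass `f.read()` to the helper, or call `c.update(...)` with the same generator expression for each line if the file is large.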

Or use the string's count() method:

from string import ascii_lowercase     # ascii_lowercase =='abcdefghijklmnopqrstuvwxyz'
with open(file) as f:
    text = f.read().strip()
    dic = {}
    for x in ascii_lowercase:
        dic[x] = text.count(x)
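The count() version only sees lowercase letters and gives raw counts. If what you want is a frequency in the sense of a proportion, one possible sketch is below. It assumes you want case-insensitive counting over the 26 ASCII letters, and the function name `relative_frequencies` is hypothetical.

```python
from string import ascii_lowercase

def relative_frequencies(text):
    """Return each ASCII letter's share of all ASCII letters in text."""
    text = text.lower()  # fold case so 'A' and 'a' are counted together
    counts = {letter: text.count(letter) for letter in ascii_lowercase}
    total = sum(counts.values())
    if total == 0:
        # No letters at all: avoid dividing by zero.
        return {letter: 0.0 for letter in ascii_lowercase}
    return {letter: n / total for letter, n in counts.items()}

freqs = relative_frequencies("Banana")
print(freqs["a"])  # 0.5
```

This still scans the text once per letter (26 passes), which is fine for small files; for large inputs the Counter approach makes a single pass.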
