Counting the most common words in a txt file

Question

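I'm trying to get a list of the 10 most common words in a txt file, with the end goal of building a word cloud. The following code prints nothing when I run it.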

>>> import collections
>>> from collections import Counter
>>> file = open('/Users/Desktop/word_cloud/98-0.txt')
>>> wordcount={}
>>> d = collections.Counter(wordcount)
>>> for word, count in d.most_common(10):
    print(word, ": ", count)

Answer 1

Iza*_*gen 7

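I'd actually suggest you keep using Counter. It's a very useful tool for counting things, and its syntax is expressive enough that you don't need to worry about sorting anything yourself. With it, you can do: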

from collections import Counter

#opens the file. the with statement here will automatically close it afterwards.
with open("input.txt") as input_file:
    #build a counter from each word in the file
    count = Counter(word for line in input_file
                         for word in line.split())

print(count.most_common(10))

With my input.txt, this outputs:

[('THE', 27643), ('AND', 26728), ('I', 20681), ('TO', 19198), ('OF', 18173), ('A', 14613), ('YOU', 13649), ('MY', 12480), ('THAT', 11121), ('IN', 10967)]

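I've changed it a bit so that it doesn't have to read the whole file into memory. My input.txt is a punctuation-free version of the works of Shakespeare, to show that this code is fast. It takes about 0.2 seconds on my machine.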
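
If your file still contains punctuation and mixed case, the same line-by-line approach could be sketched roughly as below. The helper name `top_words`, the UTF-8 encoding, and the regular expression (which treats any run of ASCII letters or apostrophes as a word) are assumptions for illustration, not part of the code above.

```python
import re
from collections import Counter

# Assumed definition of a "word": a run of ASCII letters or apostrophes.
WORD_RE = re.compile(r"[a-z']+")

def top_words(path, n=10):
    """Return the n most common words in the file at path (hypothetical helper)."""
    counts = Counter()
    # The encoding is an assumption; adjust it to match your file.
    with open(path, encoding="utf-8") as input_file:
        # Iterating over the file yields one line at a time,
        # so the whole file never has to sit in memory.
        for line in input_file:
            # Lower-case first so "The" and "THE" count as the same word.
            counts.update(WORD_RE.findall(line.lower()))
    return counts.most_common(n)
```

Calling `print(top_words("input.txt"))` would then print a list of (word, count) pairs in the same shape as the output above.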

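Your code is a bit haphazard: it looks like you tried to combine several approaches and kept pieces of each here and there. I've added some explanatory comments to my code. Hopefully it's fairly straightforward, but if you're still confused about anything, let me know.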
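
To make the problem concrete: in the question's code the file is opened but never read, and `wordcount` is still an empty dict when it is passed to `Counter`. As a result `most_common(10)` has nothing to return and the loop body never runs. A minimal check, reusing the names from the question:

```python
from collections import Counter

wordcount = {}              # nothing is ever added to this dict
d = Counter(wordcount)      # so the Counter built from it is empty too

top = d.most_common(10)
print(top)                  # prints [] : the for loop has nothing to iterate over
```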
