Finding the frequency of each word in a text file in Python

Question

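I want to find the frequency of every word in a text file so that I can pick out the most frequently occurring words. Can someone help me with the commands to use for this?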

import nltk

text1 = "hello he heloo hello hi "  # example text
fdist1 = nltk.FreqDist(text1)  # text1 is a str, so this counts characters

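I have tried the code above, but it does not give word frequencies; it shows the frequency of each character instead. I would also like to know how to read the input text from a text file.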

Answer 1

hei*_*nst 5

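I ran your example and saw the same thing you did. To make it work, you have to split the string on whitespace. If you don't, FreqDist iterates over the string one character at a time, which is why you get character counts. The code below returns the count for each word instead.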

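import nltk

text1 = 'hello he heloo hello hi '
# split() with no argument splits on any whitespace and drops empty strings,
# so the trailing space does not produce an empty '' token
words = text1.split()
fdist1 = nltk.FreqDist(words)
print(fdist1.most_common(50))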

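The difference between counting characters and counting words can also be seen without NLTK. The following is a minimal sketch that uses `collections.Counter` from the standard library as a stand-in (in NLTK 3, `FreqDist` is a `Counter` subclass, so the counting behavior shown here should carry over); the variable names are illustrative only.

```python
from collections import Counter

text = "hello he heloo hello hi "

# Iterating over a str yields single characters, so this counts
# letters and spaces rather than words.
char_counts = Counter(text)

# split() with no argument splits on runs of whitespace and drops
# empty strings, so this counts whole words.
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

Passing a list of tokens rather than the raw string is the only change needed to get word counts.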

If you want to read from a file and get the word counts, you can do it like this:

input.txt

hello he heloo hello hi
my username is heinst
your username is frooty


Python code

import nltk

# Read the whole file into one string
with open("input.txt", "r") as myfile:
    data = myfile.read()

# split() treats newlines and spaces alike and drops empty strings
words = data.split()
fdist1 = nltk.FreqDist(words)
print(fdist1.most_common(50))

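If the file contains punctuation or mixed case, splitting on whitespace alone would count "Hello," and "hello" as different words. The following is a rough sketch of one way to normalize the text first, using only the standard library rather than NLTK. The helper name `word_frequencies` is hypothetical, and the demo writes the sample input.txt contents to a temporary file so that it runs on its own.

```python
import os
import re
import tempfile
from collections import Counter


def word_frequencies(path):
    """Return a Counter of lowercased words found in the file at `path`."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read().lower()
    # \w+ matches runs of letters, digits and underscores,
    # so punctuation and whitespace are skipped.
    return Counter(re.findall(r"\w+", text))


# Demo: recreate the sample input.txt in a temporary directory.
sample = (
    "hello he heloo hello hi\n"
    "my username is heinst\n"
    "your username is frooty\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    counts = word_frequencies(path)

print(counts.most_common(3))
```

To use it on a real file, call `word_frequencies("input.txt")` and then `most_common(n)` on the result.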

Archived:	11 years, 2 months ago
Views:	10313
Last activity:	6 years, 1 month ago