I am trying to count word frequencies in a text file using Python.
I am using the following code:
openfile=open("total data", "r")
linecount=0
for line in openfile:
    if line.strip():
        linecount+=1
count={}
while linecount>0:
    line=openfile.readline().split()
    for word in line:
        if word in count:
            count[word]+=1
        else:
            count[word]=1
    linecount-=1
print count
But I get an empty dictionary: "print count" gives {} as output.
I also tried using:
from collections import defaultdict
.
.
count=defaultdict(int)
.
.
if word in count:
    count[word]=count.get(word,0)+1
But again I get an empty dictionary. I don't understand what I am doing wrong. Can someone point it out?
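The loop "for line in openfile:" moves the file pointer to the end of the file. So when the while loop later calls openfile.readline(), there is nothing left to read and it gets an empty string every time. If you want to read the data again, either move the pointer back to the beginning of the file with openfile.seek(0) or reopen the file.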
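As a rough illustration of the seek(0) fix, here is a minimal sketch written for Python 3 (the question's "print count" is Python 2 syntax). The function name count_words_two_pass and the temporary sample file are made up for this example and are not from the question. The second pass iterates over the file directly instead of calling readline() linecount times, because linecount only counts non-blank lines and that loop would stop early on a file containing blank lines.

```python
import os
import tempfile


def count_words_two_pass(path):
    """Count non-blank lines, rewind the file, then count words."""
    count = {}
    with open(path, "r") as openfile:
        # First pass: consumes the whole file, leaving the pointer at EOF.
        linecount = sum(1 for line in openfile if line.strip())
        # Rewind; without this the second pass would see no data.
        openfile.seek(0)
        # Second pass: tally every whitespace-separated word.
        for line in openfile:
            for word in line.split():
                count[word] = count.get(word, 0) + 1
    return linecount, count


# Hypothetical sample data written to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a b a\n\nc a\n")
    sample_path = tmp.name

linecount, count = count_words_two_pass(sample_path)
os.remove(sample_path)
print(linecount, count)  # 2 {'a': 3, 'b': 1, 'c': 1}
```

In the defaultdict attempt, the "if word in count:" guard would likely also need to go: a fresh defaultdict(int) contains no keys, so that test is never true and nothing is ever counted, even once the file pointer is rewound.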
For word frequencies, collections.Counter is a better fit:
from collections import Counter
with open("total data", "r") as openfile:
    c = Counter()
    for line in openfile:
        words = line.split()
        c.update(words)
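To show what the resulting Counter gives you, here is a small sketch that uses an in-memory list of lines in place of the file (the sample lines are invented for illustration). It relies only on Counter.update and Counter.most_common from the standard library.

```python
from collections import Counter

# Hypothetical sample lines standing in for the file's contents.
lines = ["the cat sat", "the cat ran", "", "the end"]

c = Counter()
for line in lines:
    # split() on a blank line yields an empty list, so nothing is added.
    c.update(line.split())

top_two = c.most_common(2)
print(top_two)   # [('the', 3), ('cat', 2)]
print(c["dog"])  # 0, missing words count as zero instead of raising KeyError
```

Since the file is read only once here, the end-of-file pointer problem from the question does not come up.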