I want to count the occurrences of a specific word in a file.
For example, how many times "apple" appears in the file. I tried this:
#!/usr/bin/env python
import re
logfile = open("log_file", "r")
wordcount = {}
for word in logfile.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for k, v in wordcount.items():
    print(k, v)
I replaced 'word' with 'apple', but it still counts every word in my file.
Any suggestions would be appreciated. :)
Eug*_*ash 13
You can use str.count(), since you only care about occurrences of a single word:
with open("log_file") as f:
    contents = f.read()
    count = contents.count("apple")
However, to avoid some edge cases, such as mistakenly counting the word "applejack", I suggest using a regular expression:
import re
with open("log_file") as f:
    contents = f.read()
    count = sum(1 for match in re.finditer(r"\bapple\b", contents))
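\b in the regular expression ensures that the pattern starts and ends on a word boundary (rather than matching a substring inside a longer string).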
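To make the difference concrete, here is a minimal sketch that runs both approaches on a small in-memory string. The sample text is made up for illustration and is not taken from any real log file.

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "apple pie, applejack, pineapple and one more apple"

# str.count() counts substrings, so it also matches inside longer words.
substring_count = sample.count("apple")

# \b anchors the pattern to word boundaries, so only the standalone word matches.
whole_word_count = len(re.findall(r"\bapple\b", sample))

print(substring_count, whole_word_count)  # prints: 4 2
```

If the match should ignore case, passing flags=re.IGNORECASE to re.findall is one option.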
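If you only care about one word, you don't need to build a dictionary that tracks the count of every word. You can iterate over the file line by line and look for occurrences of the word you're interested in.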
#!/usr/bin/env python
wordcount = 0
my_word = "apple"
with open("log_file", "r") as logfile:
    for line in logfile:
        # Count every occurrence on the line, not just whether the word appears.
        wordcount += line.split().count(my_word)
print(my_word, wordcount)
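However, if you also want to count every word and print only the count for the word you're interested in, these small changes to your code should work: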
#!/usr/bin/env python
wordcount = {}
with open("log_file", "r") as logfile:
    for word in logfile.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# Print only the count for my_word instead of iterating over the entire dictionary.
my_word = "apple"
# .get() returns 0 when the word never appears, avoiding a KeyError.
print(my_word, wordcount.get(my_word, 0))
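As an aside, the standard library's collections.Counter can take over the manual dictionary bookkeeping. The following is a sketch of that alternative, not part of the answer above; the helper name count_words and the sample text are hypothetical and stand in for the contents of log_file.

```python
from collections import Counter


def count_words(text):
    """Return a Counter mapping each whitespace-separated token to its count."""
    return Counter(text.split())


# Hypothetical sample text standing in for the contents of log_file.
counts = count_words("apple banana apple cherry")

# A Counter returns 0 for missing keys instead of raising KeyError.
print(counts["apple"], counts["grape"])  # prints: 2 0
```

Like the dictionary version, this splits on whitespace only, so "apple," with trailing punctuation would be counted as a different token from "apple".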