Python: best/efficient way to find a list of words in a text?

Question

I have a list of about 300 words and a large amount of text that I want to scan to find out how many times each word occurs.

I am using Python's re module:

import re

for word in list_word:
    # Match the word between whitespace/comma delimiters, then count the substitutions made
    search = re.compile(r"""(\s|,)(%s).?(\s|,|\.|\))""" % word)
    occurrences = search.subn("", text)[1]

But I would like to know whether there is a more efficient or more elegant way to do this?

Answer 1

If you have a large amount of text, I would not use regular expressions in this case; I would simply split the text:

words = {"this": 0, "that": 0}
for w in text.split():
  if w in words:
    words[w] += 1

words will then give you the frequency of each word.
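
One caveat with a plain text.split() is that it splits on whitespace only, so a token such as "this," or "(this)" keeps its punctuation and is not counted. A minimal sketch of one way around that, assuming a "word" is a run of letters, digits, or underscores and that matching should stay case-sensitive, is to tokenize with re.findall and count with collections.Counter. The names count_words and wanted are hypothetical and used only for illustration; they do not come from the original answer.

```python
import re
from collections import Counter


def count_words(text, wanted):
    """Count how often each word in `wanted` occurs in `text` (case-sensitive)."""
    wanted = set(wanted)  # set membership tests are O(1) on average
    # \w+ extracts runs of word characters, so "this," and "(this)" both yield "this"
    tokens = re.findall(r"\w+", text)
    counts = Counter(t for t in tokens if t in wanted)
    # Counter returns 0 for missing keys, so words that never appear are reported as 0
    return {w: counts[w] for w in wanted}


text = "this, that and (this) other. That is not this."
print(count_words(text, ["this", "that", "missing"]))
```

The text is still scanned only once, regardless of how many words are in the list. If case should be ignored, lowercasing both the text and the word list before counting would be one option.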