Python - Word occurrence count

Question


I am trying to write a function that counts the occurrences of a (whole) word in a text, ignoring case.

Example:

>>> text = """Antoine is my name and I like python.
Oh ! your name is antoine? And you like Python!
Yes is is true, I like PYTHON
and his name__ is John O'connor"""

assert( 2 == Occs("Antoine", text) )
assert( 2 == Occs("ANTOINE", text) )
assert( 0 == Occs("antoin", text) )
assert( 1 == Occs("true", text) )    
assert( 0 == Occs("connor", text) )
assert( 1 == Occs("you like Python", text) )
assert( 1 == Occs("Name", text) )

Here is a basic attempt:

def Occs(word,text):
    return text.lower().count(word.lower())


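This doesn't work because it counts substrings rather than whole words.
The function has to be fast, since the text can be very large.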
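
To make the failure concrete, here is a minimal sketch on a shortened sample. The names `occs_substring` and `sample` are made up for this illustration; the logic is the same as the basic attempt above.

```python
def occs_substring(word, text):
    # Same logic as the basic attempt: str.count counts substrings,
    # so partial words are counted too.
    return text.lower().count(word.lower())

sample = "Antoine is my name. Oh! your name is antoine?"

# "antoin" is not a whole word in the sample, yet it is found
# inside both "Antoine" and "antoine".
partial_hits = occs_substring("antoin", sample)
print(partial_hits)
```

A word-based count should report 0 here, but the substring count reports one hit per occurrence of "Antoine".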

Should I split the text into an array?
Is there a simple way to write this function?

Edit: I am on Python 2.3.4.

Answer 1

Fre*_*Foo 7

from collections import Counter
import re

Counter(re.findall(r"\w+", text))

Or, for a case-insensitive version:

Counter(w.lower() for w in re.findall(r"\w+", text))
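
As a rough sketch, the case-insensitive counter could be wrapped so that it answers single-word lookups like the ones in the question. The helper names `build_word_counts` and `occs_single` are hypothetical, and the sample text is a shortened version of the one in the question. This assumes a Python version that has `Counter` (2.7 or later).

```python
from collections import Counter
import re

def build_word_counts(text):
    # Tokenize on runs of word characters and fold case once up front.
    return Counter(w.lower() for w in re.findall(r"\w+", text))

def occs_single(word, counts):
    # A Counter returns 0 for keys it has never seen.
    return counts[word.lower()]

text = """Antoine is my name and I like python.
Oh ! your name is antoine? And you like Python!"""

counts = build_word_counts(text)
print(occs_single("ANTOINE", counts))
print(occs_single("antoin", counts))
```

The counts are built in one pass over the text, so repeated lookups are cheap. Two caveats: `\w+` splits at apostrophes, so `O'connor` would be tokenized as `O` and `connor`, and a multi-word phrase such as "you like Python" cannot be looked up this way.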

In Python < 2.7, use defaultdict instead of Counter:

from collections import defaultdict
import re

freq = defaultdict(int)
for w in re.findall(r"\w+", text):
    freq[w.lower()] += 1
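
The frequency tables above only cover single words. For phrases, one possible approach (a sketch, not part of the answer as originally given) is a regex with lookarounds. The function name `occs` and the choice to treat an apostrophe as part of a word are assumptions made here so that "connor" does not match inside `O'connor`.

```python
import re

def occs(phrase, text):
    # Assumption: a "word character" is \w or an apostrophe, so a match
    # must not be directly preceded or followed by either one.
    pattern = r"(?<![\w'])" + re.escape(phrase) + r"(?![\w'])"
    return len(re.findall(pattern, text, re.IGNORECASE))

text = """Antoine is my name and I like python.
Oh ! your name is antoine? And you like Python!
Yes is is true, I like PYTHON
and his name__ is John O'connor"""

print(occs("Antoine", text))
print(occs("you like Python", text))
print(occs("connor", text))
```

This uses only the `re` module, not `Counter` or `defaultdict`, but it rescans the whole text on every call. Note that under this definition `occs("Name", text)` gives 2 (the third occurrence, `name__`, is excluded because `_` is a word character), which differs from the value of 1 listed in the question.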

