saz*_*azr 9 python lambda parsing word-frequency
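I am parsing a long string of text and counting how many times each word occurs, in Python. I have a function, but I am looking for advice on whether it can be made more efficient (in terms of speed), and whether there is a Python library function that already does this so I am not reinventing the wheel.
Can you suggest a more efficient way to count the most common words in a long string (usually more than 1000 words)?
Also, what is the best way to sort the dictionary into a list where the first element is the most common word, the second element is the second most common word, and so on?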
test = """abc def-ghi jkl abc
abc"""

def calculate_word_frequency(s):
    # Post: return a list of words ordered from the most
    # frequent to the least frequent
    words = s.split()
    freq = {}
    for word in words:
        if freq.has_key(word):
            freq[word] += 1
        else:
            freq[word] = 1
    return sort(freq)

def sort(d):
    # Post: sort dictionary d into list of words ordered
    # from highest freq to lowest freq
    # eg: For {"the": 3, "a": 9, "abc": 2} should be
    # sorted into the following list ["a","the","abc"]

    # I have never used lambdas so I'm not sure this is correct
    return d.sort(cmp = lambda x,y: cmp(d[x],d[y]))

print calculate_word_frequency(test)
Bur*_*lid 26
>>> from collections import Counter
>>> test = 'abc def abc def zzz zzz'
>>> Counter(test.split()).most_common()
[('abc', 2), ('zzz', 2), ('def', 2)]
>>> test = """abc def-ghi jkl abc
... abc"""
>>> from collections import Counter
>>> words = Counter()
>>> words.update(test.split()) # Update counter with words
>>> words.most_common() # Print list with most common to least common
[('abc', 3), ('jkl', 1), ('def-ghi', 1)]
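On the second part of the question (turning the counts into a plain list of words), here is a minimal sketch, assuming Python 3. The function names `words_by_frequency` and `sort_by_count` are made up for illustration and are not part of any library. Note that a dict has no `.sort()` method, so the `sort(d)` helper in the question would raise an `AttributeError`; the built-in `sorted()` with `key=d.get` is one way to get the ordering it describes.

```python
from collections import Counter


def words_by_frequency(text):
    """Return the distinct words of text, most frequent first."""
    counts = Counter(text.split())
    # most_common() yields (word, count) pairs; keep only the words.
    return [word for word, _ in counts.most_common()]


def sort_by_count(freq):
    """Order the keys of a plain dict of counts from highest to lowest."""
    # Iterating a dict yields its keys; freq.get looks up each key's count.
    return sorted(freq, key=freq.get, reverse=True)


test = """abc def-ghi jkl abc
abc"""

print(words_by_frequency(test))
print(sort_by_count({"the": 3, "a": 9, "abc": 2}))  # ['a', 'the', 'abc']
```

Words with equal counts have no guaranteed relative order on older Python versions, so if ties matter it may be worth sorting on a `(count, word)` key instead.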