Lar*_*ari 2 python dictionary count
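I would like to know how to get the most frequent words from a list of dictionaries. An example of the structure is shown below.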
listDict = [{'longDescription': 'In the demo, a hip designer, a sharply-dressed marketer, and a smiling, relaxed developer sip lattes and calmly discuss how Flex is going to make customers happy and shorten the workday.'},
{'longDescription': 'In the demo, a hip designer, a sharply-dressed marketer'},
{'longDescription': 'Is going to make customers happy and shorten the workday.'},
{'longDescription': 'In the demo, a hip designer, a sharply-dressed marketer, and a smiling.'}]
The desired result is a list like the following, sorted by most frequent word:
[('word1', 7),
('word2', 7),
('word3', 3),
('word4', 3),
('word5', 3),
('word6', 2),
('word7', 2)]
Here is a fun approach: first count the words of each item with Counter, then sum the resulting counters.
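from collections import Counter
import re

# Build one Counter per description, then add them all together.
# filter(None, ...) drops the empty strings that re.split can produce.
counts = sum((Counter(filter(None, re.split(r'\W+', v.lower())))
              for x in listDict for v in x.values()), Counter())
print(counts.most_common(5))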
[('a', 8), ('and', 5), ('the', 5), ('marketer', 3), ('designer', 3)]
Regex details
\W+ # one or more non-word characters (anything other than letters, digits, and underscore)
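To make the behavior of the pattern concrete, here is a small sketch of what `\W+` treats as a separator. The sample string is hypothetical and was chosen only because it contains a hyphen, an underscore, a digit, and trailing punctuation.

```python
import re

# Hypothetical sample text (not from the question).
sample = "sharply-dressed dev_ops, version 2!"

# \W+ matches runs of characters that are NOT letters, digits, or underscore,
# so the hyphen, the spaces, and the punctuation all act as separators,
# while "dev_ops" and "2" survive as tokens.
parts = re.split(r'\W+', sample)
print(parts)
# ['sharply', 'dressed', 'dev_ops', 'version', '2', '']
```

Note the empty string at the end: the text finishes with a separator, so `re.split` emits an empty final piece.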
re.split splits the text according to the regex pattern. filter removes the empty strings (thanks to Ajax1234 for this).
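As a possible variation (not the method from the answer above), the same counts can be accumulated in a single Counter using `re.findall` and `Counter.update`. This is a minimal sketch: the helper name `count_words` and the shortened sample data are assumptions made for illustration. Matching the words directly with `\w+` means no empty strings appear, so `filter` is not needed.

```python
from collections import Counter
import re

def count_words(records):
    """Count words across all string values in a list of dicts (hypothetical helper)."""
    counts = Counter()
    for record in records:
        for text in record.values():
            # \w+ matches the words themselves, so no empty strings are produced.
            counts.update(re.findall(r'\w+', text.lower()))
    return counts

# Shortened, made-up sample with the same structure as listDict.
sample = [{'longDescription': 'In the demo, a hip designer.'},
          {'longDescription': 'A hip designer and a marketer.'}]

counts = count_words(sample)
print(counts['a'], counts['hip'], counts['marketer'])
# 3 2 1
```

Updating one Counter in place avoids creating an intermediate Counter for every addition, which may matter for long lists. The order in which `most_common` returns words with equal counts can vary between Python versions, so it is best not to rely on it.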