Find the most popular words in a list

Mac*_*rko 9 python string words list

I have a list of words:

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

I want to get a list of tuples:

[(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]

where each tuple is...

(number_of_occurrences, word)

The list should be sorted by number of occurrences, highest first.

What I have so far:

def popularWords(words):
    dic = {}
    for word in words:
        dic.setdefault(word, 0)
        dic[word] += 1
    wordsList = [(dic.get(w), w) for w in dic]
    wordsList.sort(reverse = True)
    return wordsList

The question is...

Is it Pythonic, elegant, and efficient? Can you do better? Thanks in advance.

Sig*_*gyF 11

You can use a Counter.

import collections
words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
counter = collections.Counter(words)
print(counter.most_common())
# prints [('all', 3), ('yeah', 2), ('awesome', 1), ('bye', 1)] on Python 3.7+;
# the order among words with equal counts may differ on older versions

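Note that the tuples come out with the fields reversed: (word, count) instead of (count, word).

From the comments: collections.Counter requires Python >= 2.7 or >= 3.1. For older versions you can use the Counter recipe.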
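
If the (count, word) layout from the question is needed, the pairs returned by most_common() can be swapped in a list comprehension. This is a minimal sketch; the name `pairs` is only illustrative and not part of the original answer.

```python
from collections import Counter

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# most_common() yields (word, count) pairs sorted by count, highest first.
# Swap each pair to get the (count, word) layout asked for in the question.
pairs = [(count, word) for word, count in Counter(words).most_common()]
print(pairs)
```

The most frequent word comes first; how words with equal counts are ordered depends on the Python version, so sort explicitly if that order matters.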


Tri*_*ych 6

You're looking for defaultdict from the collections module:

from collections import defaultdict

D = defaultdict(int)
for word in words:
    D[word] += 1

This gives you a dictionary whose keys are the words and whose values are their frequencies. To get your (frequency, word) tuples:

# items() works on both Python 2 and 3 (iteritems() exists only on Python 2)
tuples = [(freq, word) for word, freq in D.items()]
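
The tuples still need to be sorted to match the output in the question. A minimal sketch of the whole pipeline, assuming the same reverse sort the question's own code uses (the variable names `counts` and `tuples` are illustrative):

```python
from collections import defaultdict

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# Count occurrences; missing keys start at int() == 0.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Build (frequency, word) tuples and sort them in descending order.
# Tuples compare by frequency first, then by word, so ties come out
# in reverse alphabetical order.
tuples = sorted(((freq, word) for word, freq in counts.items()), reverse=True)
print(tuples)
```

Because the sort key is the whole tuple, the result does not depend on dictionary ordering.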

If you are using Python 2.7+/3.1+, you can do the first step with the Counter class from the collections module:

from collections import Counter
D = Counter(words)