Sort a list and get the most frequent words

Question

use*_*211 2 python list frequency python-2.7

I'm new to Python and am trying to sort a list and get the 3 most frequent words. This is what I have so far:

from collections import Counter

reader = open("longtext.txt", 'r')
data = reader.read()
reader.close()
words = data.split()  # Split the text into a list of words
unique = sorted(set(words))  # Remove duplicate words and sort alphabetically
for word in unique:
    print('%s: %s' % (word, words.count(word)))  # words.count counts occurrences of word

This is my output. How can I sort the words by frequency and list only the first, second, and third most common ones?

2: 2
3.: 1
3?: 1
New: 1
Python: 5
Read: 1
and: 1
between: 1
choosing: 1
or: 2
to: 1

Answer 1

the*_*eye 6

You can use the most_common method of collections.Counter, like this:

from collections import Counter

with open("longtext.txt", "r") as reader:
    # Split each line into words so that words, not whole lines, are counted.
    c = Counter(word for line in reader for word in line.split())
print(c.most_common(3))

Quoting the example from the official documentation:

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

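If you want to print them in the format shown in the question, you can simply iterate over the most common elements and print them like this: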

for word, count in c.most_common(3):
    print("{}: {}".format(word, count))
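
The output in the question lists "3." and "3?" as separate entries, because split() leaves punctuation attached to each word. If that is not what you want, here is a minimal sketch that lowercases the text and extracts runs of word characters with a regular expression before counting. The top_words helper and the sample string are made up for illustration; they are not part of the question's code.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most common words in text, ignoring case and punctuation."""
    # \w+ matches runs of letters, digits and underscores, so "3." and "3?"
    # both contribute the token "3".
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


# Made-up sample text, standing in for the contents of longtext.txt.
sample = "Python, python. PYTHON! Python? Read or write, or read, or both."
for word, count in top_words(sample):
    print("{}: {}".format(word, count))
```

With the sample text this prints python: 4, or: 3 and read: 2.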

Note: The Counter approach is better than the sorting approach because counting runs in O(N) time, whereas sorting takes O(N*log N) in the worst case.
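
To make the comparison above concrete, here is a rough sketch that picks the top three both by fully sorting the counts and by partial selection with heapq.nlargest, which is, as far as I know, what CPython's most_common(n) does internally when n is given. The word list is made-up sample data.

```python
import heapq
from collections import Counter

# Made-up sample data with distinct counts, so there are no ties.
words = "tea tea tea tea milk milk milk jam jam bread".split()
counts = Counter(words)  # a single pass over the words

# Full sort of every (word, count) pair, then keep the first three.
by_sort = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:3]

# Partial selection: only the three largest counts are tracked.
by_heap = heapq.nlargest(3, counts.items(), key=lambda item: item[1])

print(by_sort)
print(by_heap)
print(counts.most_common(3))
```

All three lines print the same list here. The difference only matters for large vocabularies, where selecting a few top entries avoids sorting every distinct word.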
