How to count the top 10 most common values in a dict in Python

mo_*_*hun 8 python csv counter python-2.7

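I'm new to Python and to programming, so please be gentle. I'm trying to analyze a CSV file of music listening data and return the top n most-listened-to bands. In the code below, each song becomes a dict entry in a list, in the following format: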

[{'album': 'Exile on Main Street', 'song': 'Happy', 'datetime': '3 Dec 2014 14:08', 'artist': 'The Rolling Stones'}, {'album': 'II', 'song': 'Black Dog', 'datetime': '1 Dec 2014 08:08', 'artist': 'Led Zepplin'}]

from collections import Counter

def count_artist_plays(filename):
    with open(filename, 'r') as data:
        header = data.readline().strip().split(',')

        entries = []
        for line in data:
            entry = line.strip().split(',')
            listens = {}
            for info, type in enumerate(header):
                listens[type] = entry[info]

            entries.append(listens)

    for d in entries:
        arts = d['artist']
        c = Counter(arts)
        print c.most_common(10)

How do I get the most common strings (bands), rather than the character-by-character breakdown I'm getting below?

[('s', 2), ('a', 1), (' ', 1), ('E', 1), ('l', 1), ('o', 1), ('n', 1), ('S', 1), ('v', 1), ('y', 1)]

unu*_*tbu 13

Initialize the Counter once, let the keys be the artists, and increment the key (artist) on each pass through the loop:

c = Counter()
for d in entries:
    arts = d['artist']
    c[arts] += 1
print(c.most_common(10))
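Here is a self-contained version of that loop you can run as-is. It works because a Counter returns 0 for a missing key instead of raising KeyError, so `c[artist] += 1` needs no "is the key already there?" check. The first two entries come from the question; the third is a hypothetical extra play added so that one artist has a count above 1.

```python
from collections import Counter

# Entries shaped like the question's list of dicts (third one is hypothetical)
entries = [
    {'album': 'Exile on Main Street', 'song': 'Happy',
     'datetime': '3 Dec 2014 14:08', 'artist': 'The Rolling Stones'},
    {'album': 'II', 'song': 'Black Dog',
     'datetime': '1 Dec 2014 08:08', 'artist': 'Led Zepplin'},
    {'album': 'IV', 'song': 'Rock and Roll',
     'datetime': '2 Dec 2014 09:15', 'artist': 'Led Zepplin'},
]

c = Counter()  # created once, outside the loop
for d in entries:
    # A missing key reads as 0, so the first play of an artist becomes 1
    c[d['artist']] += 1

print(c.most_common(10))  # [('Led Zepplin', 2), ('The Rolling Stones', 1)]
```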

When `arts` is a string, `c = Counter(arts)` counts the characters in `arts`:

In [522]: collections.Counter('Led Zepplin')
Out[522]: Counter({'e': 2, 'p': 2, ' ': 1, 'd': 1, 'i': 1, 'L': 1, 'l': 1, 'n': 1, 'Z': 1})

By contrast:

In [523]: c = collections.Counter()

In [524]: c['Led Zepplin'] += 1

In [525]: c['The Rolling Stones'] += 1

In [526]: c.most_common()
Out[526]: [('Led Zepplin', 1), ('The Rolling Stones', 1)]
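The underlying reason is that `Counter` counts whatever it gets by iterating over its argument, and iterating over a string yields individual characters. A quick sketch contrasting a bare string with a one-element list (the variable names are just illustrative):

```python
from collections import Counter

name = 'Led Zepplin'

# Iterating a str yields characters, so each character gets counted
by_char = Counter(name)

# Iterating a one-element list yields the whole string once
by_name = Counter([name])

print(by_char['e'], by_char['p'])  # 2 2
print(by_name)                     # Counter({'Led Zepplin': 1})
```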

Or, as Jon Clements pointed out, gather all the artists and count them in one go:

c = Counter(d['artist'] for d in entries)
print(c.most_common(10))

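Note that the above uses a generator expression, which avoids building a (potentially) large temporary list while also keeping the syntax concise and readable.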
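Since the question asks for the top n bands rather than a fixed 10, one option is to wrap the generator-expression version in a small helper. `top_n_artists` is a hypothetical name, and the sample entries are trimmed-down, made-up data in the question's shape:

```python
from collections import Counter

def top_n_artists(entries, n=10):
    """Return the n most common artists as (artist, count) pairs, highest first."""
    # The generator expression hands artists to Counter one at a time,
    # so no temporary list of every artist is built.
    return Counter(d['artist'] for d in entries).most_common(n)

# Hypothetical sample data in the question's dict format
entries = [
    {'song': 'Happy', 'artist': 'The Rolling Stones'},
    {'song': 'Black Dog', 'artist': 'Led Zepplin'},
    {'song': 'Rock and Roll', 'artist': 'Led Zepplin'},
]

print(top_n_artists(entries, n=1))  # [('Led Zepplin', 2)]
```

Passing `n=None` to `most_common` returns every artist, sorted by count.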

  • Or simply `Counter(el['artist'] for el in entries).most_common(10)` (4 upvotes)
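Putting the reading and the counting together, here is a sketch that uses the standard `csv` module in place of the question's manual `split(',')`. Splitting on commas by hand breaks as soon as a quoted field contains a comma, whereas `csv.DictReader` handles quoting and builds the per-row dicts from the header for you. This assumes a header row with a column named exactly `artist`, and it is written for Python 3 (`newline=''` when opening the file); on Python 2.7 the `csv` docs instead recommend opening the file in binary mode (`'rb'`). The `n` parameter is an addition to the question's function.

```python
import csv
from collections import Counter

def count_artist_plays(filename, n=10):
    """Return the n most-played artists in a CSV file as (artist, count) pairs.

    Assumes a header row that includes an 'artist' column.
    """
    with open(filename, newline='') as f:
        reader = csv.DictReader(f)  # each row becomes a dict keyed by the header
        # Count while the file is still open; the generator reads rows lazily
        counts = Counter(row['artist'] for row in reader)
    return counts.most_common(n)
```

Calling `count_artist_plays('listens.csv', n=5)` (a hypothetical filename) would return up to five `(artist, count)` pairs, most-played first.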