如何在python中维护堆中的字典?

giz*_*gok 11 python heap data-structures

我有一本字典如下:

{'abc':100,'xyz':200,'def':250 .............}
Run Code Online (Sandbox Code Playgroud)

它是一个字典,其中键作为实体的名称,值是该实体的计数.我需要从字典中返回前10个元素.

我可以写一个堆来做它,但我不确定如何为键映射做值,因为某些值是相等的.

有没有其他数据结构可以做到这一点?

Bak*_*riu 14

使用heapq你可能想做这样的事情:

heap = [(-value, key) for key,value in the_dict.items()]
largest = heapq.nsmallest(10, heap)
largest = [(key, -value) for value, key in largest]
Run Code Online (Sandbox Code Playgroud)

请注意,由于heapq只实现最小堆,因此最好反转值,以便更大的值变小.

对于小尺寸的堆,此解决方案将更慢,例如:

>>> import random
>>> import itertools as it
>>> def key_generator():
...     characters = [chr(random.randint(65, 90)) for x in range(100)]
...     for i in it.count():
...             yield ''.join(random.sample(characters, 3))
... 
>>> the_dict = dict((key, random.randint(-500, 500)) for key, _ in zip(key_generator(), range(3000)))
>>> def with_heapq(the_dict):
...     items = [(-value, key) for key, value in the_dict.items()]
...     smallest = heapq.nsmallest(10, items)
...     return [-value for value, key in smallest]
... 
>>> def with_sorted(the_dict):
...     return sorted(the_dict.items(), key=(lambda x: x[1]), reverse=True)[:10]
... 
>>> import timeit
>>> timeit.timeit('with_heapq(the_dict)', 'from __main__ import the_dict, with_heapq', number=1000)
0.9220538139343262
>>> timeit.timeit('with_sorted(the_dict)', 'from __main__ import the_dict, with_sorted', number=1000)
1.2792410850524902
Run Code Online (Sandbox Code Playgroud)

使用3000个值,它只比sorted版本略快,而O(nlogn)不是O(n + mlogn).如果我们将dict的大小增加到10000,则heapq版本变得更快:

>>> timeit.timeit('with_heapq(the_dict)', 'from __main__ import the_dict, with_heapq', number=1000)
2.436316967010498
>>> timeit.timeit('with_sorted(the_dict)', 'from __main__ import the_dict, with_sorted', number=1000)
3.585728168487549
Run Code Online (Sandbox Code Playgroud)

时间可能还取决于您运行的机器.您可能应该分析哪种解决方案在您的情况下最有效.如果效率不是很关键,我建议使用该sorted版本,因为它更简单.

  • +1,但是对heapify的调用是多余的,实际上,您只需要`heapq.nlargest(the_dict.items())`。 (2认同)