为什么这个 python 优先级队列无法堆化?

Sam*_*ber 3 python heapq

为什么这个优先级队列无法堆化?其中 (150, 200, 200) 是分配给字典的优先级值

import heapq

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

print( heapq.nlargest(2, priority_q))
Run Code Online (Sandbox Code Playgroud)

例外情况:

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   TypeError: '<' not supported between instances of 'dict' and 'dict'
Run Code Online (Sandbox Code Playgroud)

然而,下面的工作原理..

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (201, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)
Run Code Online (Sandbox Code Playgroud)

为什么是这样?

Sor*_*ary 5

heap.heapify正在比较项目。您的项目是元组。

所以你需要知道如何比较元组。从元素角度来看,Python从元组中的第一项开始比较它们,如果可以得出结果,则停止并返回结果,否则(第一项相等)它会检查两个元组中的第二项,很快。

看看这个:

t1 = (8, {"a": 10})
t2 = (10, {"b": 20})

print(t1 < t2)  # True
Run Code Online (Sandbox Code Playgroud)

这是真的,因为8小于10,仅此而已。无需进一步检查。

这个怎么样?

t1 = (10, {"a": 10})
t2 = (10, {"b": 20})

print(t1 < t2)
Run Code Online (Sandbox Code Playgroud)

输出:

Traceback (most recent call last):
  File "", line 4, in <module>
    print(t1 < t2)
          ^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'dict'
Run Code Online (Sandbox Code Playgroud)

现在是一个错误。dict对象不能用作<or的操作数>

>>> {"a": 10} < {"b": 20}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'
Run Code Online (Sandbox Code Playgroud)

解决方案:
不幸的是,您无法传递排序函数作为key参数,例如sorted()指定您希望如何进行比较,但是,您可以将项目包装在另一个自定义对象中,并__lt__基于元组的第一项来实现。您可以手动执行此操作或使用@dataclass. 文档中特别提到了这一点:

https://docs.python.org/3/library/heapq.html#priority-queue-implementation-notes

这是一个简单的自定义包装器:

import heapq
from typing import Any


class PriorityItem:
    __slots__ = ("priority", "count", "item")
    counter = 0

    def __init__(self, priority: int, item: Any) -> None:
        self.priority = priority
        self.item = item
        self.count = PriorityItem.counter
        PriorityItem.counter += 1

    def __lt__(self, item: object) -> bool:
        if not isinstance(item, PriorityItem):
            return NotImplemented
        return (self.priority, self.count) < (item.priority, item.count)

    def __repr__(self) -> str:
        return f"PriorityItem(priority={self.priority}, item={self.item})"


priority_q = [
    (150, {"intel-labels": {"timestamp": 150}}),
    (200, {"intel-labels": {"timestamp": 200}}),
    (200, {"intel-labels": {"timestamp": 200, "xx": "xx"}}),
]

new_priority_q = [PriorityItem(*i) for i in priority_q]
heapq.heapify(new_priority_q)
print(heapq.nlargest(2, new_priority_q))
Run Code Online (Sandbox Code Playgroud)