在 Python 中使用堆的前 K 个常用词

Viv*_*ian 8 python heap

我正在尝试在 O(N log K) 时间内解决Top K 常用词 Leetcode 问题,但得到了不希望的结果。我的 Python3 代码和控制台输出如下:

from collections import Counter
import heapq

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        
        counts = Counter(words)
        print('Word counts:', counts)
        
        result = []
        for word in counts:
            print('Word being added:', word)
            if len(result) < k:
                heapq.heappush(result, (-counts[word], word))
                print(result)
            else:
                heapq.heappushpop(result, (-counts[word], word))
        result = [r[1] for r in result]
        
        return result

----------- Console output -----------

Word counts: Counter({'the': 3, 'is': 3, 'sunny': 2, 'day': 1})
Word being added: the
[(-3, 'the')]
Word being added: day
[(-3, 'the'), (-1, 'day')]
Word being added: is
[(-3, 'is'), (-1, 'day'), (-3, 'the')]
Word being added: sunny
[(-3, 'is'), (-2, 'sunny'), (-3, 'the'), (-1, 'day')]
Run Code Online (Sandbox Code Playgroud)

当我使用 运行测试用例["the", "day", "is", "sunny", "the", "the", "sunny", "is", "is"]K = 4,我发现单词the被移动到列表的末尾(之后day),is即使它们的计数都是 3。这是有道理的,因为父级只需要 <= 子级孩子们没有以任何方式订购。由于(-2, 'sunny')(-3, 'the')都是 > (-3, 'is'),实际上,即使(-3, 'the')<(-2, 'sunny')和 是 的右孩子,堆不变量仍然保持不变(-3, 'is')。预期的结果是["is","the","sunny","day"]我的代码的输出是["is","sunny","the","day"].

我应该使用堆在 O(N log K) 时间内解决这个问题,如果是这样,我该如何修改我的代码以达到预期的结果?

sha*_*678 7

您在 using 的正确轨道上heapqCounter您只需要稍微修改与 k 相关的使用它们的方式:(您需要在向 中添加任何内容之前迭代整个计数result):

from collections import Counter
import heapq

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        counts = collections.Counter(words)
        max_heap = []
        for key, val in counts.items():
            heapq.heappush(max_heap, (-val, key))
        
        result = []
        while k > 0:
            result.append(heapq.heappop(max_heap)[1])
            k -= 1
        
        return result
Run Code Online (Sandbox Code Playgroud)

之前没有阅读 O(N log k) 的要求,这里是对上述解决方案的修改以实现:

from collections import Counter, deque
import heapq

class WordWithFrequency(object):
    def __init__(self, word, frequency):
        self.word = word
        self.frequency = frequency

    def __lt__(self, other):
        if self.frequency == other.frequency:
            return lt(other.word, self.word)
        else:
            return lt(self.frequency, other.frequency)

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:    
        counts = collections.Counter(words)
        
        max_heap = []
        for key, val in counts.items():
            heapq.heappush(max_heap, WordWithFrequency(key, val))
            if len(max_heap) > k:
                heapq.heappop(max_heap)
        
        result = deque([]) # can also use a list and just reverse at the end
        while k > 0:
            result.appendleft(heapq.heappop(max_heap).word)
            k -= 1
        
        return list(result)
Run Code Online (Sandbox Code Playgroud)