为什么堆非常适合合并排序流？

Question

为什么堆非常适合合并排序流？

tes*_*123 1 algorithm heap data-structures

我正在阅读一些算法书籍并遇到这一行，

Heaps are well suited for algorithms that merge sorted data streams.

Run Code Online (Sandbox Code Playgroud)

没有给出任何解释为什么会出现这种情况。有人可以帮我解释一下这是为什么吗？

Answer 1

bla*_*azs 5

如果您只有两个数据流，那么您实际上不需要堆，如下算法所示：

Let s1 ans s2 be the two streams

while s1.hasData() and s2.hasData()
    if s1.peek() < s2.peek(): datum = s1.pop()
    else: datum = s2.pop()
    s.push(datum)
if either is non-empty (only one is), add the rest of its content to s

Run Code Online (Sandbox Code Playgroud)

正如 Henk Holterman 所指出的，如果您有k>2流，那么您可以通过堆实现合并（本质上，堆会做出现在复杂的决定，决定在当前步骤中使用哪个流）：

let H be a (min  or max, depending on your needs) heap
let s1, s2, ..., sk be sorted streams

// fill the heap with the first elements from the streams (e.g. min/max elements from each stream, depending on how they're sorted)
for i=1 to k:
    H.add((i, si.pop())) // we need to know which stream the element came from

let s be the initially-empty data stream which will contain the merged content in sorted order
// H.empty() will indicate that all streams are empty
while not H.empty():
    // take the min/max element of the min/max elements of each stream (*the* min/max element)
    (i, datum) = H.extract()
    // add it to s
    s.push(datum)
    // we know the datum came from s[i]; thus we need to push the next element from the i-th stream into heap as it may contain the next min/max element (that is, if s[i] isn't empty)
    if not s[i].empty():
        // we'll assume the heap sorts based on the second component of the pair
        H.add((i, s[i].pop())
// s is the sorted stream containing elements from s1,s2,...,sk

Run Code Online (Sandbox Code Playgroud)

其运行时间为O((|s1|+...+|sk|) * log k)，其中|si|表示流中的元素数量si。

关键思想是，在循环的每次迭代中，将, , ...,while中的最小/最大添加到。您可以使用堆来实现此目的，堆会告诉您当前哪个流包含最小/最大元素。并注意如何优雅地处理流为空的边缘情况。s[1].peek()s[2].peek()s[k].peek()s

例如，您可以使用平衡二叉搜索树，但后者将维护此任务实际上不需要的额外不变量。我相信，堆更高效（在内部它通常表示为一个简单的数组；请参阅[Wiki 页面](https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation)）。 (2认同)

归档时间：	9 年，5 月前
查看次数：	605 次
最近记录：	9 年，4 月前