Why is my binary heap implementation slower than Python's stdlib?

chi*_*lNZ 1 python binary-heap heapsort

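I have been implementing my own heap module to help me understand the heap data structure. I understand how heaps work and how they are maintained, but my implementation is significantly slower than the standard Python heapq module when performing a heap sort: for a list of 100,000 elements, heapq takes 0.6s while my code takes 2s. (It was originally 2.6s; I cut it down to 2s by taking the len() calls out of percDown and passing the length in, so that it does not have to be computed every time the function calls itself.) Here is my implementation: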

def percDown(lst, start, end, node):
    #Moves given node down the heap, starting at index start, until the heap property is
    #satisfied (all children must be larger than their parent)
    iChild = 2 * start + 1
    i = start
    # if the node has reached the end of the heap (i.e. no children left),
    # return its index (we are done)
    if iChild > end - 1:
        return start
    #if the second child exists and is smaller than the first child, use that child index
    #for comparing later
    if iChild + 1 < end and lst[iChild + 1] < lst[iChild]:
        iChild += 1
    #if the smallest child is less than the node, it is the new parent
    if lst[iChild] < node:
        #move the child to the parent position
        lst[start] = lst[iChild]
        #continue recursively going through the child nodes of the
        # new parent node to find where node is meant to go
        i = percDown(lst, iChild, end, node)
    return i
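The recursion in percDown can also be written as a loop. The following is only a minimal sketch, assuming the arguments mean the same as above; `perc_down_iter` is a hypothetical name and not part of the original code. It performs the same comparisons, so the cost per call stays O(log n). The only difference is that it avoids one Python function call per level.

```python
def perc_down_iter(lst, start, end, node):
    """Iterative variant of percDown (hypothetical name).

    Moves children up while they are smaller than node and returns the
    index where node belongs. The caller stores node there, as in popMin.
    """
    i = start
    while True:
        child = 2 * i + 1
        # No children left inside the heap: node belongs at i.
        if child >= end:
            return i
        # Pick the smaller of the two children, if a second one exists.
        if child + 1 < end and lst[child + 1] < lst[child]:
            child += 1
        if lst[child] < node:
            # Move the smaller child up and continue from its old slot.
            lst[i] = lst[child]
            i = child
        else:
            return i
```

Whether this makes a measurable difference depends on the interpreter and the data, so it is worth timing rather than assuming.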

popMin: pops the minimum value (lst[0]) and restores the heap order

def popMin(lst):
    length = len(lst)
    if (length > 1):
        min = lst[0]
        ele = lst.pop()
        i = percDown(lst, 0, length - 1, ele)
        lst[i] = ele
        return min
    else:
        return lst.pop()

heapify: turns a list into a heap, in place

import math

def heapify(lst):
    iLastParent = math.floor((len(lst) - 1) / 2)
    length = len(lst)
    while iLastParent >= 0:
        ele = lst[iLastParent]
        i = percDown(lst, iLastParent, length, lst[iLastParent])
        lst[i] = ele
        iLastParent -= 1

sort: sorts the given list using the functions above (not in place)

def sort(lst):
    result = []
    heap.heapify(lst)
    length = len(lst)
    for z in range(0, length):
        result.append(heap.popMin(lst))
    return result
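For comparison, the heapq side of such a benchmark might look roughly like the sketch below. The original post does not show the timing code, so `heapq_sort` and the timing harness are assumptions. Unlike the sort above, this version copies the input so the caller's list is not emptied.

```python
import heapq
import random
import timeit


def heapq_sort(lst):
    """Heap sort via the stdlib: heapify a copy, then pop the minimum repeatedly."""
    heap = list(lst)  # copy so the caller's list is left untouched
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


if __name__ == "__main__":
    # Same list size as in the question; actual timings depend on the machine.
    data = [random.random() for _ in range(100_000)]
    seconds = timeit.timeit(lambda: heapq_sort(data), number=1)
    print(f"heapq_sort on {len(data)} items: {seconds:.3f}s")
```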

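Have I mistakenly increased the complexity of the algorithm or of the heap construction, or is the Python heapq module just heavily optimized? I suspect the former, since 0.6s vs 2s is a huge difference.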

Mar*_*ers 6

The Python heapq module uses a C extension. You can't beat C code with pure Python.

From the heapq module source code:

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass
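One way to see whether that import succeeded on a given interpreter is to check whether `heapq.heappop` is a built-in function. This is a small sketch, not part of the heapq API; `heapq_is_accelerated` is a hypothetical helper name. The check is expected to be true on CPython when `_heapq` is available, and other interpreters may behave differently.

```python
import heapq
import inspect


def heapq_is_accelerated():
    """Return True if heapq.heappop is a built-in (C) function, not Python code."""
    return inspect.isbuiltin(heapq.heappop)


print("C-accelerated heapq:", heapq_is_accelerated())
```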

See also the _heapqmodule.c C source code.