与其他算法相比,Heapsort算法表现不佳

Sex*_*ast 4 python sorting algorithm heapsort

我有两个heapsort算法.第一个是由我写的,而第二个是从某个网站上获取的.据我所知,两者都有相同的逻辑,但第二个的表现比第一个好.出现这种情况的原因是什么?我能看到的唯一区别是我使用递归,而另一个则迭代地进行递归.这可以单独作为差异化因素吗?

我的代码:

def heapify(arr,i,n):
    pivot = arr[i]   #the value of the root node
    left,right = (i<<1)+1,(i<<1)+2  #indices of the left and right subtree root nodes
    if right <= n-1:  #if right is within the array, so is left
        if arr[left] <= pivot and arr[right] <= pivot:
            return  #if both are less than the root node, it's already heapified
        maximum = left if arr[left] >= arr[right] else right #else find which child has a higher value
        arr[maximum],arr[i] = arr[i],arr[maximum]  #swap the root node with that child
        return heapify(arr,maximum,n)  #make the changed child the new root and recurse
    else:
      if left <= n-1:  #right is outside the array, so check for left only
        if arr[left] <= pivot:
            return
        arr[i],arr[left] = arr[left], arr[i]  #same logic as above
        return heapify(arr,left,n)
      else:
          return

def heapit(array,n):
    for i in range((len(array)-1)/2,-1,-1):  #all elements after (len(array)-1)/2 are the leaf nodes, so we have to heapify earlier nodes
        heapify(array,i,n)

def heapsort(array):
    n = len(array)
    for i in range(n,0,-1):
        heapit(array,i)  #make the array a heap
        array[0],array[i-1] = array[i-1],array[0]  #swap the root node with the last element
Run Code Online (Sandbox Code Playgroud)

其他代码:

def HeapSort(A):
     def heapify(A):
        start = (len(A) - 2) / 2
        while start >= 0:
            siftDown(A, start, len(A) - 1)
            start -= 1

     def siftDown(A, start, end):
        root = start
        while root * 2 + 1 <= end:
            child = root * 2 + 1
            if child + 1 <= end and A[child] < A[child + 1]:
                child += 1
            if child <= end and A[root] < A[child]:
                A[root], A[child] = A[child], A[root]
                root = child
            else:
                return

     heapify(A)
     end = len(A) - 1
     while end > 0:
        A[end], A[0] = A[0], A[end]
        siftDown(A, 0, end - 1)
        end -= 1
Run Code Online (Sandbox Code Playgroud)

即使对于大小约为100,000的小阵列,差异也变得很大.我通过传递要排序到函数的数组调用任一代码:HeapSort(list)heapsort(list).

编辑:

heapsort用这个替换了这个函数:

def heapsort(array):
     n = len(array)
     heapit(array,n)
     array[n-1],array[0] = array[0],array[n-1]
     for i in range(n-1):
       heapify(array,0,n-1-i)
       array[n-i-2],array[0] = array[0],array[n-i-2]
Run Code Online (Sandbox Code Playgroud)

这提供了相当的性能,但仍然较慢.对于100万美元的阵列,结果几乎是20秒:4秒.还有什么可以做的?

Fre*_*Foo 5

编辑:我的下面的评论可能解释了一个重大的减速,但最重要的是你的算法不是heapsort.

在函数内部heapsort,执行循环for i in range(n,0,-1).这是n迭代,其中n输入的大小.在那个循环中,你调用heapit,循环for i in range((len(array)-1)/2,-1,-1); 这大致是n//2迭代.

n * (n // 2)=Θ(n²).换句话说,你有一个至少需要二次时间的算法,而第二个算法实现了在O(nlg n)时间内运行的真正的heapsort .

/编辑

这很可能是递归的杀戮性能,与调用模块级定义的函数相结合.Python(至少CPython)没有针对递归程序进行优化,而是针对迭代程序进行优化.对于每次递归调用heapify,CPython必须执行以下七字节代码指令:

  9         158 LOAD_GLOBAL              0 (heapify)
            161 LOAD_FAST                0 (arr)
            164 LOAD_FAST                6 (maximum)
            167 LOAD_FAST                2 (n)
            170 CALL_FUNCTION            3
            173 RETURN_VALUE        
        >>  174 POP_TOP             
Run Code Online (Sandbox Code Playgroud)

(确定使用dis).递归调用完成后进行最后的两个指令,因为Python不执行尾调用优化.

虽然这可能看起来不贵,一个LOAD_GLOBAL必须做至少一个哈希表查找刚刚找到heapify,并为引用计数heapify,arr,maximumi已被递增.递归调用完成后,必须再次递减引用计数.函数调用在Python中相当昂贵.

如上所述import this,"平坦优于嵌套":尽可能优先考虑迭代而不是递归.