Sex*_*ast 4 python sorting algorithm heapsort
我有两个heapsort
算法.第一个是由我写的,而第二个是从某个网站上获取的.据我所知,两者都有相同的逻辑,但第二个的表现比第一个好.出现这种情况的原因是什么?我能看到的唯一区别是我使用递归,而另一个则迭代地进行递归.这可以单独作为差异化因素吗?
我的代码:
def heapify(arr,i,n):
pivot = arr[i] #the value of the root node
left,right = (i<<1)+1,(i<<1)+2 #indices of the left and right subtree root nodes
if right <= n-1: #if right is within the array, so is left
if arr[left] <= pivot and arr[right] <= pivot:
return #if both are less than the root node, it's already heapified
maximum = left if arr[left] >= arr[right] else right #else find which child has a higher value
arr[maximum],arr[i] = arr[i],arr[maximum] #swap the root node with that child
return heapify(arr,maximum,n) #make the changed child the new root and recurse
else:
if left <= n-1: #right is outside the array, so check for left only
if arr[left] <= pivot:
return
arr[i],arr[left] = arr[left], arr[i] #same logic as above
return heapify(arr,left,n)
else:
return
def heapit(array,n):
for i in range((len(array)-1)/2,-1,-1): #all elements after (len(array)-1)/2 are the leaf nodes, so we have to heapify earlier nodes
heapify(array,i,n)
def heapsort(array):
n = len(array)
for i in range(n,0,-1):
heapit(array,i) #make the array a heap
array[0],array[i-1] = array[i-1],array[0] #swap the root node with the last element
Run Code Online (Sandbox Code Playgroud)
其他代码:
def HeapSort(A):
def heapify(A):
start = (len(A) - 2) / 2
while start >= 0:
siftDown(A, start, len(A) - 1)
start -= 1
def siftDown(A, start, end):
root = start
while root * 2 + 1 <= end:
child = root * 2 + 1
if child + 1 <= end and A[child] < A[child + 1]:
child += 1
if child <= end and A[root] < A[child]:
A[root], A[child] = A[child], A[root]
root = child
else:
return
heapify(A)
end = len(A) - 1
while end > 0:
A[end], A[0] = A[0], A[end]
siftDown(A, 0, end - 1)
end -= 1
Run Code Online (Sandbox Code Playgroud)
即使对于大小约为100,000的小阵列,差异也变得很大.我通过传递要排序到函数的数组调用任一代码:HeapSort(list)
或heapsort(list)
.
编辑:
我heapsort
用这个替换了这个函数:
def heapsort(array):
n = len(array)
heapit(array,n)
array[n-1],array[0] = array[0],array[n-1]
for i in range(n-1):
heapify(array,0,n-1-i)
array[n-i-2],array[0] = array[0],array[n-i-2]
Run Code Online (Sandbox Code Playgroud)
这提供了相当的性能,但仍然较慢.对于100万美元的阵列,结果几乎是20秒:4秒.还有什么可以做的?
编辑:我的下面的评论可能解释了一个重大的减速,但最重要的是你的算法不是heapsort.
在函数内部heapsort
,执行循环for i in range(n,0,-1)
.这是n
迭代,其中n
输入的大小.在那个循环中,你调用heapit
,循环for i in range((len(array)-1)/2,-1,-1)
; 这大致是n//2
迭代.
n * (n // 2)
=Θ(n
²).换句话说,你有一个至少需要二次时间的算法,而第二个算法实现了在O(n
lg n
)时间内运行的真正的heapsort .
/编辑
这很可能是递归的杀戮性能,与调用模块级定义的函数相结合.Python(至少CPython)没有针对递归程序进行优化,而是针对迭代程序进行优化.对于每次递归调用heapify
,CPython必须执行以下七字节代码指令:
9 158 LOAD_GLOBAL 0 (heapify)
161 LOAD_FAST 0 (arr)
164 LOAD_FAST 6 (maximum)
167 LOAD_FAST 2 (n)
170 CALL_FUNCTION 3
173 RETURN_VALUE
>> 174 POP_TOP
Run Code Online (Sandbox Code Playgroud)
(确定使用dis
).递归调用完成后进行最后的两个指令,因为Python不不执行尾调用优化.
虽然这可能看起来不贵,一个LOAD_GLOBAL
必须做至少一个哈希表查找刚刚找到heapify
,并为引用计数heapify
,arr
,maximum
并i
已被递增.递归调用完成后,必须再次递减引用计数.函数调用在Python中相当昂贵.
如上所述import this
,"平坦优于嵌套":尽可能优先考虑迭代而不是递归.