堆排序的目的是什么?

Akh*_*S R 2 algorithm heap time-complexity heapsort data-structures

当我们构建堆时,数组中的元素会按照特定顺序(升序或降序)排列,具体取决于它是最大堆还是最小堆。那么当构建堆本身以排序顺序排列元素且时间复杂度较小时,堆排序有什么用呢?

    void build_heap (int Arr[ ])
        {
           for(int i = N/2-1 ; i >= 0; i-- )
           {
              down_heapify (Arr, i, N);
           }
        }
Run Code Online (Sandbox Code Playgroud)
    void heap_sort(int Arr[], int N)
    {
        build_heap(Arr);
        
        for(int i = N-1; i >= 1; i--)
        {
            swap(Arr[i], Arr[0]);
            down_heapify(Arr, 0, i+1);
        }
    }
Run Code Online (Sandbox Code Playgroud)

Ste*_*tef 6

堆排序总结

堆排序是一种算法,可以概括为两个步骤:

  • 将输入数组转换为堆;
  • 将堆转换为排序数组。

堆本身不是一个排序数组。

让我们看一个例子:

[9, 7, 3, 5, 4, 2, 0, 6, 8, 1]   # unsorted array

convert into heap
[9, 8, 3, 7, 4, 2, 0, 6, 5, 1]   # array representing a max-heap

sort
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # sorted array
Run Code Online (Sandbox Code Playgroud)

如果仔细观察,您会注意到示例中的第二个数组(代表堆)未完全排序。元素的顺序看起来不像原始未排序数组那样随机;它们看起来几乎是按降序排列的;但它们还没有完全排序。在数组中,3 位于 7 之前,0 位于 6 之前。

那么什么是堆呢?

什么是堆?

请注意,在上一节中,我区分了“堆”和“代表堆的数组”。我们先来说说什么是堆,然后再说什么是代表堆的数组。

最大堆是一个节点上有值的二叉树,它满足以下两个属性:

  • 子节点上的值总是低于其父节点上的值;
  • 该树几乎是完整的,即该树的所有分支都具有几乎相同的长度,最长分支和最短分支之间的差异至多为 1;此外,最长的分支必须在左侧,最短的分支必须在右侧。

In the example I gave, the heap constructed is this one:

          9
      /      \
     8        3
   /   \     / \
  7     4   2   0
 / \   / \ / \ / \
6   5 1
Run Code Online (Sandbox Code Playgroud)

You can check that this binary tree satisfies the two properties of a heap: each child has a lower value than its parent, and all branches have almost the same length, with always 4 or 3 values per branch, and the longest branches being on the left and the shortest branches being on the right.

What is an array representing a heap?

Storing binary trees into arrays is usually pretty inconvenient, and binary trees are most generally implemented using pointers, kinda like a linked list. However, the heap is a very special binary tree, and its "almost-complete" property is super useful to implement it as an array.

All we have to do is read the values row per row, left to right. In the heap above, we have four rows:

9
8 3
7 4 2 0
6 5 1
Run Code Online (Sandbox Code Playgroud)

We simply store these values in that order in an array:

          9
      /      \
     8        3
   /   \     / \
  7     4   2   0
 / \   / \ / \ / \
6   5 1
Run Code Online (Sandbox Code Playgroud)

Notice that this is exactly the array after the first step of heap sort at the beginning of my post.

In this array representation, we can use positions to determine which node is a child of which node: the node at position i has two children, which are at positions 2*i+1 and 2*i+2.

This array is not a sorted array. But it represents a heap and we can easily use it to produce a sorted array, in n log(n) operations, by extracting the maximum element repeatedly.

If heap-sort was implemented with an external binary tree, then we could use either a max-heap or a min-heap, and sort the array by repeatedly selecting the maximum element or the minimum element. However, if you try to implement heap-sort in-place, storing the heap as an array inside the array which is being sorted, you'll notice that it's much more convenient to use a max-heap than a min-heap, in order to sort the elements in increasing order by repeatedly selecting the max element and moving it to the end of the array.