并行排序单链表

Ian*_*oyd 6 sorting algorithm parallel-processing performance linked-list

是否有任何算法使其链接列表的并行排序值得?

众所周知,Merge Sort是用于排序链表的最佳算法.

大多数合并排序都是根据数组来解释的,每一半都是递归排序的.这将使并行化变得微不足道:独立地对每一半进行排序然后合并两半.

但链表没有"中途"点; 链表一直持续到结束:

头→[a]→[b]→[c]→[d]→[e]→[f]→[g]→[h]→[i]→[j]→...

我现在已经执行了一次实现以获得计数,然后递归地分割计数,直到我们将节点与它进行比较NextNode.递归负责记住两半的位置.

这意味着链表的MergeSort在列表中线性前进.由于它似乎要求通过列表线性进展,我认为它不能并行化.我能想象的唯一方法是:

  • 走一下列表来统计 O(n)
  • 走到列表的一半,到达中途点 O(n/2)
  • 然后排序每一半 O(n log n)

但即使我们在单独的线程中并行排序(a,b)和(c,d),我也会认为NextNode重新排序期间的错误共享会破坏任何并行化的优点.

有没有用于排序链表的并行算法?

数组合并排序算法

以下是对数组执行合并排序的标准算法:

algorithm Merge-Sort
    input:
        an array, A (the values to be sorted)
        an integer, p (the lower bound of the values to be sorted)
        an integer, r (the upper bound of the values to be sorted)

    define variables:
        an integer, q (the midpoint of the values to be sorted)

    q ? ?(p+r)/2?
    Merge-Sort(A, p, q)   //sort the lower half
    Merge-Sort(A, q+1, r) //sort the upper half   
    Merge(A, p, q, r)     
Run Code Online (Sandbox Code Playgroud)

该算法是针对具有任意索引访问的数组而设计的.要使其适用于链表,必须进行修改.

链表合并排序算法

这是(单线程)单链表,合并排序,我目前用于对单链表进行排序的算法.它来自Gonnet + Baeza Yates算法手册

algorithm sort:
    input:
        a reference to a list, r (pointer to the first item in the linked list)
        an integer, n (the number of items to be sorted)
    output:
        a reference to a list (pointer to the sorted list)

    define variables:
        a reference to a list, A (pointer to the sorted top half of the list)
        a reference to a list, B (pointer to the sorted bottom half of the list)
        a reference to a list, temp (temporary variable used to swap)

    if r = nil then
        return nil

    if n > 1 then
        A ? sort(r, ?n/2? )
        B ? sort(r, ?(n+1)/2? )
        return merge( A, B )

    temp ? r
    r ? r.next
    temp.next ? nil
    return temp
Run Code Online (Sandbox Code Playgroud)

一个帕斯卡实现将是:

function MergeSort(var r: list; n: integer): list;
begin
   if r = nil then 
       Result := nil
   else if n > 1 then
      Result := Merge(MergeSort(r, n div 2), MergeSort(r, (n+1) div 2) )
   else
   begin
      Result := r;
      r := r.next;
      Result.next := nil;
   end
end;
Run Code Online (Sandbox Code Playgroud)

如果我的转码工作,这里是一个即时的C#翻译:

list function MergeSort(ref list r, Int32 n)
{
   if (r == null)
      return null;

    if (n > 1)
    {
       list A = MergeSort(r, n / 2);
       list B = MergeSort(r, (n+1) / 2);
       return Merge(A, B);
    }
    else
    {
       list temp = r;
       r = r.next;
       temp.next = null;
       return temp;
    }
}
Run Code Online (Sandbox Code Playgroud)

我现在需要的是一种对链表进行排序的并行算法.它不必是合并排序.

有些人建议复制下面的n个项目,其中n个项目适合单个缓存行,并用这些项目生成任务.

样本数据

algorithm GenerateSampleData
    input:
        an integer, n (the number of items to generate in the linked list)
    output:
        a reference to a node (the head of the linked list of random data to be sorted)

    define variables:
        a reference to a node, head (the returned head)
        a reference to a node, item (an item in the linked list)
        an integer, i (a counter)

    head ? new node
    item ? head        

    for i ? 1 to n do
        item.value ? Random()
        item.next ? new node
        item ? item.next

    return head
Run Code Online (Sandbox Code Playgroud)

因此,我们可以通过调用以下内容生成300,000个随机项的列表:

head := GenerateSampleData(300000);
Run Code Online (Sandbox Code Playgroud)

基准

Time to generate 300,000 items    568 ms

MergeSort 
    count splitting variation   3,888 ms (baseline)

MergeSort
    Slow-Fast midpoint finding  3,920 ms (0.8% slower)

QuickSort
    Copy linked list to array       4 ms
    Quicksort array             5,609 ms
    Relink list                     5 ms
    Total                       5,625 ms (44% slower)
Run Code Online (Sandbox Code Playgroud)

奖金阅读

Phi*_*lip 5

Mergesort非常适合并行排序.将列表分成两半并将它们并行排序,然后合并结果.如果您需要两个以上的并行排序过程,请以递归方式执行此操作.如果您碰巧没有无限多的CPU,则可以在某个重复深度(您必须通过测试确定)中省略并行化.

顺便说一下,将列表分成大致相同大小的两半的常用方法是Floyd的Cycle Finding算法,也称为野兔和乌龟方法:

Node MergeSort(Node head)
{
   if ((head == null) || (head.Next == null))
      return head; //Oops, don't return null; what if only head.Next was null

   Node firstHalf = head;
   Node middle = GetMiddle(head);
   Node secondHalf = middle.Next;
   middle.Next = null; //cut the two halves

   //Sort the lower and upper halves
   firstHalf = MergeSort(firstHalf);
   secondHalf = MergeSort(secondHalf);

   //Merge the sorted halves 
   return Merge(firstHalf, secondHalf);
}

Node GetMiddle(Node head)
{
   if (head == null || head.Next == null)
       return null;

   Node slow = head;
   Node fast = head;
   while ((fast.Next != null) && (fast.Next.Next != null))
   {
       slow = slow.Next;
       fast = fast.Next.Next;
   }
   return slow;
}
Run Code Online (Sandbox Code Playgroud)

在此之后,listlist2大致相同大小的两个列表.连接它们将产生原始列表.当然fast = fast->next->next需要进一步关注; 这只是为了说明一般原则.

在此输入图像描述