为什么我的n log(n)heapsort比我的n ^ 2选择排序慢

Question

为什么我的n log(n)heapsort比我的n ^ 2选择排序慢

smi*_*dha 5 c++ sorting heapsort selection-sort

我已经实现了两种算法,用于从最高到最低排序元素.

第一个,在实际RAM模型上采用二次时间,第二个采用O(n log(n))时间.第二个使用优先级队列来减少.

以下是时间,它是上述程序的输出.

第一列是随机整数数组的大小
第二列是O(n ^ 2)技术的秒数

第三列是O(n log(n))技术的秒数

 9600  1.92663      7.58865
 9800  1.93705      7.67376
10000  2.08647      8.19094

Run Code Online (Sandbox Code Playgroud)

尽管复杂性存在很大差异,但对于所考虑的阵列大小,第3列大于第2列.为什么会这样？C++的优先级队列实现是否缓慢？

我在Windows 7上运行此代码,Visual Studio 2012 32位.

这是代码,

#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);

    for(size_t i=0 ; i < a.size() ; ++i)  
    {

        vector<int>::iterator it = max_element( a.begin() + i ,a.end() ) ;
        int max_value = *it; 
        *it = a[i];
        a[i] = max_value;    

    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;
}



double time_faster_sort(vector<int>& a) 
{
    LARGE_INTEGER frequency, start,end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE  ) exit(0);
    if (::QueryPerformanceCounter(&start)       == FALSE  ) exit(0);


    // Push into the priority queue. Logarithmic cost per insertion = > O (n log(n)) total insertion cost
    priority_queue<int> pq;
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        pq.push(a[i]);
    }

    // Read of the elements from the priority queue in order of priority
    // logarithmic reading cost per read => O(n log(n)) reading cost for entire vector
    for(size_t i=0 ; i<a.size() ; ++i)
    {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE) exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;

}




int main(int argc, char** argv)
{
    // Iterate over vectors of different sizes and try out the two different variants
    for(size_t N=1000; N<=10000 ; N += 100 ) 
    {

        // initialize two vectors with identical random elements
        vector<int> a(N),b(N);

        // initialize with random elements
        for(size_t i=0 ; i<N ; ++i) 
        {
            a[i] = rand() % 1000; 
            b[i] = a[i];
        }

        // Sort the two different variants and time them  
        cout << N << "  " 
             << time_slower_sort(a) << "\t\t" 
             << time_faster_sort(b) << endl;

        // Sanity check
        for(size_t i=0 ; i<=N-2 ; ++i) 
        {
            assert(a[i] == b[i]); // both should return the same answer
            assert(a[i] >= a[i+1]); // else not sorted
        }

    }
    return 0;
}

Run Code Online (Sandbox Code Playgroud)

Answer 1

LSe*_*rni 12

对于所考虑的阵列大小,第3列大于第2列.

"Big O"符号仅告诉您时间随输入大小的增长情况.

你的时间是(或应该是)

A + B*N^2          for the quadratic case,
C + D*N*LOG(N)     for the linearithmic case.

Run Code Online (Sandbox Code Playgroud)

但是C完全有可能比A大得多,当N足够小时,导致线性代码的执行时间更长.

线性化有趣的是,如果您的输入从9600增加到19200(加倍),对于二次算法,您的执行时间应该大约为四倍,大约为8秒,而线性算法应该比其执行时间加倍.

因此执行时间比率将从2:8变为8:16,即二次算法现在只有两倍的速度.

再次加倍输入的大小,8:16变为32:32; 当面对大约40,000的输入时,这两种算法同样快.

当处理输入大小为80,000时,比率相反:四次32为128,而32次只有64. 128:64意味着线性算法现在是另一种的两倍.

您应该运行具有不同大小的测试,可能是N,2*N和4*N,以便更好地估计A,B,C和D常数.

这一切归结为,不要盲目依赖Big O分类.如果您希望您的输入增长,请使用它; 但对于小输入,很可能是一个不太可扩展的算法效率更高.

例如你会看到,对于小输入大小,更快的算法是以指数时间运行的算法,它比对数时间快几百倍.但是一旦输入大小超过9,指数算法的运行时间就会飙升,而另一方则不会.

您甚至可能决定实现两种版本的算法,并根据输入大小使用其中一种算法.有一些递归算法正是这样做的,并切换到最后一次迭代的迭代实现.在图示的情况下,您可以为每个尺寸范围实施最佳算法; 但最好的折衷方案是仅采用两种算法,二次采用N = 15,然后切换到对数.

我在这里找到了对Introsort的引用,其中

是一种最初使用Quicksort的排序算法,但是当递归深度超过基于被排序元素数量的对数的级别时切换到Heapsort,并且由于其良好的引用局部性,即对于小型情况使用插入排序,即数据很可能驻留在内存中并且很容易引用.

在上面的例子中,Insertion排序利用了内存局部性,这意味着它的B常量非常小; 递归算法可能会产生更高的成本,并且具有显着的C值.因此,对于小型数据集,即使其Big O分类较差,更紧凑的算法也能很好地完成.

Answer 2

Net*_*peC 3

我认为这个问题确实比人们想象的更微妙。在您的O(N^2)解决方案中，您没有进行分配，算法就地工作，搜索最大的并与当前位置交换。还行吧。

但是在priority_queue版本O(N log N)中（priority_queue在内部，有一个std::vector默认的，来存储状态）。当vector您push_back逐个元素时，有时它需要增长（确实如此），但这是您在O(N^2)版本中不会损失的时间。如果您在初始化时进行以下小更改priority_queue：

priority_queue<int> pq(a.begin(), a.end());而不是for loop

O(N log N)的时间比O(N^2) 的时间要好得多。在提议的更改中，版本中仍然存在分配priority_queue，但只是一次（您为大尺寸节省了大量分配vector，并且分配是重要的耗时操作之一），并且可能是初始化（在O(N) 中）可以利用的完整状态priority_queue，不知道是否STL真的这样做）。

示例代码（用于编译和运行）：

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <algorithm>
#include <vector>
#include <queue>
#include <Windows.h>
#include <assert.h>

using namespace std;

double time_slower_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    for (size_t i = 0; i < a.size(); ++i) {

        vector<int>::iterator it = max_element(a.begin() + i, a.end());
        int max_value = *it;
        *it = a[i];
        a[i] = max_value;
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

double time_faster_sort(vector<int>& a) {
    LARGE_INTEGER frequency, start, end;
    if (::QueryPerformanceFrequency(&frequency) == FALSE)
        exit(0);
    if (::QueryPerformanceCounter(&start) == FALSE)
        exit(0);

    // Push into the priority queue. Logarithmic cost per insertion = > O (n
    // log(n)) total insertion cost
    priority_queue<int> pq(a.begin(), a.end());  // <----- THE ONLY CHANGE IS HERE

    // Read of the elements from the priority queue in order of priority
    // logarithmic reading cost per read => O(n log(n)) reading cost for entire
    // vector
    for (size_t i = 0; i < a.size(); ++i) {
        a[i] = pq.top();
        pq.pop();
    }
    if (::QueryPerformanceCounter(&end) == FALSE)
        exit(0);
    return static_cast<double>(end.QuadPart - start.QuadPart) /
           frequency.QuadPart;
}

int main(int argc, char** argv) {
    // Iterate over vectors of different sizes and try out the two different
    // variants
    for (size_t N = 1000; N <= 10000; N += 100) {

        // initialize two vectors with identical random elements
        vector<int> a(N), b(N);

        // initialize with random elements
        for (size_t i = 0; i < N; ++i) {
            a[i] = rand() % 1000;
            b[i] = a[i];
        }

        // Sort the two different variants and time them
        cout << N << "  " << time_slower_sort(a) << "\t\t"
             << time_faster_sort(b) << endl;

        // Sanity check
        for (size_t i = 0; i <= N - 2; ++i) {
            assert(a[i] == b[i]);     // both should return the same answer
            assert(a[i] >= a[i + 1]); // else not sorted
        }
    }
    return 0;
}

Run Code Online (Sandbox Code Playgroud)

在我的 PC（Core 2 Duo 6300）中，获得的输出是：

1100  0.000753738      0.000110263
1200  0.000883201      0.000115749
1300  0.00103077       0.000124526
1400  0.00126994       0.000250698
...
9500  0.0497966        0.00114377
9600  0.051173         0.00123429
9700  0.052551         0.00115804
9800  0.0533245        0.00117614
9900  0.0555007        0.00119205
10000 0.0552341        0.00120466

Run Code Online (Sandbox Code Playgroud)

归档时间：	11 年，4 月前
查看次数：	755 次
最近记录：	6 年，6 月前