Dan*_*maa 9 sorting algorithm performance time-complexity
我不断产生大量的数字,以后需要迭代。
将它们添加到数组然后使用,std::sort()或将它们添加到heap(优先级队列),然后将其弹出,会更有效吗?
目前,我只是有一个vector我要推迟的。另一个数据结构是否更适合按需排序?因此问题是,在log(n)每个插入(nlogn)处插入n次n堆比在事实之后(也nlogn)进行排序要快吗?
运行以下程序(在GNU / Linux上使用gcc 8.3)会得到以下结果:
100 elements, 2171202 iterations --> v: 1.7s pq: 2.89572s (x 1.70337)
1000 elements, 144400 iterations --> v: 3.08776s pq: 6.75459s (x 2.18754)
10000 elements, 10816 iterations --> v: 5.24278s pq: 8.79159s (x 1.6769)
100000 elements, 841 iterations --> v: 5.06147s pq: 8.62931s (x 1.7049)
1000000 elements, 72 iterations --> v: 4.64172s pq: 9.16332s (x 1.97412)
Run Code Online (Sandbox Code Playgroud)
无论如何,sort()一次调用一次vector似乎比使用一次更好priority_queue。
/*
g++ -o prog prog.cpp -O3 -march=native \
-std=c++17 -pedantic -Wall -Wextra -Wconversion
*/
#include <iostream>
#include <stdexcept>
#include <cmath>
#include <vector>
#include <queue>
#include <algorithm>
#include <chrono>
#include <random>
int
get_elem_count(int argc,
char **argv)
{
auto elem_count=-1;
if(argc>1)
{
try
{
elem_count=std::stoi(argv[1]);
}
catch(...)
{
// ignore
}
}
return elem_count;
}
std::vector<double>
generate_sequence(int elem_count)
{
auto gen=std::default_random_engine{std::random_device{}()};
auto dist=std::uniform_real_distribution<double>{0.0, 1.0};
auto sequence=std::vector<double>{};
sequence.reserve(elem_count);
for(auto i=0; i<elem_count; ++i)
{
sequence.emplace_back(dist(gen));
}
return sequence;
}
double
current_time()
{
const auto now{std::chrono::system_clock::now().time_since_epoch()};
return 1e-6*double(std::chrono::duration_cast
<std::chrono::microseconds>(now).count());
}
void
vector_test(const std::vector<double> &sequence,
std::vector<double> &tested)
{
//~~~~ consume the sequence and store values ~~~~
tested.clear();
for(const auto &elem: sequence)
{
tested.emplace_back(elem);
}
std::sort(begin(tested), end(tested),
[](const auto &lhs, const auto &rhs)
{
return lhs>rhs;
});
//~~~~ make use of the sorted values ~~~~
auto previous=1.0;
for(const auto &elem: tested)
{
if(elem>previous)
{
throw std::runtime_error{"this is just a dummy test (never true) "
"in order to prevent the optimizer from "
"discarding all the code..."};
}
previous=elem;
}
}
void
priority_queue_test(const std::vector<double> &sequence,
std::priority_queue<double> &tested)
{
//~~~~ consume the sequence and store values ~~~~
for(const auto &elem: sequence)
{
tested.emplace(elem);
}
//~~~~ make use of the sorted values ~~~~
auto previous=1.0;
while(!empty(tested))
{
const auto elem=tested.top();
tested.pop();
if(elem>previous)
{
throw std::runtime_error{"this is just a dummy test (never true) "
"in order to prevent the optimizer from "
"discarding all the code..."};
}
previous=elem;
}
}
int
main(int argc,
char **argv)
{
const auto elem_count=get_elem_count(argc, argv);
if(elem_count<=0)
{
std::cerr << "usage: " << argv[0] << " iteration_count\n";
return 1;
}
const auto iteration_count=
int(1'000'000'000.0/(elem_count*std::log(elem_count)));
const auto generation_count=int(std::sqrt(iteration_count));
const auto repeat_count=iteration_count/generation_count;
auto vector_duration=0.0;
auto priority_queue_duration=0.0;
for(auto generation=0; generation<generation_count; ++generation)
{
const auto sequence=generate_sequence(elem_count);
auto tested_vector=std::vector<double>{};
auto tested_priority_queue=std::priority_queue<double>{};
auto t0=0.0;
for(auto repeat=0; repeat<repeat_count; ++repeat)
{
if(repeat==1) // 0 is a dry run
{
t0=current_time();
}
vector_test(sequence, tested_vector);
}
vector_duration+=current_time()-t0;
for(auto repeat=0; repeat<repeat_count; ++repeat)
{
if(repeat==1) // 0 is a dry run
{
t0=current_time();
}
priority_queue_test(sequence, tested_priority_queue);
}
priority_queue_duration+=current_time()-t0;
}
std::cout << elem_count << " elements, "
<< generation_count*repeat_count << " iterations --> "
<< "v: "<< vector_duration << "s "
<< "pq: "<< priority_queue_duration << "s "
<< "(x " << priority_queue_duration/vector_duration << ")\n";
return 0;
}
Run Code Online (Sandbox Code Playgroud)
使用向量方法时的操作数为:
n*O(1)=O(n)O(n*logn)O(n)总计= O(n*logn) + 2*O(n)[忽略该秒2*O(n)〜O(n)]。
在使用堆方法的情况下,操作数为:
n*O(logn)=O(n*logn)n*O(logn)=O(n*logn)总计= 2*O(n*logn)。
即使O(n*logn)在两种情况下都可以进行操作,但是根据精确的数学公式,堆和向量情况之间的区别是:
O(堆)-O(向量)= 2*O(n*logn) - O(n*logn) - 2*O(n)
=O(n*logn) - 2*O(n)
这对于以下情况的大型案例将是积极的n:
https://www.desmos.com/calculator/uw4i9oiy19
<iframe src="https://www.desmos.com/calculator/uw4i9oiy19?embed" width="1000px" height="1000px" style="border: 1px solid #ccc" frameborder=0></iframe>Run Code Online (Sandbox Code Playgroud)
在上图中,蓝色为y = x*logx,红色为y = 2*x。
因此,通过此分析,您应该选择向量方法。