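I tried summing the elements of an array with a simple for loop, with std::accumulate, and with a manually unrolled for loop. As I expected, the manually unrolled loop is the fastest, but more interestingly, std::accumulate is much slower than the simple loop. Here is my code; I compiled it with gcc 4.7 and the -O3 flag. Visual Studio will need a different implementation of the rdtsc function.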
#include <iostream>
#include <algorithm>
#include <numeric>
#include <stdint.h>
using namespace std;
__inline__ uint64_t rdtsc() {
    uint64_t a, d;
    __asm__ volatile ("rdtsc" : "=a" (a), "=d" (d));
    return (d<<32) | a;
}
class mytimer
{
public:
    mytimer() { _start_time = rdtsc(); }
    void restart() { _start_time = rdtsc(); }
    uint64_t elapsed() const
    { return rdtsc() - _start_time; }
private:
    uint64_t _start_time;
}; // timer
int main()
{
    const int num_samples = 1000;
    float* samples …

While testing algorithms I ran into this odd behavior: std::accumulate is faster than a simple for loop.
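Looking at the generated assembly, I am none the wiser :-) It seems that the for loop is optimized into MMX instructions, while accumulate expands into a plain loop.
Here is the code. The behavior shows up at the -O3 optimization level with gcc 4.7.1.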
#include <vector>
#include <chrono>
#include <iostream>
#include <random>
#include <algorithm>
#include <numeric>  // std::accumulate is declared here
using namespace std;
int main()
{
    const size_t vsize = 100*1000*1000;
    vector<int> x;
    x.reserve(vsize);
    mt19937 rng;
    rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now()));
    uniform_int_distribution<uint32_t> dist(0,10);
    for (size_t i = 0; i < vsize; i++)
    {
        x.push_back(dist(rng));
    }
    long long tmp = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        tmp += x[i];
    }
    cout << "dry run " << tmp << …