为什么积累比简单的循环更快?

Let*_*_Be 10 c++ performance

我正在测试算法并遇到这种奇怪的行为,当时std::accumulate比简单for循环更快.

看看生成的汇编程序我不是更明智:-)似乎for循环被优化为MMX指令,而累积则扩展为循环.

这是代码.行为表现为-O3优化级别,gcc 4.7.1

#include <vector>                                                                                                                                                                                                                                                              
#include <chrono>                                                                                                                                                                                                                                                              
#include <iostream>                                                                                                                                                                                                                                                            
#include <random>                                                                                                                                                                                                                                                              
#include <algorithm>                                                                                                                                                                                                                                                           
using namespace std;                                                                                                                                                                                                                                                           

int main()                                                                                                                                                                                                                                                                     
{                                                                                                                                                                                                                                                                              
    const size_t vsize = 100*1000*1000;                                                                                                                                                                                                                                        

    vector<int> x;
    x.reserve(vsize);

    mt19937 rng;
    rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now()));

    uniform_int_distribution<uint32_t> dist(0,10);

    for (size_t i = 0; i < vsize; i++)
    {
        x.push_back(dist(rng));
    }

    long long tmp = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        tmp += x[i];
    }

    cout << "dry run " << tmp << endl;

    auto start = chrono::high_resolution_clock::now();
    long long suma = accumulate(x.begin(),x.end(),0);
    auto end = chrono::high_resolution_clock::now();

    cout << "Accumulate runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl;

    start = chrono::high_resolution_clock::now();
    suma = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        suma += x[i];
    }
    end = chrono::high_resolution_clock::now();

    cout << "Manual sum runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma <<  endl;

    return 0;
}
Run Code Online (Sandbox Code Playgroud)

Vau*_*ato 9

当你传递0给累积时,你使用int而不是long long来累积它.

如果您像这样编写手动循环,它将是等效的:

int sumb = 0;
for (size_t i = 0; i < vsize; i++)
{
    sumb += x[i];
}
suma = sumb;
Run Code Online (Sandbox Code Playgroud)

或者你可以这样调用累积:

long long suma = accumulate(x.begin(),x.end(),0LL);
Run Code Online (Sandbox Code Playgroud)


Bla*_*ace 6

我使用Visual Studio 2012得到了一些不同的结果

// original code
Accumulate runtime 93600 ms
Manual sum runtime 140400 ms
Run Code Online (Sandbox Code Playgroud)

请注意,原始std::accumulate代码不等同于for循环,因为第三个参数to std::accumulateint0值.它使用a执行求和,int并且仅在最后将结果存储在a中long long.更改第三个参数以0LL强制算法使用long long累加器并导致以下时间.

// change std::accumulate initial value -> 0LL
Accumulate runtime 265200 ms
Manual sum runtime 140400 ms
Run Code Online (Sandbox Code Playgroud)

由于最终结果适合int我改变sumastd::accumulate返回仅使用int值.这一变化后,MSVC 2012的编译器能够自动向量化for循环,并导致下列时间.

// change suma from long long to int
Accumulate runtime 93600 ms
Manual sum runtime 46800 ms
Run Code Online (Sandbox Code Playgroud)