C++11 tuple performance

Question

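I am about to make my code more generic by using std::tuple in many places, including the single-element case. I mean, for example, tuple<double> instead of double. But I decided to check the performance of this particular case first.

Here is a simple performance benchmark: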

#include <tuple>
#include <iostream>

using std::cout;
using std::endl;
using std::get;
using std::tuple;

int main(void)
{

#ifdef TUPLE
    using double_t = std::tuple<double>;
#else
    using double_t = double;
#endif

    constexpr int count = 1e9;
    auto array = new double_t[count];

    long long sum = 0;
    for (int idx = 0; idx < count; ++idx) {
#ifdef TUPLE
        sum += get<0>(array[idx]);
#else
        sum += array[idx];
#endif
    }
    delete[] array;
    cout << sum << endl; // just "external" side effect for variable sum.
}

And the results of running it:

$ g++ -DTUPLE -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m3.347s
user    0m2.839s
sys     0m0.485s

$ g++  -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m2.963s
user    0m2.424s
sys     0m0.519s

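I thought tuple was a purely static, compile-time template, and that all the get<> functions would boil down to ordinary variable access in this case. BTW, the memory allocation size in this test is the same. Why is there a difference in execution time?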

EDIT: The problem was in the initialization of the tuple<> objects. To make the test more accurate, one line must be changed:

     constexpr int count = 1e9;
-    auto array = new double_t[count];
+    auto array = new double_t[count]();

     long long sum = 0;

After that, similar results can be observed:

$ g++ -DTUPLE -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.342s
real    0m3.339s
real    0m3.343s

$ g++ -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.349s
real    0m3.339s
real    0m3.334s

Answer 1

aar*_*man 12

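A tuple default-constructs all of its values (so everything is 0), whereas plain doubles are not initialized by default.

In the generated assembly, the following initialization loop appears only when tuples are used. Otherwise the two versions are equivalent.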

.L2:
    movq    $0, (%rdx)
    addq    $8, %rdx
    cmpq    %rcx, %rdx
    jne .L2

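Good observation. The OP should write `new double_t[count]();` for a fair comparison. (6 upvotes)
@KerrekSB Thanks, I agree he should write a new test. I like showing that C++ is just as fast as C, and that you should trust your compiler. (3 upvotes)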
