Related questions and solutions (0)

std::vector performance regression when enabling C++11

I found an interesting performance regression in a small C++ snippet when I enable C++11:

#include <vector>

struct Item
{
  int a;
  int b;
};

int main()
{
  const std::size_t num_items = 10000000;
  std::vector<Item> container;
  container.reserve(num_items);
  for (std::size_t i = 0; i < num_items; ++i) {
    container.push_back(Item());
  }
  return 0;
}

With g++ (GCC) 4.8.2 20131219 (prerelease) and C++03, I get:

milian:/tmp$ g++ -O3 main.cpp && perf stat -r 10 ./a.out

Performance counter stats for './a.out' (10 runs):

        35.206824 task-clock                #    0.988 CPUs utilized            ( +-  1.23% )
                4 context-switches          #    0.116 K/sec                    ( +-  4.38% )
                0 cpu-migrations …
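
The `perf stat` figures above cover the whole process, including startup and teardown. As a complementary check, the loop alone can be timed in-process. The following is a minimal sketch, not part of the original question: `fill_container` and `time_fill_ms` are hypothetical helper names, and because `std::chrono` requires C++11 this only covers the C++11 build (the C++03 build still needs an external tool such as `perf stat`).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Item
{
  int a;
  int b;
};

// Fills a vector the same way the snippet above does and returns its final
// size. Returning the size gives the result a use; an aggressive optimizer
// may still simplify the loop, so inspect the generated code if in doubt.
std::size_t fill_container(std::size_t num_items)
{
  std::vector<Item> container;
  container.reserve(num_items);
  for (std::size_t i = 0; i < num_items; ++i) {
    container.push_back(Item());
  }
  return container.size();
}

// Hypothetical helper: wall-clock milliseconds spent in one fill_container
// call, measured with a monotonic clock.
double time_fill_ms(std::size_t num_items)
{
  const auto start = std::chrono::steady_clock::now();
  const std::size_t filled = fill_container(num_items);
  const auto stop = std::chrono::steady_clock::now();
  assert(filled == num_items);
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `time_fill_ms(10000000)` from `main` reproduces the workload size used in the question. A single sample is noisy, so repeated calls are advisable before drawing conclusions.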

c++ performance gcc vector c++11

mil*_*anw

2014-01-08

235 votes

1 answer

6548 views

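Is the UNIX `time` command accurate enough for benchmarks?

Let's say I want to benchmark two programs: foo.py and bar.py.

Are a couple of thousand runs, and the respective averages of `time python foo.py` and `time python bar.py`, adequate for profiling and comparing their speed?

Edit: Additionally, if the execution of each program is sub-second (assume it is not for the above), would `time` still be okay to use?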
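
To make "a couple of thousand runs and their averages" concrete, here is a minimal sketch of collecting repeated wall-clock samples and reducing them to a mean, a sample standard deviation, and a minimum. It is written in C++ rather than as a shell loop around `time`, and every name in it (`TimingSummary`, `summarize`, `time_runs`) is hypothetical. It is an illustration of the averaging idea, not a replacement for `time`.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of a set of timing samples, in seconds.
struct TimingSummary
{
  double mean;
  double stddev;  // sample standard deviation (n - 1 in the denominator)
  double min;
};

// Reduces raw samples to mean, sample standard deviation, and minimum.
TimingSummary summarize(const std::vector<double>& samples)
{
  assert(!samples.empty());
  double sum = 0.0;
  double min = samples[0];
  for (double s : samples) {
    sum += s;
    if (s < min) min = s;
  }
  const double mean = sum / samples.size();
  double squared_dev = 0.0;
  for (double s : samples) {
    squared_dev += (s - mean) * (s - mean);
  }
  const double stddev =
      samples.size() > 1 ? std::sqrt(squared_dev / (samples.size() - 1)) : 0.0;
  return TimingSummary{mean, stddev, min};
}

// Times `runs` invocations of any callable with a monotonic clock and
// returns one sample (in seconds) per invocation.
template <typename Work>
std::vector<double> time_runs(Work work, std::size_t runs)
{
  std::vector<double> samples;
  samples.reserve(runs);
  for (std::size_t i = 0; i < runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  return samples;
}
```

For the two scripts in the question, the callable could wrap `std::system("python foo.py")` from `<cstdlib>`. That is an assumption about the setup, and it adds shell and interpreter startup to every sample, which matters most in the sub-second case. Reporting the spread alongside the mean shows whether a difference between the two programs exceeds the run-to-run noise.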

unix linux benchmarking profiling

chr*_*ode

2012-12-20

44 votes

3 answers

20k views

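Why does std::tuple break the small-struct calling convention optimization in C++?

C++ has a small-struct calling convention optimization: the compiler passes a small struct as a function argument as efficiently as it passes a primitive type (for example, in registers). For example: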

class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code, except that they call foo(int) and foo(MyInt) respectively. On x86_64 in particular, it looks like this:

        mov     edi, 1
        jmp     foo(MyInt) ;tail-call optimization jmp instead of call ret

But if we test std::tuple<int>, the result is different:

void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }

The generated assembly code looks completely different: the small struct (std::tuple<int>) is passed by pointer:

        sub     rsp, 24 …
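
One way to investigate the difference is to ask the compiler whether each type has trivial copy and move constructors and a trivial destructor. Under the Itanium C++ ABI, a class with a non-trivial copy constructor, move constructor, or destructor is passed by invisible reference rather than in registers. The sketch below only approximates that rule with standard type traits. `trivial_for_calls_approx`, `WithUserCopy`, and `kTupleIntCheck` are hypothetical names introduced for illustration, and the result for `std::tuple<int>` depends on the standard library implementation, so it is exposed as a value to inspect rather than asserted.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

class MyInt { int n; public: MyInt(int x) : n(x){} };

// Hypothetical counter-example: a user-provided copy constructor makes the
// type non-trivial to copy.
struct WithUserCopy
{
  int n;
  WithUserCopy(const WithUserCopy& other) : n(other.n) {}
};

// True when constructing T from an lvalue or rvalue of T and destroying T
// are all trivial. This approximates (and does not replace) the ABI rule
// that decides whether a class type may travel in registers.
template <typename T>
constexpr bool trivial_for_calls_approx()
{
  return std::is_trivially_copy_constructible<T>::value &&
         std::is_trivially_move_constructible<T>::value &&
         std::is_trivially_destructible<T>::value;
}

static_assert(trivial_for_calls_approx<int>(), "int is trivial");
static_assert(trivial_for_calls_approx<MyInt>(), "MyInt is trivial to copy");
static_assert(!trivial_for_calls_approx<WithUserCopy>(),
              "a user-provided copy constructor is non-trivial");

// Implementation-dependent: inspect this value with your own toolchain
// instead of assuming either outcome.
constexpr bool kTupleIntCheck = trivial_for_calls_approx<std::tuple<int>>();
```

If `kTupleIntCheck` is false on a given toolchain, that is consistent with the by-pointer passing shown above. If it is true, the cause lies elsewhere and the full ABI classification rules need to be consulted.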

c++ x86 calling-convention c++11 stdtuple

Yum*_*Yao

2020-09-24

29 votes

2 answers

1292 views

Is returning a 2-tuple less efficient than std::pair?

Consider the following code:

#include <utility>
#include <tuple>

std::pair<int, int> f1()
{
    return std::make_pair(0x111, 0x222);
}

std::tuple<int, int> f2()
{
    return std::make_tuple(0x111, 0x222);
}

Clang 3 and 4 generate similar code for both on x86-64:

f1():
 movabs rax,0x22200000111
 ret    
f2():
 movabs rax,0x11100000222 ; opposite packing order, not important
 ret

But Clang 5 generates different code for f2():

f2():
 movabs rax,0x11100000222
 mov    QWORD PTR [rdi],rax
 mov    rax,rdi
 ret

As do GCC 4 through GCC 7:

f2():
 movabs rdx,0x11100000222
 mov    rax,rdi
 mov    QWORD PTR [rdi],rdx ; GCC 4-6 use 2 DWORD stores
 ret

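Why is the generated code worse when returning a std::tuple that fits in a single register than when returning a std::pair? It seems especially strange because Clang 3 and 4 appear to be optimal, yet Clang 5 is not.

Try it here: https: …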

c++ gcc clang calling-convention stdtuple

Joh*_*nck

2017-10-26

21 votes

1 answer

1474 views

Tag statistics

c++ ×3

c++11 ×2

calling-convention ×2

gcc ×2

stdtuple ×2

benchmarking ×1

clang ×1

linux ×1

performance ×1

profiling ×1

unix ×1

vector ×1

x86 ×1
