Cachegrind: Why so many cache misses?

And*_*ely 6 c++ performance profiling cpu-cache cachegrind

I'm currently learning various profiling and performance utilities under Linux, notably valgrind/cachegrind.

I have the following toy program:

#include <iostream>
#include <vector>

int
main() {
    const unsigned int COUNT = 1000000;

    std::vector<double> v;

    for(int i=0;i<COUNT;i++) {
        v.push_back(i);
    }

    double counter = 0;
    for(int i=0;i<COUNT;i+=8) {
        counter += v[i+0];
        counter += v[i+1];
        counter += v[i+2];
        counter += v[i+3];
        counter += v[i+4];
        counter += v[i+5];
        counter += v[i+6];
        counter += v[i+7];
    }

    std::cout << counter << std::endl;
}

Compiling this program with g++ -O2 -g main.cpp, running valgrind --tool=cachegrind ./a.out, and then cg_annotate cachegrind.out.31694 --auto=yes produces the following result:

--------------------------------------------------------------------------------
-- Auto-annotated source: /home/andrej/Data/projects/pokusy/dod.cpp
--------------------------------------------------------------------------------
       Ir I1mr ILmr        Dr    D1mr    DLmr        Dw D1mw DLmw 

        .    .    .         .       .       .         .    .    .  #include <iostream>
        .    .    .         .       .       .         .    .    .  #include <vector>
        .    .    .         .       .       .         .    .    .  
        .    .    .         .       .       .         .    .    .  int
        7    1    1         1       0       0         4    0    0  main() {
        .    .    .         .       .       .         .    .    .      const unsigned int COUNT = 1000000;
        .    .    .         .       .       .         .    .    .  
        .    .    .         .       .       .         .    .    .      std::vector<double> v;
        .    .    .         .       .       .         .    .    .  
5,000,000    0    0 1,999,999       0       0         0    0    0      for(int i=0;i<COUNT;i++) {
3,000,000    0    0         0       0       0 1,000,000    0    0          v.push_back(i);
        .    .    .         .       .       .         .    .    .      }
        .    .    .         .       .       .         .    .    .  
        3    0    0         0       0       0         0    0    0      double counter = 0;
  250,000    0    0         0       0       0         0    0    0      for(int i=0;i<COUNT;i+=8) {
  250,000    0    0   125,000       1       1         0    0    0          counter += v[i+0];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+1];
  125,000    1    1   125,000       0       0         0    0    0          counter += v[i+2];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+3];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+4];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+5];
  125,000    0    0   125,000 125,000 125,000         0    0    0          counter += v[i+6];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+7];
        .    .    .         .       .       .         .    .    .      }
        .    .    .         .       .       .         .    .    .  
        .    .    .         .       .       .         .    .    .      std::cout << counter << std::endl;
       11    0    0         6       1       1         0    0    0  }

What I'm worried about is this line:

125,000    0    0   125,000 125,000 125,000         0    0    0          counter += v[i+6];

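Why does this line have so many cache misses? The data is in contiguous memory, and on each iteration I'm reading 64 bytes of data (assuming the cache line is 64 bytes long).

I'm running this program on Ubuntu Linux 18.04.1, kernel 4.19, g++ 7.3.0. The computer is an AMD 2400G.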

use*_*670 2

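I suspect this happens because the vector's buffer is not aligned on a cache line boundary. The sudden jump in cache misses at v[i+6] would then mark the point in each iteration where the reads cross into the next cache line. So I suggest checking the value of v.data().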