And*_*ely 6 c++ performance profiling cpu-cache cachegrind
I'm currently learning the various profiling and performance tools available on Linux, in particular valgrind/cachegrind.
I have the following toy program:
#include <iostream>
#include <vector>

int
main() {
    const unsigned int COUNT = 1000000;

    std::vector<double> v;

    for(int i=0;i<COUNT;i++) {
        v.push_back(i);
    }

    double counter = 0;
    for(int i=0;i<COUNT;i+=8) {
        counter += v[i+0];
        counter += v[i+1];
        counter += v[i+2];
        counter += v[i+3];
        counter += v[i+4];
        counter += v[i+5];
        counter += v[i+6];
        counter += v[i+7];
    }

    std::cout << counter << std::endl;
}
Compiling the program with g++ -O2 -g main.cpp, running valgrind --tool=cachegrind ./a.out, and then running cg_annotate cachegrind.out.31694 --auto=yes produces the following output:
--------------------------------------------------------------------------------
-- Auto-annotated source: /home/andrej/Data/projects/pokusy/dod.cpp
--------------------------------------------------------------------------------
       Ir I1mr ILmr        Dr    D1mr    DLmr        Dw D1mw DLmw
        .    .    .         .       .       .         .    .    .  #include <iostream>
        .    .    .         .       .       .         .    .    .  #include <vector>
        .    .    .         .       .       .         .    .    .
        .    .    .         .       .       .         .    .    .  int
        7    1    1         1       0       0         4    0    0  main() {
        .    .    .         .       .       .         .    .    .      const unsigned int COUNT = 1000000;
        .    .    .         .       .       .         .    .    .
        .    .    .         .       .       .         .    .    .      std::vector<double> v;
        .    .    .         .       .       .         .    .    .
5,000,000    0    0 1,999,999       0       0         0    0    0      for(int i=0;i<COUNT;i++) {
3,000,000    0    0         0       0       0 1,000,000    0    0          v.push_back(i);
        .    .    .         .       .       .         .    .    .      }
        .    .    .         .       .       .         .    .    .
        3    0    0         0       0       0         0    0    0      double counter = 0;
  250,000    0    0         0       0       0         0    0    0      for(int i=0;i<COUNT;i+=8) {
  250,000    0    0   125,000       1       1         0    0    0          counter += v[i+0];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+1];
  125,000    1    1   125,000       0       0         0    0    0          counter += v[i+2];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+3];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+4];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+5];
  125,000    0    0   125,000 125,000 125,000         0    0    0          counter += v[i+6];
  125,000    0    0   125,000       0       0         0    0    0          counter += v[i+7];
        .    .    .         .       .       .         .    .    .      }
        .    .    .         .       .       .         .    .    .
        .    .    .         .       .       .         .    .    .      std::cout << counter << std::endl;
       11    0    0         6       1       1         0    0    0  }
What concerns me is this line:
125,000 0 0 125,000 125,000 125,000 0 0 0 counter += v[i+6];
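For scale, assuming 8-byte doubles and the 64-byte cache lines the question assumes, the array and the second loop work out to:

$$\frac{1{,}000{,}000 \times 8\ \text{B}}{64\ \text{B per line}} = 125{,}000\ \text{lines}, \qquad \frac{1{,}000{,}000}{8} = 125{,}000\ \text{iterations}.$$

So the 125,000 D1mr on this line amount to one miss per 64-byte line of v, which is also one miss per iteration of the second loop.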
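Why does this one line get so many cache misses? The data is in contiguous memory, and each iteration reads 8 doubles, i.e. 64 bytes, which (assuming a 64-byte cache line) should be exactly one cache line.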
I'm running the program on Ubuntu Linux 18.04.1 with kernel 4.19 and g++ 7.3.0. The CPU is an AMD 2400G.
Views: 348