Sop*_*ert 7 benchmarking valgrind cachegrind
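Inspired by SQLite, I'm looking into using valgrind's "cachegrind" tool for reproducible performance benchmarking. The numbers it outputs are much more stable than those from any other timing method I've found, but they are still not deterministic. As an example, here is a simple C program: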
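int main(void) {
    /* volatile keeps the compiler from optimizing the loop away at -O2;
       x is initialized because reading it uninitialized is undefined behavior */
    volatile int x = 0;
    while (x < 1000000) {
        x++;
    }
    return 0;
}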
If I compile it and run it under cachegrind, I get the following results:
$ gcc -O2 x.c -o x
$ valgrind --tool=cachegrind ./x
==11949== Cachegrind, a cache and branch-prediction profiler
==11949== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11949== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11949== Command: ./x
==11949==
--11949-- warning: L3 cache found, using its data for the LL simulation.
==11949==
==11949== I refs: 11,158,333
==11949== I1 misses: 3,565
==11949== LLi misses: 2,611
==11949== I1 miss rate: 0.03%
==11949== LLi miss rate: 0.02%
==11949==
==11949== D refs: 4,116,700 (3,552,970 rd + 563,730 wr)
==11949== D1 misses: 21,119 ( 19,041 rd + 2,078 wr)
==11949== LLd misses: 7,487 ( 6,148 rd + 1,339 wr)
==11949== D1 miss rate: 0.5% ( 0.5% + 0.4% )
==11949== LLd miss rate: 0.2% ( 0.2% + 0.2% )
==11949==
==11949== LL refs: 24,684 ( 22,606 rd + 2,078 wr)
==11949== LL misses: 10,098 ( 8,759 rd + 1,339 wr)
==11949== LL miss rate: 0.1% ( 0.1% + 0.2% )
$ valgrind --tool=cachegrind ./x
==11982== Cachegrind, a cache and branch-prediction profiler
==11982== Copyright (C) 2002-2015, and GNU GPL'd, by Nicholas Nethercote et al.
==11982== Using Valgrind-3.11.0.SVN and LibVEX; rerun with -h for copyright info
==11982== Command: ./x
==11982==
--11982-- warning: L3 cache found, using its data for the LL simulation.
==11982==
==11982== I refs: 11,159,225
==11982== I1 misses: 3,611
==11982== LLi misses: 2,611
==11982== I1 miss rate: 0.03%
==11982== LLi miss rate: 0.02%
==11982==
==11982== D refs: 4,117,029 (3,553,176 rd + 563,853 wr)
==11982== D1 misses: 21,174 ( 19,090 rd + 2,084 wr)
==11982== LLd misses: 7,496 ( 6,154 rd + 1,342 wr)
==11982== D1 miss rate: 0.5% ( 0.5% + 0.4% )
==11982== LLd miss rate: 0.2% ( 0.2% + 0.2% )
==11982==
==11982== LL refs: 24,785 ( 22,701 rd + 2,084 wr)
==11982== LL misses: 10,107 ( 8,765 rd + 1,342 wr)
==11982== LL miss rate: 0.1% ( 0.1% + 0.2% )
$
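In this case, "I refs" differs by only 0.008% between the two runs, but I'd still like to know why the counts differ at all. In more complex programs (ones that run for tens of milliseconds), they can vary by more. Is there any way to make the runs completely reproducible?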
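For reference, the 0.008% figure can be reproduced from the two "I refs" counts shown above. This is only a small sketch of the arithmetic; `relative_diff_percent` is a hypothetical helper name, not something Cachegrind provides.

```c
/* Relative difference between two event counts, expressed as a
 * percentage of the first count. Returns 0.0 if the first count is 0
 * (an arbitrary choice made here to avoid dividing by zero). */
double relative_diff_percent(unsigned long long a, unsigned long long b) {
    unsigned long long diff = (a > b) ? (a - b) : (b - a);
    if (a == 0) {
        return 0.0;
    }
    return 100.0 * (double)diff / (double)a;
}
```

With the two runs above, `relative_diff_percent(11158333ULL, 11159225ULL)` is 892 / 11,158,333 × 100, which is roughly 0.008.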
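At the end of a thread on gmane.comp.debugging.valgrind, Nicholas Nethercote (a Mozilla developer who works on the Valgrind development team) said that small variations are common when using Cachegrind (and I infer from this that they don't cause major problems).
Cachegrind's manual mentions that its results are very sensitive. For example, on Linux, address space randomization (used to improve security) can be a source of non-determinism: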
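Another thing worth noting is that results are very sensitive. Changing the size of the executable being profiled, or the sizes of any of the shared libraries it uses, or even the length of their file names, can perturb the results. Variations will be small, but don't expect perfectly repeatable results if your program changes at all.
More recent GNU/Linux distributions do address space randomisation, in which identical runs of the same program have their shared libraries loaded at different locations, as a security measure. This also perturbs the results.
While these factors mean you shouldn't trust the results to be super-accurate, they should be close enough to be useful.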
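The address space randomization described above can be switched off for a single process on Linux, which removes one of the sources of variation (it may not make the counts fully identical, since the other factors remain). Below is a minimal sketch, assuming Linux and the `personality(2)` call from `<sys/personality.h>`; `disable_aslr` and `run_without_aslr` are hypothetical helper names, not part of Valgrind.

```c
#include <stdio.h>
#include <sys/personality.h>
#include <unistd.h>

/* Turn off address space layout randomization for the calling process.
 * The setting is inherited by anything the process later exec()s.
 * Returns 0 on success, -1 if the kernel refuses (for example inside a
 * sandbox that filters this system call). */
int disable_aslr(void) {
    /* 0xffffffff queries the current persona without changing it. */
    int persona = personality(0xffffffff);
    if (persona == -1) {
        return -1;
    }
    if (persona & ADDR_NO_RANDOMIZE) {
        return 0; /* already disabled */
    }
    if (personality((unsigned long)persona | ADDR_NO_RANDOMIZE) == -1) {
        return -1;
    }
    return 0;
}

/* Disable ASLR, then replace this process with the command in argv
 * (argv[0] is the program to run; the array must end with NULL).
 * Only returns, with -1, if something failed. */
int run_without_aslr(char *const argv[]) {
    if (disable_aslr() != 0) {
        perror("personality");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp"); /* reached only if execvp failed */
    return -1;
}
```

Calling `run_without_aslr(argv + 1)` from a `main` function gives a small launcher that could wrap the `valgrind --tool=cachegrind ./x` command from the question. The `setarch` utility with its `-R` option does much the same thing from the shell.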