I have a C++ application running on Linux that I am in the process of optimizing. How can I pinpoint which areas of my code are running slowly?
I use the following command to extract the user-level backtraces that lead to L3 misses in a simple evince benchmark:
sudo perf record -d --call-graph dwarf -c 10000 -e mem_load_uops_retired.l3_miss:uppp /opt/evince-3.28.4/bin/evince
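The sampling period is clearly rather large (10000 events between consecutive samples). For this experiment, the output of perf script contains samples similar to this one: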
EvJobScheduler 27529 26441.375932: 10000 mem_load_uops_retired.l3_miss:uppp: 7fffcd5d8ec0 5080022 N/A|SNP N/A|TLB N/A|LCK N/A
        7ffff17bec7f bits_image_fetch_separable_convolution_affine+0x2df (inlined)
        7ffff17bec7f bits_image_fetch_separable_convolution_affine_pad_x8r8g8b8+0x2df (/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.34.0)
        7ffff17d1fd1 general_composite_rect+0x301 (/usr/lib/x86_64-linux-gnu/libpixman-1.so.0.34.0)
        ffffffffffffffff [unknown] ([unknown])
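At the bottom of the backtrace there is a symbol named [unknown], which by itself seems fine. But the frame directly above it is a call site inside general_composite_rect(). Is this backtrace OK?
AFAIK, the first caller in a backtrace should be something like _start() or __GI___clone(). This backtrace does not have that form. What is wrong?
Is there any way to fix this? Are truncated (partial) backtraces reliable?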
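The criterion described above (the outermost frame should be a process or thread entry point) can be checked mechanically. The following is only a rough sketch: `frame_symbol` and `reaches_entry_point` are hypothetical helper names, the list of entry-point symbols is an assumption built from the names mentioned in the question, and the parsing assumes the frame layout shown in the sample (`address symbol+0xoffset (object)`), so symbols that contain spaces would need a more careful parser.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Extract the symbol name from one perf-script frame line, e.g.
//   "7ffff17d1fd1 general_composite_rect+0x301 (/usr/lib/.../libpixman-1.so.0.34.0)"
// Returns an empty string when the line has fewer than two fields.
// Assumption: the symbol itself contains no whitespace.
std::string frame_symbol(const std::string& line)
{
    std::istringstream in(line);
    std::string address, symbol;
    if (!(in >> address >> symbol)) {
        return "";
    }
    // Drop the "+0x..." offset suffix if present.
    const std::size_t plus = symbol.find("+0x");
    if (plus != std::string::npos) {
        symbol.erase(plus);
    }
    return symbol;
}

// Frames are ordered as perf script prints them: innermost first,
// outermost last. Returns true when the outermost frame is one of the
// assumed entry-point symbols, i.e. the backtrace does not look truncated.
bool reaches_entry_point(const std::vector<std::string>& frames)
{
    if (frames.empty()) {
        return false;
    }
    // Hypothetical list; extend it for the runtime in use.
    static const char* const entry_points[] = {"_start", "__GI___clone", "__clone"};
    const std::string outermost = frame_symbol(frames.back());
    for (const char* entry : entry_points) {
        if (outermost == entry) {
            return true;
        }
    }
    return false;
}
```

Applied to the sample above, the outermost frame resolves to `[unknown]`, so the check reports the backtrace as truncated. It says nothing about whether the frames that were recovered are trustworthy.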
I have a C++ test program that keeps the CPU busy:
#include <cstdint>
#include <iostream>

// Linear-feedback shift register
uint64_t lfsr1(uint64_t max_ix)
{
    uint64_t start_state = 0xACE1u; /* Any nonzero start state will work. */
    uint64_t lfsr = start_state;
    uint64_t bit; /* Must be 16-bit to allow bit<<15 later in the code */

    for (uint64_t ix = 0; ix < max_ix; ++ix)
    { /* taps: 16 14 13 11; feedback polynomial: x^16 + x^14 + x^13 + x^11 + 1 */
        bit = ((lfsr >> 0) ^ (lfsr >> …
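The listing above is cut off in the source, so the rest of the author's loop is unknown. For readers who want something compilable, here is a separate minimal sketch of a 16-bit Fibonacci LFSR with the taps named in the comment (16, 14, 13, 11). It is not a reconstruction of the original program: the shift amounts 0, 2, 3 and 5 are derived from those taps, and the helper names `lfsr16_step` and `lfsr16_period` are hypothetical.

```cpp
#include <cstdint>

// One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11
// (feedback polynomial x^16 + x^14 + x^13 + x^11 + 1).
// Taps are counted from the output end, so they map to right shifts
// of 0, 2, 3 and 5.
std::uint16_t lfsr16_step(std::uint16_t lfsr)
{
    const std::uint16_t bit = static_cast<std::uint16_t>(
        ((lfsr >> 0) ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5)) & 1u);
    // Shift right by one and feed the new bit in at the top.
    return static_cast<std::uint16_t>((lfsr >> 1) | (bit << 15));
}

// Count how many steps it takes for the register to return to `start`.
// An all-zero state maps to itself, so use a nonzero start state.
std::uint32_t lfsr16_period(std::uint16_t start)
{
    std::uint16_t state = start;
    std::uint32_t steps = 0;
    do {
        state = lfsr16_step(state);
        ++steps;
    } while (state != start);
    return steps;
}
```

Because this polynomial is primitive, a nonzero start state such as 0xACE1 cycles through all 65535 nonzero states before repeating. That makes a loop over `lfsr16_step` a convenient CPU-bound workload with essentially no memory traffic.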