Moh*_*mad 9 profiling gperftools
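I have just started using Google's performance tools (the google-perftools and libgoogle-perftools4 packages on Ubuntu). I swear I have been googling for about a day and have not found an answer! The problem is that CPU profiling does not give me results for ALL of my functions. This is my code: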
#include "gperftools/profiler.h"
#include <iostream>
#include <math.h>
using namespace std;
void bar()
{
    int a,b,c,d,j,k;
    a=0;
    int z=0;
    b = 1000;
    while(z < b)
    {
        while (a < b)
        {
            d = sin(a);
            c = cos(a);
            j = tan(a);
            k = tan(a);
            k = d * c + j *k;
            a++;
        }
        a = 0;
        z++;
    }
}
void foo()
{
    cout << "hey " << endl;
}
int main()
{
    ProfilerStart("/home/mohammad/gperf/dump.txt");
    int a = 1000;
    while(a--){foo();}
    bar();
    ProfilerFlush();
    ProfilerStop();
}
Compiled with g++ test.cc -lprofiler -o a.out
This is how I run the code:
CPUPROFILE=dump.txt ./a.out
I have also tried this:
CPUPROFILE_FREQUENCY=10000 LD_PRELOAD=/usr/local/lib/libprofiler.so.0.3.0 CPUPROFILE=dump.txt ./a.out
And this is what I get from google-pprof --text a.out dump.txt:
Using local file ./a.out.
Using local file ./dump.txt.
Total: 22 samples
8 36.4% 36.4% 8 36.4% 00d8cb04
6 27.3% 63.6% 6 27.3% bar
3 13.6% 77.3% 3 13.6% __cos (inline)
2 9.1% 86.4% 2 9.1% 00d8cab4
1 4.5% 90.9% 1 4.5% 00d8cab6
1 4.5% 95.5% 1 4.5% 00d8cb06
1 4.5% 100.0% 1 4.5% __write_nocancel
0 0.0% 100.0% 3 13.6% __cos
But there is no information about the foo function!
My system info: Ubuntu 12.04, g++ 4.6.3
That's all!
osg*_*sgx 11
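TL;DR: foo is too fast and too small to receive any profiling events; run it about 100 times more often. The frequency setting had a typo, and pprof will not sample more often than CONFIG_HZ (usually 250). It is better to switch to the more modern Linux perf profiler (tutorial from its authors, Wikipedia).
Long version:
Your foo function is simply too short and simple: it only calls two functions. I compiled the test with g++ test.cc -lprofiler -o test.s -S -g and filtered test.s through the c++filt program to make the C++ names readable: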
foo():
.LFB972:
.loc 1 27 0
pushq %rbp
movq %rsp, %rbp
.loc 1 28 0
movl $.LC0, %esi
movl std::cout, %edi
call std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
movl std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&), %esi
movq %rax, %rdi
call std::basic_ostream<char, std::char_traits<char> >::operator<<(std::basic_ostream<char, std::char_traits<char> >& (*)(std::basic_ostream<char, std::char_traits<char> >&))
.loc 1 29 0
popq %rbp
ret
.LFE972:
.size foo(), .-foo()
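So, to see it in the profile you should run foo many more times: change int a = 1000; in main to something much greater, such as 10000 or, better, 100000 (as I did for my test).
You may also fix the misspelled "CPUPROFILE_FREQUENC=10000" to the correct CPUPROFILE_FREQUENCY (note the Y). I should add that 10000 is too high a setting for CPUPROFILE_FREQUENCY, because the profiler can usually generate only 1000 or 250 events per second, depending on the kernel configuration option CONFIG_HZ (most 3.x kernels use 250; check with grep CONFIG_HZ= /boot/config*). The default value of CPUPROFILE_FREQUENCY is 100.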
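The cap described above can be observed without gperftools. The following is a minimal sketch, assuming that the CPU profiler is driven by an ITIMER_PROF interval timer that delivers SIGPROF (my understanding of the default gperftools mode, not something shown in this thread). The name count_prof_ticks is hypothetical and not part of any library. It requests a timer rate, burns a fixed amount of CPU time, and counts how many signals actually arrive.

```cpp
#include <cassert>
#include <ctime>
#include <signal.h>
#include <sys/time.h>

// Incremented from the signal handler; sig_atomic_t keeps the access async-signal-safe.
static volatile sig_atomic_t g_ticks = 0;

static void on_sigprof(int) { g_ticks = g_ticks + 1; }

// Hypothetical helper: request `requested_hz` SIGPROF signals per second of CPU
// time, spin for `cpu_seconds` of CPU time, and return how many signals arrived.
int count_prof_ticks(int requested_hz, double cpu_seconds) {
    assert(requested_hz > 0 && requested_hz <= 1000000 && cpu_seconds > 0.0);
    g_ticks = 0;

    // Install the counting handler and remember the previous one.
    struct sigaction sa = {};
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    struct sigaction old_sa;
    sigaction(SIGPROF, &sa, &old_sa);

    // Arm a periodic timer that counts CPU time used by this process.
    const long period_us = 1000000L / requested_hz;
    itimerval timer = {};
    timer.it_interval.tv_sec = period_us / 1000000L;
    timer.it_interval.tv_usec = period_us % 1000000L;
    timer.it_value = timer.it_interval;
    setitimer(ITIMER_PROF, &timer, nullptr);

    // Burn CPU until the requested amount of process CPU time has passed.
    const std::clock_t start = std::clock();
    const std::clock_t budget = static_cast<std::clock_t>(cpu_seconds * CLOCKS_PER_SEC);
    while (std::clock() - start < budget) {
    }

    // Disarm the timer and restore the previous handler.
    itimerval off = {};
    setitimer(ITIMER_PROF, &off, nullptr);
    sigaction(SIGPROF, &old_sa, nullptr);
    return g_ticks;
}
```

If the assumption holds, count_prof_ticks(10000, 0.5) and count_prof_ticks(250, 0.5) should return similar counts on a kernel built with CONFIG_HZ=250, because the kernel rounds the requested period up to its own tick.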
I tested different CPUPROFILE_FREQUENCY values (100000, 10000, 1000, 300, and 250) on Ubuntu 14.04 with this bash script:
for a in 100000 100000 10000 10000 1000 1000 300 300 250 250; do
echo -n "$a ";
CPUPROFILE_FREQUENCY=$a CPUPROFILE=dump$a.txt ./test >/dev/null;
done
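The results were the same in every case: 120-140 events, with each ./test run taking around 0.5 seconds. So the CPU profiler from google-perftools cannot deliver more events per second for a single thread than the CONFIG_HZ set in the kernel (mine is 250).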
100000 PROFILE: interrupts/evictions/bytes = 124/1/6584
100000 PROFILE: interrupts/evictions/bytes = 134/0/7864
10000 PROFILE: interrupts/evictions/bytes = 125/0/7488
10000 PROFILE: interrupts/evictions/bytes = 123/0/6960
1000 PROFILE: interrupts/evictions/bytes = 134/0/6264
1000 PROFILE: interrupts/evictions/bytes = 125/2/7272
300 PROFILE: interrupts/evictions/bytes = 137/2/7984
300 PROFILE: interrupts/evictions/bytes = 126/0/7216
250 PROFILE: interrupts/evictions/bytes = 123/3/6680
250 PROFILE: interrupts/evictions/bytes = 137/2/7352
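The arithmetic behind these numbers can be written down as a small sketch. It assumes, as the measurements above suggest, that the effective sampling rate is the requested rate capped at CONFIG_HZ; the function name expected_samples and its parameters are hypothetical and only illustrate the estimate.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Rough estimate of how many profiling samples a single-threaded run collects,
// assuming the requested rate is silently capped at the kernel tick rate.
long expected_samples(double cpu_seconds, int requested_hz, int config_hz) {
    assert(cpu_seconds >= 0.0 && requested_hz > 0 && config_hz > 0);
    const int effective_hz = std::min(requested_hz, config_hz);
    return std::lround(cpu_seconds * effective_hz);
}
```

For a 0.5 second run on a kernel with CONFIG_HZ=250, expected_samples(0.5, 100000, 250) and expected_samples(0.5, 1000, 250) both give 125, which sits inside the observed 120-140 range.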
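With the original a = 1000, foo and the cout functions run too fast to receive a profiling event in every run (even at 250 events per second), so you see neither foo nor any input/output functions. In a small number of runs, __write_nocancel may receive a sampling event, and then foo and the I/O functions from libstdc++ will be reported (not near the top, so use the --text option of pprof or google-pprof) with a zero self event count and a non-zero child event count: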
....
1 0.9% 99.1% 1 0.9% __write_nocancel
....
0 0.0% 100.0% 1 0.9% _IO_new_file_overflow
0 0.0% 100.0% 1 0.9% _IO_new_file_write
0 0.0% 100.0% 1 0.9% __GI__IO_putc
0 0.0% 100.0% 1 0.9% foo
0 0.0% 100.0% 1 0.9% new_do_write
0 0.0% 100.0% 1 0.9% std::endl
0 0.0% 100.0% 1 0.9% std::ostream::put
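With a=100000, foo is still too short and fast to receive events of its own, but the I/O functions receive several. This is the list I grepped from the long --text output: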
34 24.6% 24.6% 34 24.6% __write_nocancel
1 0.7% 95.7% 35 25.4% __GI__IO_fflush
1 0.7% 96.4% 1 0.7% __GI__IO_putc
1 0.7% 97.8% 2 1.4% std::operator<<
1 0.7% 98.6% 36 26.1% std::ostream::flush
1 0.7% 99.3% 2 1.4% std::ostream::put
0 0.0% 100.0% 34 24.6% _IO_new_file_sync
0 0.0% 100.0% 34 24.6% _IO_new_file_write
0 0.0% 100.0% 40 29.0% foo
0 0.0% 100.0% 34 24.6% new_do_write
0 0.0% 100.0% 2 1.4% std::endl
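Functions with a zero self count are visible only because pprof can read call chains (it knows who called the function that received the sample, provided frame information is not omitted).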
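To make the difference between self and cumulative counts concrete, here is a minimal sketch of how sampled call chains could be aggregated. It is not pprof's actual implementation; the Counts struct and the aggregate function are hypothetical names, and each sample is assumed to be a call chain listed from the outermost caller down to the function that was executing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Counts {
    int self = 0;        // samples taken while this function itself was executing
    int cumulative = 0;  // samples where this function appears anywhere in the chain
};

// Aggregate sampled call chains (outermost caller first, leaf last).
std::map<std::string, Counts> aggregate(
        const std::vector<std::vector<std::string>>& samples) {
    std::map<std::string, Counts> totals;
    for (const auto& stack : samples) {
        assert(!stack.empty());
        // Only the leaf frame was actually running when the sample fired.
        totals[stack.back()].self += 1;
        // Every distinct function on the chain gets one cumulative hit per sample.
        const std::set<std::string> unique(stack.begin(), stack.end());
        for (const auto& fn : unique) {
            totals[fn].cumulative += 1;
        }
    }
    return totals;
}
```

With chains such as main, foo, std::endl, __write_nocancel, the sample is charged to __write_nocancel as a self count, while foo only gains a cumulative count. That is the pattern in the listing above, where foo has 0 self samples but 40 cumulative ones.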
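I can also recommend a more modern, more capable (software and hardware events; frequencies up to 5 kHz or more; both user-space and kernel profiling) and better supported profiler: the Linux perf profiler (tutorial from its authors, Wikipedia).
Here are the results from perf with a=10000: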
$ perf record ./test3 >/dev/null
... skip some perf's spam about inaccessibility of kernel symbols
... note the 3 kHz frequency here VVVV
Lowering default frequency rate to 3250.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data (~3386 samples) ]
To see a text report from the perf.data output file, I pipe it through less (because perf report starts an interactive profile browser by default):
$ perf report |less
... skip some extra info about the machine, kernel, and perf starting command
# Samples: 1K of event 'cycles'
# Event count (approx.): 1155264208
# Overhead Command Shared Object Symbol
41.94% test3 libm-2.19.so [.] __tan_sse2
16.95% test3 libm-2.19.so [.] __sin_sse2
13.40% test3 libm-2.19.so [.] __cos_sse2
4.93% test3 test3 [.] bar()
2.90% test3 libc-2.19.so [.] __GI___libc_write
....
0.20% test3 test3 [.] foo()
Or use perf report -n | less to see the raw event (sample) counters:
# Overhead Samples Command Shared Object
41.94% 663 test3 libm-2.19.so [.] __tan_sse2
16.95% 268 test3 libm-2.19.so [.] __sin_sse2
13.40% 212 test3 libm-2.19.so [.] __cos_sse2
4.93% 78 test3 test3 [.] bar()
2.90% 62 test3 libc-2.19.so [.] __GI___libc_write
....
0.20% 4 test3 test3 [.] foo()