C++ profiling/optimization: how to get finer profiling granularity in an optimized function

Ada*_*dam 6 c++ optimization profiler

I'm using Google's perftools (http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html) for CPU profiling. It's a wonderful tool, and it has helped me make a great many CPU-time improvements to my application.

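Unfortunately, I've reached a point where the code is still a bit slow, and when it is compiled with g++'s -O3 optimization level, all I know is that a specific function is slow, not which parts of it are slow.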

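If I remove the -O3 flag, the unoptimized parts of the program overtake this function, and I don't get much clarity about which parts of the function actually matter. If I leave the -O3 flag in, the slow parts of the function are inlined, and I can't determine which parts of the function are slow.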

Any suggestions? Thanks for your help!

Gre*_*ers 6

For something like this, I've always used the "old school" way of doing it:

At various points in the routine you want to measure, insert statements that record the current time (or CPU time). Then print or log the differences between them, and you will know how long each section of code took. From there you can see what is eating most of the time, then add finer-grained timing within that section until you know what the problem is and how to fix it.
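As a rough illustration of the manual timing approach described above, here is a minimal sketch using std::chrono::steady_clock, which measures wall-clock time (std::clock() would be one way to get CPU time instead). The function slow_function, its two sections, and the helper elapsed_us are hypothetical stand-ins, not code from the question. Under -O3 the optimizer may still move some work across the timestamps, so very fine-grained numbers should be treated as approximate.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: microseconds elapsed between two time points.
inline std::int64_t elapsed_us(Clock::time_point from, Clock::time_point to) {
    return std::chrono::duration_cast<std::chrono::microseconds>(to - from).count();
}

// Hypothetical stand-in for the slow routine, split into two sections.
// The workload and the section boundaries are illustrative only.
std::uint64_t slow_function(std::size_t n) {
    const auto t0 = Clock::now();

    // Section A: fill a buffer with squares.
    std::vector<std::uint64_t> data(n);
    for (std::size_t i = 0; i < n; ++i) data[i] = i * i;

    const auto t1 = Clock::now();

    // Section B: reduce the buffer to a single sum.
    std::uint64_t sum = 0;
    for (std::uint64_t v : data) sum += v;

    const auto t2 = Clock::now();

    // Log the time spent in each section to stderr.
    std::fprintf(stderr, "section A: %lld us\nsection B: %lld us\n",
                 static_cast<long long>(elapsed_us(t0, t1)),
                 static_cast<long long>(elapsed_us(t1, t2)));
    return sum;
}
```

Once one section stands out, the same pattern can be repeated inside it with more timestamps.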

If the overhead of the function calls is not the problem, you can also force inlining to be off with -fno-inline-small-functions -fno-inline-functions -fno-inline-functions-called-once -fno-inline (I'm not exactly sure how these switches interact with each other, but I think they are independent). Then you can use your ordinary profiler to look at the call graph profile and see what function calls are taking what amount of time.
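A narrower variant of the same idea, not mentioned in the answer itself, is to keep -O3 and mark only the pieces of the hot function as non-inlinable, so they tend to show up as separate entries in the profile. The sketch below assumes GCC or Clang, which support __attribute__((noinline)); the function names and workload are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// GCC/Clang-specific attribute; other compilers spell this differently.
#define NOINLINE __attribute__((noinline))

// Hypothetical piece 1 of the hot function, kept out of line so the
// profiler can attribute samples to it separately.
NOINLINE void fill_squares(std::vector<std::uint64_t>& data) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = i * i;
}

// Hypothetical piece 2 of the hot function.
NOINLINE std::uint64_t sum_all(const std::vector<std::uint64_t>& data) {
    std::uint64_t sum = 0;
    for (std::uint64_t v : data) sum += v;
    return sum;
}

// The hot function now only orchestrates the two out-of-line pieces.
std::uint64_t hot_function(std::size_t n) {
    std::vector<std::uint64_t> data(n);
    fill_squares(data);
    return sum_all(data);
}
```

As with the command-line switches, this only gives a fair picture if the added call overhead is small compared with the work done in each piece.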


tim*_*day 5

If you're on Linux, use oprofile. If you're on Windows, use AMD's CodeAnalyst.

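Both provide sample-based profiles down to the level of individual source lines or assembly instructions, so you should have no trouble identifying the "hot spots" within a function.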