node_s/v8 flamegraph中使用perf_events的未知事件

Kam*_*l Z 8 performance profiling v8 node.js perf

我尝试做使用Linux作为perf_events由布伦丹·格雷格描述的一些分析的NodeJS 这里.

工作流程如下:

  1. 运行节点> 0.11.13 with --perf-basic-prof,这将创建/tmp/perf-(PID).map写入JavaScript符号映射的文件.
  2. 使用捕获堆栈 perf record -F 99 -p `pgrep -n node` -g -- sleep 30
  3. 使用存储库中的stackcollapse-perf.pl脚本折叠堆栈
  4. 使用flamegraph.pl脚本生成svg火焰图

我得到以下结果(开头看起来非常好): 在此输入图像描述

问题是有很多[unknown]元素,我认为应该是我的nodejs函数调用.我假设整个过程在第3点失败,其中perf数据应该使用由node/v8生成的映射来折叠--perf-basic-prof./tmp/perf-PID.map在节点执行期间创建文件并将一些映射写入其中.

如何解决这个问题呢?

我使用的是CentOS 6.5 x64,并且已经尝试使用节点0.11.13,0.11.14(预建和编译),但没有成功.

Mik*_*vey 15

FIrst of all, what "[unknown]" means is the sampler couldn't figure out the name of the function, because it's a system or library function. If so, that's OK - you don't care, because you're looking for things responsible for time in your code, not system code.

Actually, I'm suggesting this is one of those XY questions. Even if you get a direct answer to what you asked, it is likely to be of little use. Here are the reasons why:

1. CPU Profiling is of little use in an I/O bound program

The two towers on the left in your flame graph are doing I/O, so they probably take a lot more wall-time than the big pile on the right. If this flame graph were derived from wall-time samples, rather than CPU-time samples, it could look more like the second graph below, which tells you where time actually goes:

在此输入图像描述

What was a big juicy-looking pile on the right has shrunk, so it is nowhere near as significant. On the other hand, the I/O towers are very wide. Any one of those wide orange stripes, if it's in your code, represents a chance to save a lot of time, if some of the I/O could be avoided.

2. Whether the program is CPU- or I/O-bound, speedup opportunities can easily hide from flame graphs

Suppose there is some function Foo that really is doing something wasteful, that if you knew about it, you could fix. Suppose in the flame graph, it is a dark red color. Suppose it is called from numerous places in the code, so it's not all collected in one spot in the flame graph. Rather it appears in multiple small places shown here by black outlines:

在此输入图像描述

Notice, if all those rectangles were collected, you could see that it accounts for 11% of time, meaning it is worth looking at. If you could cut its time in half, you could save 5.5% overall. If what it's doing could actually be avoided entirely, you could save 11% overall. Each of those little rectangles would shrink down to nothing, and pull the rest of the graph, to its right, with it.

Now I'll show you the method I use. I take a moderate number of random stack samples and examine each one for routines that might be speeded up. That corresponds to taking samples in the flame graph like so:

在此输入图像描述

The slender vertical lines represent twenty random-time stack samples. As you can see, three of them are marked with an X. Those are the ones that go through Foo. That's about the right number, because 11% times 20 is 2.2.

(Confused? OK, here's a little probability for you. If you flip a coin 20 times, and it has a 11% chance of coming up heads, how many heads would you get? Technically it's a binomial distribution. The most likely number you would get is 2, the next most likely numbers are 1 and 3. (If you only get 1 you keep going until you get 2.) Here's the distribution:)

在此输入图像描述

(The average number of samples you have to take to see Foo twice is 2/0.11 = 18.2 samples.)

Looking at those 20 samples might seem a bit daunting, because they run between 20 and 50 levels deep. However, you can basically ignore all the code that isn't yours. Just examine them for your code. You'll see precisely how you are spending time, and you'll have a very rough measurement of how much. Deep stacks are both bad news and good news - they mean the code may well have lots of room for speedups, and they show you what those are.

你看到的任何你可以加速的东西,如果你在不止一个样本上看到它,会给你一个健康的加速,保证.您需要在多个样本上看到它的原因是,如果您只在一个样本上看到它,您只知道它的时间不为零.如果您在多个样本上看到它,您仍然不知道它需要多少时间,但您确实知道它不小. 这是统计数据.


小智 9

一般来说,不同意主题专家是一个坏主意,但(最大的尊重)我们走了!

所以敦促答案做到以下几点:

"请务必回答这个问题.提供详细信息并分享您的研究成果!"

所以问题是,至少我对它的解释是,为什么在perf脚本输出中有[未知]帧(以及如何将这些[未知]帧转换为有意义的名称)?这个问题可能是关于"如何提高我的系统性能?" 但在这种特殊情况下我没有这样看.关于如何对perf记录数据进行后处理,这里存在一个真正的问题.

这个问题的答案是尽管前提条件设置正确:正确的节点版本,正确的参数存在以生成函数名称(--perf-basic-prof),生成的perf映射文件必须由root拥有 for perf脚本生成预期的输出.

而已!

今天写了一些新的脚本我点击了这个指引我这个问题.

这里有几个额外的参考:

https://yunong.io/2015/11/23/generating-node-js-flame-graphs/

https://github.com/jrudolph/perf-map-agent/blob/d8bb58676d3d15eeaaf3ab3f201067e321c77560/bin/create-java-perf-map.sh#L22

[有时可以强制使用非root文件] http://www.spinics.net/lists/linux-perf-users/msg02588.html