How do I get a kernel's execution time with the NSight Compute 2019 CLI?

Question


ein*_*ica | score: 1 | tags: profiling, cuda, command-line-interface, nsight-compute

Suppose I have an executable myapp that takes no command-line arguments and launches a CUDA kernel mykernel. I can invoke:

nv-nsight-cu-cli -k mykernel myapp


and get output that looks like this:

==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 1234
[1234] myapp@127.0.0.1
  mykernel(), 2020-Oct-25 01:23:45, Context 1, Stream 7
    Section: GPU Speed Of Light
    --------------------------------------------------------------------
    Memory Frequency                      cycle/nsecond      1.62
    SOL FB                                %                  1.58
    Elapsed Cycles                        cycle              4,421,067
    SM Frequency                          cycle/nsecond      1.43
    Memory [%]                            %                  61.76
    Duration                              msecond            3.07
    SOL L2                                %                  0.79
    SM Active Cycles                      cycle              4,390,420.69
    (etc. etc.)
    --------------------------------------------------------------------
    (etc. etc. - other sections here)


So far, so good. But now I want only the overall kernel duration of mykernel, and no other output. Looking at nv-nsight-cu-cli --query-metrics, I see:

gpu__time_duration           incremental duration in nanoseconds; isolated measurement is same as gpu__time_active
gpu__time_active             total duration in nanoseconds


So it must be one of these, right? But when I run

nv-nsight-cu-cli -k mykernel myapp --metrics gpu__time_duration,gpu__time_active


I get:

==PROF== Connected to process 30446 (/path/to/myapp)
==PROF== Profiling "mykernel": 0%....50%....100% - 13 passes
==PROF== Disconnected from process 12345
[12345] myapp@127.0.0.1
  mykernel(), 2020-Oct-25 12:34:56, Context 1, Stream 7
    Section: GPU Speed Of Light
    Section: Command line profiler metrics
    ---------------------------------------------------------------
    gpu__time_active                                   (!) n/a
    gpu__time_duration                                 (!) n/a
    ---------------------------------------------------------------


My questions:

Why am I getting "n/a" for these values?
How can I get the actual values I want, and nothing else?

Notes:

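I'm using CUDA 10.2 with NSight Compute version 2019.5.0 (build 27346997).
I realize I could filter the standard output stream to drop the lines I don't need, but that's not what I'm after.
I really just want the raw number, but I'm willing to settle for using --csv and taking the last field.
I couldn't find anything relevant in the nvprof transition guide.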
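For the --csv fallback mentioned in the notes, a small Python helper can reduce the profiler's output to a single number. This is a minimal sketch under assumptions, not a verified parser for every Nsight Compute version: it assumes the CSV rows are double-quoted and comma-separated, that the first non-status row is a header, that the metric value sits in the last column (possibly with thousands separators), and that any ==PROF== status lines, if they land in the same stream, can simply be skipped. The function name last_metric_value is hypothetical.

```python
import csv
import io


def last_metric_value(text: str) -> float:
    """Return the last field of the last CSV data row in `text` as a float.

    Assumptions (not verified against every Nsight Compute release):
      * status lines start with "==PROF==" and carry no data,
      * the first remaining row is a header,
      * the metric value is the final column, possibly written with
        thousands separators such as "3,070,016".
    """
    # Keep only the CSV content: drop blank lines and ==PROF== status lines.
    csv_lines = [
        line
        for line in text.splitlines()
        if line.strip() and not line.startswith("==PROF==")
    ]
    # csv.reader copes with quoted fields that themselves contain commas.
    rows = list(csv.reader(io.StringIO("\n".join(csv_lines))))
    if len(rows) < 2:
        raise ValueError("no data rows found after the CSV header")
    raw = rows[-1][-1]
    # Strip thousands separators before converting to a number.
    return float(raw.replace(",", ""))
```

To use it as a pipe filter, you might end a script with print(last_metric_value(sys.stdin.read())) and feed it the output of nv-nsight-cu-cli run with --csv and a single metric.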

Answer 1

ein*_*ica | score: 5

tl;dr: You need to specify an appropriate "submetric":

nv-nsight-cu-cli -k mykernel myapp --metrics gpu__time_active.avg
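If you build these command lines from a script, it helps to make sure every requested name carries a submetric suffix, since the bare base names are what printed "n/a" in the question. The sketch below is hypothetical: the helper names with_submetric and build_command, the default of .avg, and the suffix set (just the four this answer names for gpu__time_active) are assumptions for illustration. Only the -k and --metrics flags come from this thread, and the builder places the options before the application name, following the CLI's usual [options] [program] form.

```python
import shlex
from typing import List

# Suffixes this answer names for gpu__time_active; other metrics may
# offer different ones, so this set is an assumption, not exhaustive.
KNOWN_SUFFIXES = (".min", ".max", ".sum", ".avg")


def with_submetric(metric: str, default: str = ".avg") -> str:
    """Append `default` unless `metric` already ends in a known suffix."""
    if metric.endswith(KNOWN_SUFFIXES):
        return metric
    return metric + default


def build_command(app: str, kernel: str, metrics: List[str]) -> List[str]:
    """Build an nv-nsight-cu-cli argv that requests only the given metrics."""
    names = ",".join(with_submetric(m) for m in metrics)
    return ["nv-nsight-cu-cli", "-k", kernel, "--metrics", names, app]


cmd = build_command("myapp", "mykernel", ["gpu__time_active"])
print(shlex.join(cmd))
# Prints: nv-nsight-cu-cli -k mykernel --metrics gpu__time_active.avg myapp
```

The resulting list can be passed directly to subprocess.run; keeping it as a list avoids shell quoting issues.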


(based on @RobertCrovella's comments)

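CUDA's profiling mechanism collects "base metrics", which are indeed the ones listed by --query-metrics. Several samples are taken for each of them. In NSight Compute version 2019.5 you cannot get the raw samples themselves; you can only get "submetric" values.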

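A "submetric" is essentially an aggregation of that sequence of samples into a single scalar. Different metrics offer different kinds of submetrics (see this listing); for gpu__time_active they are .min, .max, .sum and .avg. And yes, in case you were wondering, second-moment statistics such as the variance or the sample standard deviation are missing.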
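To make the idea of a submetric concrete, here is a Python sketch that reduces a list of samples to the four aggregations named above, plus the variance and standard deviation that the 2019.5 CLI does not offer. The sample values are invented for illustration only; in this version the real per-sample data is not exposed, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def submetrics(samples: Sequence[float]) -> Dict[str, float]:
    """Reduce raw samples to scalar aggregates like .min/.max/.sum/.avg."""
    if not samples:
        raise ValueError("need at least one sample")
    total = sum(samples)
    return {
        "min": min(samples),
        "max": max(samples),
        "sum": total,
        "avg": total / len(samples),
    }


def second_moments(samples: Sequence[float]) -> Dict[str, float]:
    """Sample variance and standard deviation (not offered as submetrics here)."""
    # statistics.variance/stdev require at least two data points.
    return {
        "var": statistics.variance(samples),
        "stdev": statistics.stdev(samples),
    }


# Hypothetical samples in nanoseconds, invented purely for illustration.
samples = [3000.0, 3100.0, 3050.0, 3150.0]
print(submetrics(samples))
print(second_moments(samples))
```

Each submetric throws away everything except one statistic of the samples, which is why, without the raw samples, there is no way to recover the spread after the fact.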

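So you have to either specify one or more submetrics (as in the example above), or upgrade to a newer version of NSight Compute, with which, apparently, you actually can get all the samples.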

Asked: 5 years, 2 months ago
Viewed: 1723 times
Last active: 5 years ago