Interpreting the results of the TensorFlow benchmark tools

Question


mrg*_*oom 0 benchmarking tensorflow tensorflow-lite

TensorFlow has several benchmark tools:

one for .pb models and one for .tflite models.

I have a few questions about the parameters of the .pb benchmark tool:

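Does num_threads control single-threaded experiments, or the number of runs executed in parallel using TensorFlow's internal threading?
When the tool is built for desktop (i.e. not for mobile), can it use the GPU? If so, how can I make sure the GPU is not used?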

I also have some questions about interpreting the results:

What is count in the result output? How is count in Timings (microseconds): count= related to the --max_num_runs parameter?

Example:

Run --num_threads=-1 --max_num_runs=1000:
    2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1000 first=3608 curr=3873 min=3566 max=8009 avg=3766.49 std=202
    2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1000 curr=3301344(all same)
    2019-03-20 14:30:33.253591: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
    2019-03-20 14:30:33.253597: I tensorflow/core/util/stat_summarizer.cc:85]
    2019-03-20 14:30:33.378352: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
    2019-03-20 14:30:33.378390: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 46.30B

Run --num_threads=1 --max_num_runs=1000:
    2019-03-20 14:32:25.591915: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1000 first=7502 curr=7543 min=7495 max=7716 avg=7607.22 std=34
    2019-03-20 14:32:25.591934: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1000 curr=3301344(all same)
    2019-03-20 14:32:25.591952: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
    2019-03-20 14:32:25.591970: I tensorflow/core/util/stat_summarizer.cc:85]
    2019-03-20 14:32:25.805970: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
    2019-03-20 14:32:25.806007: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 15.46B

Run --num_threads=-1 --max_num_runs=10000:
    2019-03-20 14:38:48.045824: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=3570 first=3961 curr=3899 min=3558 max=6997 avg=3841.2 std=175
    2019-03-20 14:38:48.045829: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=3570 curr=3301344(all same)
    2019-03-20 14:38:48.045833: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
    2019-03-20 14:38:48.045837: I tensorflow/core/util/stat_summarizer.cc:85]
    2019-03-20 14:38:48.169368: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
    2019-03-20 14:38:48.169412: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 48.66B

Run --num_threads=1 --max_num_runs=10000:
    2019-03-20 14:35:50.826722: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1254 first=7496 curr=7518 min=7475 max=7838 avg=7577.23 std=50
    2019-03-20 14:35:50.826735: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1254 curr=3301344(all same)
    2019-03-20 14:35:50.826746: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
    2019-03-20 14:35:50.826757: I tensorflow/core/util/stat_summarizer.cc:85]
    2019-03-20 14:35:51.053143: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
    2019-03-20 14:35:51.053180: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 15.55B


That is, with --max_num_runs=10000 the counts are count=3570 and count=1254. What does this mean?
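
To compare the reported count with the requested --max_num_runs, the name=value fields of a stat line can be pulled out with a small parser. This is a hypothetical helper (parse_stat_line is not part of TensorFlow); it only assumes the name=value layout visible in the log lines above.

```python
import re

# Matches "name=value" pairs such as "count=3570" or "avg=3841.2".
# A trailing "(all same)" marker, as on the Memory lines, is ignored.
_FIELD = re.compile(r"(\w+)=([-+]?\d+(?:\.\d+)?)")


def parse_stat_line(line: str) -> dict:
    """Return the numeric name=value fields of a benchmark stat line (hypothetical helper)."""
    fields = {}
    for name, value in _FIELD.findall(line):
        fields[name] = float(value) if "." in value else int(value)
    return fields


if __name__ == "__main__":
    line = ("Timings (microseconds): count=3570 first=3961 curr=3899 "
            "min=3558 max=6997 avg=3841.2 std=175")
    stats = parse_stat_line(line)
    requested = 10000  # value passed via --max_num_runs
    print(stats["count"], "of", requested, "requested runs were recorded")
```

Applied to the two --max_num_runs=10000 runs, this makes the gap explicit: 3570 and 1254 recorded runs against 10000 requested, so the loop evidently stopped before reaching --max_num_runs.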

For the .tflite benchmark tool:

--num_threads=1 --num_runs=10000
    Initialized session in 0.682ms
    Running benchmark for at least 1 iterations and at least 0.5 seconds
    count=54 first=23463 curr=8019 min=7911 max=23463 avg=9268.5 std=2995
    Running benchmark for at least 1000 iterations and at least 1 seconds
    count=1000 first=8022 curr=6703 min=6613 max=10333 avg=6766.23 std=337
    Average inference timings in us: Warmup: 9268.5, Init: 682, no stats: 6766.23
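
The two "Running benchmark for at least N iterations and at least T seconds" lines suggest a loop that stops only once both a minimum iteration count and a minimum elapsed time have been reached. That fits the numbers: the warm-up needed 54 runs (54 × 9268.5 µs ≈ 0.50 s) to cover 0.5 seconds, while the 1000 timed runs (about 1000 × 6766.23 µs ≈ 6.8 s) already exceed 1 second, so there the iteration minimum is the binding limit. The sketch below models that stopping rule under this assumption; run_for_at_least and fake_inference are hypothetical names, not TensorFlow Lite code.

```python
import time
from typing import Callable, List


def run_for_at_least(fn: Callable[[], None], min_iters: int,
                     min_seconds: float) -> List[float]:
    """Call fn until BOTH min_iters calls and min_seconds of wall time have passed.

    Returns the per-call latencies in microseconds.
    """
    timings_us: List[float] = []
    start = time.perf_counter()
    while (len(timings_us) < min_iters
           or time.perf_counter() - start < min_seconds):
        t0 = time.perf_counter()
        fn()
        timings_us.append((time.perf_counter() - t0) * 1e6)
    return timings_us


def fake_inference() -> None:
    """Hypothetical stand-in for one model invocation (about 2 ms of sleep)."""
    time.sleep(0.002)


if __name__ == "__main__":
    # Time-bound phase: one iteration is required, but 0.5 s must elapse.
    warmup = run_for_at_least(fake_inference, min_iters=1, min_seconds=0.5)
    # Iteration-bound phase: 50 calls take longer than the 0.01 s minimum.
    timed = run_for_at_least(fake_inference, min_iters=50, min_seconds=0.01)
    print("warmup count:", len(warmup))
    print("timed count:", len(timed))
```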


What does no stats: 6766.23 mean?

Answer 1

McA*_*gus 6

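After digging into the code, I found the following (all times are in microseconds):

count: the number of runs actually performed
first: time taken by the first iteration
curr: time taken by the most recent iteration
min: shortest time taken by an iteration
max: longest time taken by an iteration
avg: average time per iteration
std: standard deviation of all run times
Warmup: average time of the warm-up runs
Init: startup time (should always be the same as Initialized session in)
no stats: a poorly named average run time (matches avg= on the previous line)
num_threads: used to set both intra_op_parallelism_threads and inter_op_parallelism_threads (more information here)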
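
The fields above can be reproduced from a list of per-run latencies. The sketch below is an approximation, not the code in stats_calculator.h: it assumes std is the population standard deviation (dividing by count) and that it is reported as 0 when every sample is identical, which would fit the (all same) marker on the Memory lines. Under this reading, Warmup and no stats are simply the avg of the warm-up runs and of the timed runs, which is what the .tflite log shows (9268.5 and 6766.23).

```python
import math
from typing import Dict, Sequence


def summarize(times_us: Sequence[float]) -> Dict[str, float]:
    """Summarize per-run latencies (microseconds) like the benchmark log lines.

    Assumption: std is the population standard deviation, forced to 0
    when all samples are identical.
    """
    if not times_us:
        raise ValueError("need at least one run")
    count = len(times_us)
    avg = sum(times_us) / count
    lo, hi = min(times_us), max(times_us)
    if lo == hi:
        std = 0.0
    else:
        variance = sum((t - avg) ** 2 for t in times_us) / count
        std = math.sqrt(variance)
    return {
        "count": count,        # runs actually performed
        "first": times_us[0],  # latency of the first iteration
        "curr": times_us[-1],  # latency of the most recent iteration
        "min": lo,
        "max": hi,
        "avg": avg,
        "std": std,
    }


if __name__ == "__main__":
    warmup = [23463, 8019, 7911]      # illustrative values, not a real trace
    timed = [8022, 6703, 6613, 6750]  # illustrative values, not a real trace
    print("Warmup:", summarize(warmup)["avg"])
    print("no stats:", summarize(timed)["avg"])
```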

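The relevant files (linked to the relevant lines) are:

stats_calculator.h - the code that actually tracks the run times
benchmark_model.cc (tflite) - the source of the oddly named "no stats" output
benchmark_model.cc (pb) - where num_threads is used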
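
The num_threads handling described above can be sketched as a plain function. This assumes, without having checked every TensorFlow version, that a non-positive value (such as the -1 used in the question) leaves both thread-pool settings unset, so TensorFlow chooses its own defaults, while a positive value is copied into both settings. session_thread_config is a hypothetical helper returning a dict, not a real ConfigProto.

```python
from typing import Dict


def session_thread_config(num_threads: int) -> Dict[str, int]:
    """Hypothetical mirror of how --num_threads maps onto the session thread pools.

    Assumption: values <= 0 leave both settings out, so TensorFlow picks
    its defaults; a positive value is used for both pools.
    """
    config: Dict[str, int] = {}
    if num_threads > 0:
        # Threads available inside a single op (e.g. one large matmul).
        config["intra_op_parallelism_threads"] = num_threads
        # Threads used to run independent ops of the graph concurrently.
        config["inter_op_parallelism_threads"] = num_threads
    return config


if __name__ == "__main__":
    print(session_thread_config(-1))  # {} -> TensorFlow defaults
    print(session_thread_config(1))
```

Under this reading, --num_threads=1 restricts op execution to a single thread rather than launching parallel benchmark runs, which is consistent with the roughly doubled avg in the question's logs (about 7.6 ms versus 3.8 ms).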

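I'm not entirely sure about whether the GPU is used. If you export the .pb file with freeze_graph, the device of each node is stored in the graph, so you can use device placement to choose it before exporting. If you need to change it afterwards, try setting the environment variable CUDA_VISIBLE_DEVICES="" to make sure the GPU is not used.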
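
One way to apply the CUDA_VISIBLE_DEVICES="" suggestion is to launch the benchmark with a modified environment. The binary path bazel-bin/tensorflow/tools/benchmark/benchmark_model, the --graph flag and the model.pb file name are assumptions about a typical build; only --num_threads and --max_num_runs come from the question, and a real model may need further flags (for example input layer names). Hiding the devices only matters if the tool was built with CUDA support.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Assumed location of a desktop build of the .pb benchmark tool.
BENCHMARK_BIN = "bazel-bin/tensorflow/tools/benchmark/benchmark_model"


def cpu_only_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty device list: no GPU is visible
    return env


def benchmark_command(graph: str, num_threads: int, max_num_runs: int) -> List[str]:
    """Build the command line; --graph is an assumed flag name."""
    return [
        BENCHMARK_BIN,
        f"--graph={graph}",
        f"--num_threads={num_threads}",
        f"--max_num_runs={max_num_runs}",
    ]


if __name__ == "__main__":
    cmd = benchmark_command("model.pb", num_threads=1, max_num_runs=1000)
    print(" ".join(cmd))
    # Only launch if the assumed binary actually exists in this checkout.
    if os.path.exists(BENCHMARK_BIN):
        subprocess.run(cmd, env=cpu_only_env(), check=True)
```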

Archived: 6 years, 9 months ago
Views: 1023
Last activity: 6 years, 9 months ago