Pipe vs. redirection speed, `pv`, and UUOC

Question

I was testing different ways of generating random garbage and comparing their speed by piping the output through pv, like this:

$ cmd | pv -s "$size" -S > /dev/null

I also wanted a "baseline reference", so I measured the fastest "generator", cat, with the fastest source, /dev/zero:

$ cat /dev/zero | pv -s 100G -S > /dev/null
 100GiB 0:00:33 [2,98GiB/s] [=============================>] 100%

About 3 GB/s, which is quite impressive, especially compared with the ~70 MB/s I get from /dev/urandom.

But hey, for the special case of /dev/zero I don't need cat! Just for fun, I removed this textbook UUOC:

$ < /dev/zero pv -s 100G -S > /dev/null
 100GiB 0:00:10 [9,98GiB/s] [=============================>] 100%

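What??? Almost 10 GB/s? How can removing cat and the pipe more than triple the speed? With a slower source, such as /dev/urandom, the difference is negligible. Is pv doing some voodoo magic? So I tested: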

$ dd if=/dev/zero iflag=count_bytes count=200G of=/dev/null status=progress
205392969728 bytes (205 GB, 191 GiB) copied, 16 s, 12,8 GB/s

12.8 GB/s! That is in the same ballpark as pv without cat, and 4 times faster than using a pipe.

Is cat to blame? Are pipes that different from redirection? After all, both reach pv as stdin, right? What could explain such a huge difference?

Answer 1

Ste*_*itt 8

The killer is the use of two processes.

With cat | pv, cat reads and writes, pv reads and writes, and both processes need to run:

$ perf stat sh -c 'cat /dev/zero | pv -s 100G -S > /dev/null'
 100GiB 0:00:26 [3.72GiB/s] [====================================================================================>] 100%            

 Performance counter stats for 'sh -c cat /dev/zero | pv -s 100G -S > /dev/null':

         34,048.63 msec task-clock                #    1.267 CPUs utilized          
         1,676,706      context-switches          #    0.049 M/sec                  
             3,678      cpu-migrations            #    0.108 K/sec                  
               304      page-faults               #    0.009 K/sec                  
   119,270,941,758      cycles                    #    3.503 GHz                      (74.89%)
   137,822,862,590      instructions              #    1.16  insn per cycle           (74.94%)
    32,379,369,104      branches                  #  950.974 M/sec                    (75.14%)
       216,658,446      branch-misses             #    0.67% of all branches          (75.04%)

      26.865741948 seconds time elapsed

       1.257950000 seconds user
      38.893870000 seconds sys

With pv alone, only pv reads and writes, and no context switches are needed (or nearly none):

$ perf stat sh -c '< /dev/zero pv -s 100G -S > /dev/null'
 100GiB 0:00:07 [13.3GiB/s] [====================================================================================>] 100%            

 Performance counter stats for 'sh -c < /dev/zero pv -s 100G -S > /dev/null':

          7,501.68 msec task-clock                #    1.000 CPUs utilized          
                37      context-switches          #    0.005 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               198      page-faults               #    0.026 K/sec                  
    27,916,420,023      cycles                    #    3.721 GHz                      (75.00%)
    62,787,377,126      instructions              #    2.25  insn per cycle           (74.99%)
    15,361,951,954      branches                  # 2047.801 M/sec                    (75.03%)
        51,741,595      branch-misses             #    0.34% of all branches          (74.98%)

       7.505304560 seconds time elapsed

       1.768600000 seconds user
       5.733786000 seconds sys

There is some parallelism ("1.267 CPUs utilized"), but it doesn't make up for the huge difference in the number of context switches.
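
For anyone who wants to reproduce the two-process versus one-process gap without pv, here is a minimal sketch using only cat and dd. It assumes GNU dd (as in the question); the helper names `piped` and `redirected` and the 256 MiB size are made up for illustration, and the absolute numbers will vary by machine.

```shell
# Sketch only: assumes GNU dd. Size is kept small so the run is quick.
blocks=256   # 256 blocks of 1 MiB each

# Two processes joined by a pipe: cat writes, dd reads.
# iflag=fullblock makes dd keep reading until each 1 MiB block is full,
# since reads from a pipe can return less than requested.
piped() {
    cat /dev/zero |
        LC_ALL=C dd of=/dev/null bs=1M count="$blocks" iflag=fullblock 2>&1 |
        tail -n 1
}

# One process reading /dev/zero directly through a redirection.
redirected() {
    LC_ALL=C dd of=/dev/null bs=1M count="$blocks" 2>&1 < /dev/zero |
        tail -n 1
}

# Each call prints dd's summary line (bytes copied, time, throughput).
piped
redirected
```

Wrapping either pipeline in `perf stat`, as above, should show the same pattern in the context-switches counter.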

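It could be worse, considering the data path: in the first case, data seems to flow from the kernel (/dev/zero) to cat, then back to the kernel (for the pipe), to pv, and back to the kernel again (/dev/null). In the second case, data flows from the kernel to pv and back to the kernel. But in the first case, pv uses splice to copy data from the pipe, avoiding a trip through user-space memory.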
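
One way to gauge how much splice helps in the piped case is to ask pv not to use it. The sketch below assumes a pv version that supports the `-C` (`--no-splice`) option described in pv(1); the function name `compare_splice` and the 1G size are illustrative, and the function simply skips the comparison if pv is not installed.

```shell
# Sketch only: assumes pv supports -C/--no-splice (check pv(1) on your system).
compare_splice() {
    size=1G   # small on purpose; raise it for steadier numbers

    if ! command -v pv >/dev/null 2>&1; then
        echo "pv is not installed; skipping" >&2
        return 0
    fi

    # Default behaviour: pv may use splice(2) because its input is a pipe.
    # -f forces the progress display even when stderr is not a terminal.
    cat /dev/zero | pv -f -s "$size" -S > /dev/null || return 1

    # -C forces the ordinary read()/write() path for comparison.
    cat /dev/zero | pv -f -C -s "$size" -S > /dev/null || return 1
}

compare_splice
```

If the second run is noticeably slower than the first, the difference is a rough measure of what splice saves on that machine.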
