Performance differences between pipelines and process substitution

Joh*_*n B 8 performance shell bash pipe process-substitution

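In my bash scripts I tend to use pipelines rather than process substitution in most cases, especially when chaining several groups of commands, because ... | ... | ... seems more readable than ... < <(... < <(...))

I am wondering why, in some cases, process substitution is so much faster than a pipeline.

To test this, I used time on two scripts that run 10000 iterations of the same commands, one using a pipeline and the other using process substitution.

Scripts: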

pipeline.bash

for i in {1..10000}; do
    echo foo bar |
    while read; do
        echo $REPLY >/dev/null
    done
done

proc-sub.bash

for i in {1..10000}; do
    while read; do
        echo $REPLY >/dev/null
    done < <(echo foo bar)
done

Results:

~$ time ./pipeline.bash

real    0m17.678s
user    0m14.666s
sys     0m14.807s

~$ time ./proc-sub.bash

real    0m8.479s
user    0m4.649s
sys     0m6.358s
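For anyone repeating the measurement, here is a minimal sketch that puts both loops in one script and times them with the bash `time` keyword. The function names `run_pipeline` and `run_procsub`, the `ITERATIONS` environment variable, and its default of 200 are assumptions made for illustration; they are not part of the original scripts, and absolute timings will differ from machine to machine.

```bash
#!/usr/bin/env bash
# Hypothetical harness: the two loops from the question, wrapped in functions.

run_pipeline() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        echo foo bar |
        while read -r; do
            echo "$REPLY" >/dev/null
        done
    done
}

run_procsub() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        while read -r; do
            echo "$REPLY" >/dev/null
        done < <(echo foo bar)
    done
}

# Assumed default; the question used 10000 iterations.
iterations=${ITERATIONS:-200}

# TIMEFORMAT controls what the `time` keyword prints (to stderr).
TIMEFORMAT='pipeline: real %R  user %U  sys %S'
time run_pipeline "$iterations"

TIMEFORMAT='proc-sub: real %R  user %U  sys %S'
time run_procsub "$iterations"
```

Running it as `ITERATIONS=10000 ./bench.bash` (the file name is arbitrary) would match the iteration count used in the question.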

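I know that a pipeline creates a subshell, and that process substitution creates a named pipe or some file in /dev/fd, but it is not clear to me how those differences affect performance.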
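The two mechanisms can be observed directly. The sketch below is only an illustration: the variable names are made up, and it assumes the `lastpipe` shell option is off (the bash default). It shows that a loop at the end of a pipeline runs in a subshell, so its assignment is lost, while a loop fed by process substitution runs in the current shell and reads from a path that bash substitutes for `<(...)`.

```bash
#!/usr/bin/env bash
# Sketch only: variable names are illustrative, not from the question.

# Pipeline: the loop runs in a forked subshell, so its assignment is lost
# (assuming the `lastpipe` option is off, which is the default).
seen_pipeline=0
echo foo bar | while read -r; do seen_pipeline=1; done

# Process substitution: the loop runs in the current shell and reads from
# the file name that bash substitutes for <(...).
seen_procsub=0
while read -r; do seen_procsub=1; done < <(echo foo bar)

# The expansion itself is just a path (a /dev/fd entry or a FIFO).
procsub_path=$(echo <(true))

echo "pipeline loop changed the variable: $seen_pipeline"
echo "proc-sub loop changed the variable: $seen_procsub"
echo "process substitution expanded to:   $procsub_path"
```

The exact path printed on the last line depends on the system, which is why no specific value is shown here.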

cuo*_*glm 9

Run both scripts under strace and you can see the differences:

pipe

$ strace -c ./pipe.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 57.89    0.103005           5     20000           clone
 40.81    0.072616           2     30000     10000 wait4
  0.58    0.001037           0    120008           rt_sigprocmask
  0.40    0.000711           0     10000           pipe

proc-sub

$ strace -c ./procsub.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 85.08    0.045502           5     10000           clone
  3.25    0.001736           0     90329       322 read
  2.12    0.001133           0     20009           open
  2.03    0.001086           0     50001           dup2

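From the statistics above, you can see that the pipeline creates more child processes (the clone syscall) and spends a lot of time waiting for those children to finish (the wait4 syscall) before the parent process can continue.

Process substitution does not: the parent can read directly from the child process. Process substitution is performed at the same time as parameter and variable expansion, and the command inside the process substitution runs in the background. From the bash manpage: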

Process Substitution
       Process  substitution  is supported on systems that support named pipes
       (FIFOs) or the /dev/fd method of naming open files.  It takes the  form
       of  <(list) or >(list).  The process list is run with its input or out-
       put connected to a FIFO or some file in /dev/fd.  The name of this file
       is  passed  as  an argument to the current command as the result of the
       expansion.  If the >(list) form is used, writing to the file will  pro-
       vide  input  for list.  If the <(list) form is used, the file passed as
       an argument should be read to obtain the output of list.

       When available, process substitution is performed  simultaneously  with
       parameter  and variable expansion, command substitution, and arithmetic
       expansion.
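The difference in clone counts (20000 versus 10000 over 10000 iterations) can be reproduced for a single iteration without strace. The following is a rough sketch under the assumption that every forked subshell reports its own `BASHPID`; the names `parent_pid` and `pid_log` are illustrative and do not come from the answer.

```bash
#!/usr/bin/env bash
# Sketch only: count the distinct child processes involved in one iteration.

parent_pid=$BASHPID
pid_log=$(mktemp)

# Pipeline: the producer and the loop each record their own BASHPID.
{ echo "$BASHPID" >>"$pid_log"; echo foo bar; } |
while read -r; do echo "$BASHPID" >>"$pid_log"; done
# Count the recorded PIDs that are not the parent shell.
pipeline_children=$(sort -u "$pid_log" | grep -cvx "$parent_pid")

# Process substitution: only the producer is forked; the loop is the parent.
: >"$pid_log"
while read -r; do echo "$BASHPID" >>"$pid_log"; done \
    < <(echo "$BASHPID" >>"$pid_log"; echo foo bar)
procsub_children=$(sort -u "$pid_log" | grep -cvx "$parent_pid")

rm -f "$pid_log"
echo "children per iteration: pipeline=$pipeline_children proc-sub=$procsub_children"
```

On a typical Linux bash this should report two children for the pipeline and one for process substitution, which lines up with the clone counts in the strace output above.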

Update

Running strace so that the statistics include the child processes:

pipe

$ strace -fqc ./pipe.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.76    0.215739           7     30000     10000 wait4
 28.04    0.085490           4     20000           clone
  0.78    0.002374           0    220008           rt_sigprocmask
  0.17    0.000516           0    110009     20000 close
  0.15    0.000456           0     10000           pipe

proc-sub

$ strace -fqc ./procsub.sh 
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.38    0.033977           3     10000           clone
 32.24    0.020913           0     96070      6063 read
  5.24    0.003398           0     20009           open
  2.34    0.001521           0    110003     10001 fcntl
  1.87    0.001210           0    100009           close

  • You need `strace -fqc` to get relevant statistics, because what `bash` does in its child processes also matters. (2 upvotes)