Running a limited number of child processes in parallel in bash?

Question

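I have a large set of files that need some heavy processing. This processing is single-threaded, uses a few hundred MiB of RAM (on the machine used to start the job), and takes a few minutes to run. My current use case is starting a hadoop job on the input data, but I have had the same problem in other cases before.

To make full use of the available CPU power, I want to be able to run several of these tasks in parallel.

However, a very simple shell script like the following starts every task at once and trashes system performance through excessive load and swapping: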

find . -type f | while IFS= read -r name
do
   some_heavy_processing_command "$name" &
done

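So what I want is essentially what "gmake -j4" does.

I know bash supports the "wait" command, but it only waits until all child processes have finished. In the past I have written scripts that run 'ps' and then grep for the child processes by name (yes, I know... ugly).

What is the simplest/cleanest/best solution for doing what I want?

Edit: Thanks to Frederik: yes, this is indeed a duplicate of "How to limit the number of threads/sub-processes used in a function in bash". "xargs --max-procs=4" works like a charm. (So I voted to close my own question.)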
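
For reference, the xargs approach mentioned in the edit might look like the sketch below. It assumes an xargs that supports -0 and -P (the short form of GNU's --max-procs). The inline `sh -c` command is a hypothetical stand-in for some_heavy_processing_command, and the temporary directory exists only to make the sketch self-contained.

```shell
#!/usr/bin/env bash
# Sketch: run at most 4 commands at a time with xargs -P.
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/b" "$workdir/c" "$workdir/d" "$workdir/e"

# -print0 / -0 keep file names containing spaces intact.
# -n 1 passes one file per invocation.
# -P 4 caps the number of simultaneous processes at 4.
# The sh -c body is a hypothetical stand-in for some_heavy_processing_command.
results=$(find "$workdir" -type f -print0 |
    xargs -0 -n 1 -P 4 sh -c 'sleep 0.2; echo "processed $1"' _)

processed_count=$(printf '%s\n' "$results" | grep -c '^processed ')
echo "$processed_count files processed"
rm -rf "$workdir"
```

In a real script the `sh -c '...' _` part would simply be replaced by the command name, since xargs appends each file name as its last argument.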

Answer 1

小智 22

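I know I'm late to the party with this answer, but I thought I would post an alternative that, IMHO, makes the body of the script cleaner and simpler. (Obviously you can change the values 2 and 5 to suit your scenario.)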

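function max2 {
   # Block while two or more background jobs are still running.
   while [ "$(jobs -rp | wc -l)" -ge 2 ]
   do
      sleep 5
   done
}

# Read from process substitution so the loop runs in the current shell.
# With "find | while", the jobs belong to a subshell and the final wait
# returns immediately instead of waiting for them.
while IFS= read -r name
do
   max2; some_heavy_processing_command "$name" &
done < <(find . -type f)
wait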

Dude, this works brilliantly! Thanks! :) (2 upvotes)
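
The polling with sleep in max2 can be avoided on newer shells. The following is a minimal sketch, assuming bash 4.3 or later, where `wait -n` returns as soon as any one background job exits. The names max_jobs, running, process_one, and logfile are illustrative and not part of the answer above; process_one stands in for some_heavy_processing_command.

```shell
#!/usr/bin/env bash
# Sketch: cap concurrency at max_jobs without polling (assumes bash 4.3+ for wait -n).
max_jobs=2
workdir=$(mktemp -d)
logfile=$(mktemp)
touch "$workdir/a" "$workdir/b" "$workdir/c" "$workdir/d" "$workdir/e"

# Hypothetical stand-in for some_heavy_processing_command.
process_one() {
    sleep 0.2
    echo "processed $1" >> "$logfile"
}

running=0
while IFS= read -r name
do
    # At the limit: block until any one background job exits.
    if [ "$running" -ge "$max_jobs" ]
    then
        wait -n
        running=$((running - 1))
    fi
    process_one "$name" &
    running=$((running + 1))
done < <(find "$workdir" -type f)
wait   # wait for the jobs that are still running

processed_count=$(grep -c '^processed ' "$logfile")
echo "$processed_count files processed"
rm -rf "$workdir" "$logfile"
```

The counter never exceeds max_jobs because a new job is started only after `wait -n` has reported that one of the earlier jobs finished.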

Answer 2

Dun*_*nes 20

#! /usr/bin/env bash

set -o monitor
# enable job control: each background job runs in its own process group
trap add_next_job CHLD
# run add_next_job each time a child process exits (SIGCHLD)

todo_array=($(find . -type f)) # collect the output in an array (splits on whitespace)

index=0
max_jobs=2

function add_next_job {
    # if there are still jobs to do, start the next one
    if [[ $index -lt ${#todo_array[@]} ]]
    # ${#todo_array[@]} is the array length; the hash here does not start a comment
    then
        echo "adding job ${todo_array[$index]}"
        do_job "${todo_array[$index]}" &
        # replace the line above with the command you want
        index=$((index+1))
    fi
}

function do_job {
    echo "starting job $1"
    sleep 2
}

# add initial set of jobs (stop early if there are fewer files than max_jobs)
while [[ $index -lt $max_jobs && $index -lt ${#todo_array[@]} ]]
do
    add_next_job
done

# wait for all jobs to complete
wait
echo "done"

Having said that, Frederik makes the excellent point that xargs does exactly what you want...

Answer 3

Ole*_*nge 9

With GNU Parallel it becomes even simpler:

find . -type f | parallel  some_heavy_processing_command {}

Learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
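
By default GNU Parallel runs one job per CPU core. To get a fixed limit comparable to "gmake -j4", the -j (--jobs) option can be added. The sketch below assumes GNU Parallel is installed (it checks first, because other tools also ship a command named parallel) and uses echo as a stand-in for some_heavy_processing_command.

```shell
#!/usr/bin/env bash
# Sketch: cap GNU Parallel at 4 simultaneous jobs with -j 4.
workdir=$(mktemp -d)
touch "$workdir/a" "$workdir/b" "$workdir/c"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'
then
    # echo stands in for some_heavy_processing_command;
    # {} is replaced by each input line.
    results=$(find "$workdir" -type f | parallel -j 4 echo processed {})
    echo "$results"
else
    results=""
    echo "GNU Parallel is not installed" >&2
fi
rm -rf "$workdir"
```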
